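# Introduction
DeepARG is a machine learning tool that uses deep learning to characterize and annotate antibiotic resistance genes (ARGs) in metagenomes. It provides two models, one for each type of input: short sequence reads and gene-like sequences.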
# Installation
## Install with conda

    # Create the environment
    conda create -n deeparg_env python=2.7.18
    conda activate deeparg_env
    # Install DIAMOND
    conda install -c bioconda diamond==0.9.24
    # Install the other dependencies
    conda install -c bioconda trimmomatic vsearch bedtools bowtie2 samtools
    # Install DeepARG
    pip install deeparg==1.0.2
    # Download the database and models; -o sets the download directory
    deeparg download_data -o ~/tools/deeparg

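After installation it can be useful to confirm that every tool is visible on `PATH` inside the activated environment. The following is a minimal sketch, not part of DeepARG; the helper name `check_tools` is hypothetical, and the executable names passed to it are assumed to match the conda package names.

```shell
# Report which of the given command-line tools are missing from PATH.
# Returns 0 when every tool is found, 1 otherwise.
# check_tools is a hypothetical helper, not a DeepARG command.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools deeparg diamond trimmomatic vsearch bedtools bowtie2 samtools` lists each tool and exits non-zero if any of them cannot be found.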
## Other installation methods
See the official documentation.
# Usage
## Predicting ARGs in reads
### Input files
Paired-end reads.
### Command

    deeparg short_reads_pipeline \
        --forward_pe_file Reads/LD201221-0003_S20210104-0015_F01_clean.R1.fq.gz \
        --reverse_pe_file Reads/LD201221-0003_S20210104-0015_F01_clean.R2.fq.gz \
        --output_file F01.deeparg \
        -d ~/tools/deeparg/

### Parameters
* `-h, --help`: show this help message and exit
* `--forward_pe_file FORWARD_PE_FILE`: forward mate from paired end library
* `--reverse_pe_file REVERSE_PE_FILE`: reverse mate from paired end library
* `--output_file OUTPUT_FILE`: save results to this file prefix
* `-d DEEPARG_DATA_PATH`: Path where data was downloaded [see deeparg download-data --help for details]
* `--deeparg_identity DEEPARG_IDENTITY`: minimum identity for ARG alignments [default 80]
* `--deeparg_probability DEEPARG_PROBABILITY`: minimum probability for considering a reads as ARG-like [default 0.8]
* `--deeparg_evalue DEEPARG_EVALUE`: minimum e-value for ARG alignments [default 1e-10]
* `--gene_coverage GENE_COVERAGE`: minimum coverage required for considering a full gene in percentage. This parameter looks at the full gene and all hits that align to the gene. If the overlap of all hits is below the threshold the gene is discarded. Use with caution [default 1]
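When several samples need processing, the command can be generated once per forward file. The sketch below only prints the commands so they can be inspected before running. It assumes mates are named with `.R1.` and `.R2.` as in the example file names; the helper names `r2_from_r1`, `sample_prefix`, and `print_pipeline_cmds` are hypothetical, and the output prefix is derived from the file name rather than shortened to something like `F01`.

```shell
# Derive the reverse-mate file name from a forward-mate name
# by replacing the last ".R1." with ".R2." (assumed naming scheme).
r2_from_r1() {
    printf '%s\n' "$1" | sed 's/\(.*\)\.R1\./\1.R2./'
}

# Derive an output prefix: drop the directory and everything from ".R1." on.
sample_prefix() {
    basename "$1" | sed 's/\.R1\..*$//'
}

# Print (do not run) one short_reads_pipeline command per forward file.
# $1 is the DeepARG data directory; the remaining arguments are forward files.
print_pipeline_cmds() {
    data_dir=$1
    shift
    for r1 in "$@"; do
        printf 'deeparg short_reads_pipeline --forward_pe_file %s --reverse_pe_file %s --output_file %s.deeparg -d %s\n' \
            "$r1" "$(r2_from_r1 "$r1")" "$(sample_prefix "$r1")" "$data_dir"
    done
}
```

Calling `print_pipeline_cmds ~/tools/deeparg Reads/*.R1.fq.gz` prints one line per sample; once the lines look right, they can be piped to `sh` to run them.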
## Predicting ARGs in FASTA sequences
### Input files
Either nucleotide or amino acid sequences.
### Command

    # 1) Annotate gene-like sequences when the input is a nucleotide FASTA file:
    deeparg predict --model LS --type nucl --input /path/file.fasta --out /path/to/out/file.out
    # 2) Annotate gene-like sequences when the input is an amino acid FASTA file:
    deeparg predict --model LS --type prot --input /path/file.fasta --out /path/to/out/file.out
    # 3) Annotate short sequence reads when the input is a nucleotide FASTA file:
    deeparg predict --model SS --type nucl --input /path/file.fasta --out /path/to/out/file.out
    # 4) Annotate short sequence reads when the input is a protein FASTA file (unusual case):
    deeparg predict --model SS --type prot --input /path/file.fasta --out /path/to/out/file.out

### Parameters
Usage: `deeparg predict`
* `-h, --help`: show this help message and exit
* `--model MODEL`: Select model to use (short sequences for reads | long sequences for genes) SS|LS [No default]
* `-i, --input-file INPUT_FILE`: Input file (Fasta input file)
* `-o, --output-file OUTPUT_FILE`: Output file where to store results
* `-d, --data-path DATA_PATH`: Path where data was downloaded [see deeparg download-data --help for details]
* `--type TYPE`: Molecular data type prot/nucl [Default: nucl]
* `--min-prob MIN_PROB`: Minimum probability cutoff [Default: 0.8]
* `--arg-alignment-identity ARG_ALIGNMENT_IDENTITY`: Identity cutoff for sequence alignment [Default: 50]
* `--arg-alignment-evalue ARG_ALIGNMENT_EVALUE`: Evalue cutoff [Default: 1e-10]
* `--arg-alignment-overlap ARG_ALIGNMENT_OVERLAP`: Alignment read overlap [Default: 0.8]
* `--arg-num-alignments-per-entry ARG_NUM_ALIGNMENTS_PER_ENTRY`: Diamond, minimum number of alignments per entry [Default: 1000]
* `--model-version MODEL_VERSION`: Model deepARG version [Default: v2]
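Since `--model` has no default and only accepts `SS` or `LS`, a small wrapper can catch typos before a long run starts. This is a sketch under that assumption; `predict_cmd` is a hypothetical helper that validates the two values and prints the resulting command instead of running it.

```shell
# Print a `deeparg predict` command after checking the --model and --type values.
# predict_cmd is a hypothetical helper; arguments: MODEL TYPE INPUT OUTPUT.
predict_cmd() {
    model=$1
    type=$2
    input=$3
    output=$4
    case "$model" in
        SS|LS) ;;
        *) echo "model must be SS (reads) or LS (genes), got: $model" >&2
           return 1 ;;
    esac
    case "$type" in
        nucl|prot) ;;
        *) echo "type must be nucl or prot, got: $type" >&2
           return 1 ;;
    esac
    printf 'deeparg predict --model %s --type %s --input %s --out %s\n' \
        "$model" "$type" "$input" "$output"
}
```

For example, `predict_cmd LS prot genes.faa genes.out` prints the command for case 2 above, while `predict_cmd XS nucl a.fa a.out` reports the invalid model and returns a non-zero status.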
## Output
* ARG_NAME
* QUERY_START
* QUERY_END
* QUERY_ID
* PREDICTED_ARG_CLASS
* BEST_HIT_FROM_DATABASE
* PREDICTION_PROBABILITY
* ALIGNMENT_BESTHIT_IDENTITY (%)
* ALIGNMENT_BESTHIT_LENGTH
* ALIGNMENT_BESTHIT_BITSCORE
* ALIGNMENT_BESTHIT_EVALUE
* COUNTS
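A quick way to get an overview of a result table is to count hits per predicted ARG class. The sketch below assumes the file is tab-separated with one header line and with columns in the order listed above, so that column 5 is the predicted class and column 7 the prediction probability; check this against your own output before relying on it. `summarize_classes` is a hypothetical helper, and it counts rows rather than summing the COUNTS column.

```shell
# Count rows per predicted ARG class, keeping rows whose probability
# is at least min_prob (default 0.8).
# Assumptions: tab-separated input, first line is a header,
# column 5 = predicted class, column 7 = prediction probability.
summarize_classes() {
    file=$1
    min_prob=${2:-0.8}
    awk -F '\t' -v p="$min_prob" '
        NR > 1 && $7 + 0 >= p + 0 { n[$5]++ }
        END { for (c in n) printf "%s\t%d\n", c, n[c] }
    ' "$file" | sort
}
```

Calling `summarize_classes results.tsv 0.9` prints one line per class with the number of rows at or above a probability of 0.9, where `results.tsv` stands for whichever DeepARG output table you want to summarize.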
# References
- DeepARG official website
- DeepARG repository