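# Introduction

DeepARG is a machine learning tool that uses deep learning to characterize and annotate antibiotic resistance genes (ARGs) in metagenomes. It provides two models, one for each input type: short sequence reads and gene-like sequences.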

Automatic annotation of highly homologous ARGs

# Installation

  • Install with conda

    # Create the environment
    conda create -n deeparg_env python=2.7.18
    conda activate deeparg_env
    # Install diamond
    conda install -c bioconda diamond==0.9.24
    # Install the remaining dependencies
    conda install -c bioconda trimmomatic vsearch bedtools bowtie2 samtools
    # Install DeepARG
    pip install deeparg==1.0.2
    # Download the database and models; -o sets the download directory
    deeparg download_data -o ~/tools/deeparg

  • Other installation methods

    See the official documentation.
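
After a conda installation it can be useful to confirm that every tool is actually reachable from the active environment before starting a long run. The sketch below is one way to do that; the helper name `missing_tools` is hypothetical, and the executable names are assumed to match the package names installed above.

```shell
# Print the name of every listed command that cannot be found on PATH.
# missing_tools is a hypothetical helper, not part of DeepARG.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumption: each package installs an executable with the same name.
missing_tools deeparg diamond trimmomatic vsearch bedtools bowtie2 samtools
```

An empty result means all tools were found; any name that is printed still needs to be installed or added to PATH.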

# Usage

## Predicting ARGs in reads

### Input files

Paired-end reads.

### Command

    deeparg short_reads_pipeline \
        --forward_pe_file Reads/LD201221-0003_S20210104-0015_F01_clean.R1.fq.gz \
        --reverse_pe_file Reads/LD201221-0003_S20210104-0015_F01_clean.R2.fq.gz \
        --output_file F01.deeparg \
        -d ~/tools/deeparg/

Parameter descriptions:

* -h, --help: show this help message and exit
* --forward_pe_file FORWARD_PE_FILE: forward mate from the paired-end library
* --reverse_pe_file REVERSE_PE_FILE: reverse mate from the paired-end library
* --output_file OUTPUT_FILE: save results to this file prefix
* -d DEEPARG_DATA_PATH: path where the data was downloaded [see deeparg download-data --help for details]
* --deeparg_identity DEEPARG_IDENTITY: minimum identity for ARG alignments [default 80]
* --deeparg_probability DEEPARG_PROBABILITY: minimum probability for considering a read as ARG-like [default 0.8]
* --deeparg_evalue DEEPARG_EVALUE: e-value cutoff for ARG alignments [default 1e-10]
* --gene_coverage GENE_COVERAGE: minimum coverage, in percent, required for considering a full gene. This parameter looks at the full gene and all hits that align to it. If the overlap of all hits is below the threshold, the gene is discarded. Use with caution [default 1]
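
When there are several samples, the same command has to be repeated for each pair of files. The following is a minimal sketch that prints one `short_reads_pipeline` command per sample instead of running it, so the commands can be inspected first. The function name `deeparg_batch_commands` is hypothetical, and it assumes that mates are named `<sample>.R1.fq.gz` and `<sample>.R2.fq.gz` (as in the example above) and that paths contain no spaces.

```shell
# Print one short_reads_pipeline command for every R1/R2 pair in a directory.
# Assumptions: mates are named <sample>.R1.fq.gz / <sample>.R2.fq.gz and
# paths contain no spaces. deeparg_batch_commands is a hypothetical helper.
deeparg_batch_commands() {
    reads_dir=$1
    data_dir=$2
    for r1 in "$reads_dir"/*.R1.fq.gz; do
        [ -e "$r1" ] || continue              # glob matched nothing
        r2=${r1%.R1.fq.gz}.R2.fq.gz           # expected mate file
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no matching R2 file" >&2
            continue
        fi
        sample=$(basename "$r1" .R1.fq.gz)    # used as the output prefix
        printf 'deeparg short_reads_pipeline --forward_pe_file %s --reverse_pe_file %s --output_file %s.deeparg -d %s\n' \
            "$r1" "$r2" "$sample" "$data_dir"
    done
}

# Example: write the commands to a script, review it, then run it.
#   deeparg_batch_commands Reads ~/tools/deeparg/ > run_deeparg.sh
#   sh run_deeparg.sh
```

Printing the commands rather than executing them keeps the helper side-effect free; samples with a missing mate are reported on standard error and skipped.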

## Predicting ARGs in FASTA sequences

### Input files

A FASTA file containing either nucleotide or amino acid sequences.

### Command

    # 1) Annotate gene-like sequences when the input is a nucleotide FASTA file:
    deeparg predict --model LS --type nucl --input /path/file.fasta --out /path/to/out/file.out
    # 2) Annotate gene-like sequences when the input is an amino acid FASTA file:
    deeparg predict --model LS --type prot --input /path/file.fasta --out /path/to/out/file.out
    # 3) Annotate short sequence reads when the input is a nucleotide FASTA file:
    deeparg predict --model SS --type nucl --input /path/file.fasta --out /path/to/out/file.out
    # 4) Annotate short sequence reads when the input is a protein FASTA file (unusual case):
    deeparg predict --model SS --type prot --input /path/file.fasta --out /path/to/out/file.out
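
Because `--type` must match the content of the FASTA file, a quick check of the input can prevent a wasted run. The sketch below guesses the sequence type from the residue composition. The function name `guess_fasta_type` and the 90% threshold are assumptions made for illustration and are not part of DeepARG; ambiguous files should be checked by hand.

```shell
# Guess whether a FASTA file holds nucleotide or protein sequences.
# Heuristic (an assumption, not part of DeepARG): if at least 90% of the
# letters are A, C, G, T, U or N, report "nucl", otherwise "prot".
guess_fasta_type() {
    awk '
        /^>/ { next }                      # skip header lines
        {
            seq = toupper($0)
            gsub(/[^A-Z]/, "", seq)        # drop gaps, stop symbols, whitespace
            total += length(seq)
            gsub(/[ACGTUN]/, "", seq)      # what remains is non-nucleotide
            other += length(seq)
        }
        END {
            if (total == 0) {
                print "empty"
            } else if (other / total <= 0.10) {
                print "nucl"
            } else {
                print "prot"
            }
        }
    ' "$1"
}

# Example (file names are placeholders):
#   deeparg predict --model LS --type "$(guess_fasta_type genes.fasta)" \
#       --input genes.fasta --out genes.deeparg
```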

Parameter descriptions:

usage: deeparg predict

* -h, --help: show this help message and exit
* --model MODEL: model to use, SS (short sequences, for reads) or LS (long sequences, for genes) [no default]
* -i, --input-file INPUT_FILE: input file (FASTA)
* -o, --output-file OUTPUT_FILE: output file in which to store the results
* -d, --data-path DATA_PATH: path where the data was downloaded [see deeparg download-data --help for details]
* --type TYPE: molecular data type, prot or nucl [default: nucl]
* --min-prob MIN_PROB: minimum probability cutoff [default: 0.8]
* --arg-alignment-identity ARG_ALIGNMENT_IDENTITY: identity cutoff for sequence alignment [default: 50]
* --arg-alignment-evalue ARG_ALIGNMENT_EVALUE: e-value cutoff [default: 1e-10]
* --arg-alignment-overlap ARG_ALIGNMENT_OVERLAP: alignment read overlap [default: 0.8]
* --arg-num-alignments-per-entry ARG_NUM_ALIGNMENTS_PER_ENTRY: Diamond, minimum number of alignments per entry [default: 1000]
* --model-version MODEL_VERSION: DeepARG model version [default: v2]

### Output

The result table contains the following columns:

* ARG_NAME
* QUERY_START
* QUERY_END
* QUERY_ID
* PREDICTED_ARG_CLASS
* BEST_HIT_FROM_DATABASE
* PREDICTION_PROBABILITY
* ALIGNMENT_BESTHIT_IDENTITY (%)
* ALIGNMENT_BESTHIT_LENGTH
* ALIGNMENT_BESTHIT_BITSCORE
* ALIGNMENT_BESTHIT_EVALUE
* COUNTS
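
A common next step is to summarize how many hits fall into each ARG class. The sketch below counts rows per PREDICTED_ARG_CLASS after applying probability and identity cutoffs. It assumes the result file is tab-separated, has its columns in the order listed above, and marks its header line with a leading `#`; the function name `summarize_arg_classes` and the file name in the example are hypothetical.

```shell
# Count hits per predicted ARG class in a DeepARG result table.
# Assumptions: tab-separated columns in the order listed above, header line
# starting with "#", column 5 = PREDICTED_ARG_CLASS,
# column 7 = PREDICTION_PROBABILITY, column 8 = ALIGNMENT_BESTHIT_IDENTITY (%).
summarize_arg_classes() {
    table=$1
    min_prob=${2:-0.8}        # default mirrors the probability cutoff above
    min_identity=${3:-50}     # default mirrors the identity cutoff above
    awk -F '\t' -v p="$min_prob" -v id="$min_identity" '
        /^#/ || NF < 12 { next }                       # skip header and short lines
        $7 + 0 >= p + 0 && $8 + 0 >= id + 0 { n[$5]++ }
        END { for (c in n) printf "%s\t%d\n", c, n[c] }
    ' "$table" | sort -t "$(printf '\t')" -k2,2nr -k1,1
}

# Example (results.tsv is a placeholder for the actual result file):
#   summarize_arg_classes results.tsv 0.9 80
```

The output has one line per class with the class name and the number of hits, sorted from the most to the least frequent class.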

# References

  • DeepARG website
  • DeepARG repository

