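When you submit a gene cluster to NCBI, you need to provide a file in sqn format, which is generated with tbl2asn.
# File preparation
tbl2asn needs three files to generate the sqn file:
- File 1: the genome sequence file in FASTA format
The bracketed modifiers in the header (such as [organism=...]) are optional.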
>Toyoncin_biosynthesis_gene_cluster [organism=Bacillus toyonensis] [strain=XIN-YC13] [topology=linear] [moltype=DNA] [tech=wgs] [gcode=11] [country=China] Bacillus toyonensis strain XIN-YC13 Toyoncin biosynthesis gene cluster, complete sequence
ttaaaa taatttaata
gggaagtttt ttagttgttt tggactcttc ccaaacactg ctttaagtgt tggattaaca
tcatccctat tccccgaaaa cataatgtga ggatttatga ataatgcata tgctctaaca
ttattatcat caacaccact ctctgaacga gccataatac ccttatcaat taattttcta
accaatggac taactttagt ttcatgtctt ccaatttttt tagctaattc tcgctgagtt
aatgggattt gttctttcga attaatatca ttaactaaac aattacttaa aaaacctaca
cacattgaaa tatctactaa aaataccttc tcagcatttg ttaaataatc aatttcgaac
aaatactgga tattttgttg aataatctga acaaacttcg ctttattttt cactttacgc
tcaggaacta atttcattcc tcttgaacga gcttttgatt gaagtttatt tgctaaatac
atctcttctt cagacaagac ctttaaatcc tcaatatctc tcaatcttga atttttttca
gcttgttcta agttgataaa ctttgacata ttctttttgc tcctcttttc taagattttc
aactagagaa ggaaaaaatt ttatgttatg attcctgtag aatttacaat tcaatatgta
caaaagaact ccccttttct aattgatagt ttggtcgctt tcaattataa tacaagggga
ttttttacat cttaaaattt ttcatttttg aatcaatccc tgaaaatata aagaacacat
cacataaatt attcttaata ttttataatc gaaaaaataa taggaataaa gaaaaatact
gcaataaata tattcatctg tttcttactc aaaccggcca ctatatttaa tcccattcct
ataataatta attcccaaat tgaaaacact tcaaatttac tacaaattat atatagtaat
gtacctggtt caaatattga acccaaacta gtatacgtta ctatttctcc tcctataaat
agtgttaata atgtattaat taatttacct aaaatagaaa ttacactagc aaatattgta
atagatacta actttttata agaaacatct ttactcatca gcatcattac aatctttaaa
ataatccccc aaataaaagg tgtaattaaa gcaatgaaaa tcgatgcaaa acctcctaac
atcatttggg aaacaagggg tatttccata tctgcaaata cttctttttg aattttaacc
aattctggat tgctatgtct tgcatataca gataaaatcc ctattattgc ttgtataact
gataaataca taagaggaaa ccatatcgga ctaattattt tcatacgctc gaattcagaa
ataggagatg taatcataaa aattagagat ggtttttcat aattgttttt ttctttattc
actactaaac tattatccat atattaacac cttctttttt tattcataac gtaatgcttc
aattggatct aattttgcag ccttattggc tggaatcaat ccaaatataa taccaagcga
catcgaaaat aatacgccac ccacaacaac ttcccatgaa acaagaggcg gccattttgc
aaatgtggac acaatgtacg ctccacaata accaagtcca atcccaatca atccaccaag
aagtgtcaac ataattgctt caattaaaaa ttgcaacaaa attttaccac gcgttgctcc
aagtgcttta cgtaccccaa tctcacgtgt acgctctgtt acagaaacaa gcatgatatt
cataactcca attccgccta caactaaaga aatacttgca atacctgcaa taatcattgt
cataatatta gtaactttag aaataccttt ttggatttct tctaaattta caatttcata
tttcccttta aactcttcag attgtctatc atttaataat tttactccct tttttccagc
tgtttgtaat tgatcaaccc ctattgcttg aattgtaata gattgttgag agttatcatc
tccatataat attggccata ttgaaagtgg tattaaaatt tctgacattc caaaaccaag
ctcttcatct cctgaactga atagaccaat aatttgaagt ggctgacctt taatttctat
aattttacca atgactgatt catgctcatt aggaaataac tctttcacta atgtttgatt
aaccattatt acattattac cttgcatcaa atcatcttca ttaagagaac gacctttctc
tattttcatt ttagtcatat taaaatattc ttttgtaata ccatttatat tagttacaac
ctttttatca tcaccaatta atgtctctgt actagagttt tgaacaatta catttttaat
ttcttttatc ttttttaact caaaaagatc ttcttcactt acagatggtt ttttgtcatt
catagatcct gttgttaata actcattaat atcttcttta tatgtaatcg gaatagtgtt
attgccagaa gcggtaaatt gtgatttaag cattgcttct ccacctttac caatggctac
aacagtaata atagaaccta caccaataat aattccaagc atcgtaagag ctgagcgcag
tttatgagct aaaatagaag ataaggcaat ttttatacta tctaataaac tcataccata
caccttctat cttctgtaat tttcccatct cgcaatatga tgcgacgtga agaataagct
gctacctctt cttcatgtgt aaccataacg attgtcgtac cttctgcatt taacttcgta
aagatatcca taacttgtgc accagacttc gtatcaagcg caccagttgg ctcatcagcc
ataataaacg ttggattatt cgcaatcgat cttgcaatag caacacgctg cttctgtcca
cctgacagct cactaggtaa atgatgtact ctatccgcta atccaacttt cccaagcgct
tcgagcgctc tttgacgacg ctctgctttc ttcactccac cataaatcag tggtaattca
acgttttcca ctgcggaaag gcgcggcaat aaattaaaat gctggaacac aaaaccgata
tattcattac gaattaaagc aagttttgac tcatctgctg ttaaaatatt cacatcattc
agcatatatt cgccttctgt tggacgatct aaacaaccga taatattcat aagagttgat
ttaccagaac cagacggtcc cataattgaa acaaattcac caccttgaat agttaaacta
ataccgtgca aaataggaac cgccattttt ccttgataat acgttttagc aatattattt
aacgtaatca tttctctttc acttccattc cgtcatatac gttgtcggaa ggatttttaa
ccaccttttg ccccactgtt gcgccctcta caatctctgt ccaatctcca tcagtagcac
cttttttcac attttgttta cgaagcttac ctttctcttc gatatataca aatgcatcat
cgcctttttc aacaatactc ttacttggaa cagcaatcat tcttttattc tctaaattta
cttgtaacga aacatgataa cctggagata aaccatcttg actatcaaga cttgctttat
atgtatattg agacatattt tgagtcactt cccccatgcc atcagcttga gccatttcta
cacttgttgg gaactcactt acctctgtaa tcttccctgt ccactttttc ttactatttg
ctttcgcagt tacagtaaac gtttgatcct tttgaatttg cgacttctga agctcagtta
atgttccttg aatttggaat ggatctttag aagcaacttg taaaaaggct ttcccttgac
cacctaacgc ttgtgatgaa ctttgtgctg catctttatc taacttttga acaacaccag
caaaattgct ataaatcgta agttcgttct gctttttatt taactcttct ttttgtaact
tccctttctc tttctcaagg tctgttgtct tttgcgctat ttctaattca cttacttgct
cttccatcgg atctattact tctttcccag ctccgctatc tttcgccttc ttaatttctt
tcttcaacga atcaatcttc tttttccctt ggtcataacg catatctgcc atcttttgat
caagcacagc ttgcttcatt tgcaaattaa tttcttcatt atcgtaagaa aacaatttcg
ttcccttttc tatttcttgt ccttctttca cttcaatatc tttcactttt cctttagtca
gatccgcgta gaaactttca atattccccg gcttcacttg accagaaatt aactttgtat
tattaagatt gcgctctgtg actttttcaa aactaacagt atctattttt gttaccgctt
tcttcttact ttgcactacg aaaatattaa taaatgtaac aataacaatt aacgcaataa
ctccaataat agctcctttc tttttatttt taaaaataaa aagttctttt ttaatcacaa
caatcttctc cttattcata tctaaaattt aaacttttaa attttacata aaaatttaaa
acttctaaaa tataacatgt ataatttacc atagatgatt tattttgtat aatataaaaa
tatctatata aataatgcta attttcaaac aatggggtgg aagatactaa tgttagaaaa
aaaagataga ctaacagaaa tagaggaaca aattatatac ttaatttcaa aggaattagg
aaataaagaa atagcggaaa aattaaatta ttcacaacgt agcatcggtt acaaaataaa
taatattttt aaaaaattaa atgttaattc aagaatcgga ctgattatag aagctgtaaa
aaaaaatata atttaaatat aagaatgctt tcatgttaat attttataga aactaaatat
agaggtgatt aaaatgcaaa aattttttga agctattagt gctataggta tagtaggtta
ctttttaggt aaattcacaa gtattccttt aatagacaaa tatacattgt atttcggcgt
aatgttgatg attggggtta ttggaagatt tattataaaa gtaattaact cagaagaaga
gacacatgat tcaaacaaat aaaatactct aataaaaatg gaagaagatt gcacttaagt
gcaatcttct tccattttta ttgaaaattg attaaataat gttaatattg caattgtgtg
gtgcagatta gggtgattat gtaatagggg gaaattaaaa atgatcaata cagcttggaa
aattattaaa gcactacaaa aatacggtac aaaagcatac aatgttatca aaaaaggcgg
ccaagcaatg tacgacagct tcatggcagc taaagctaaa ggttggacac atgcagcttg
gtggctagta gaacatggtt caactttagg aacattctat gatttattaa aagctgctgg
attaatcgac taattacagc aactaaacaa ctaaacaact aaacaactta aaaatacaaa
ttaccctaaa ctgtacccct attacatatt aactaattat tttaaaggtt ggatgataat
atgtcaaata acatcatatc tgtaaaaaat ttaattaaaa gcttcgataa caaaatagta
ttagataaat taaatttcga aatgaaagaa aactccactg ttgtaataat aggtaaaaac
ggtgcaggta aaagtgtctt tctaaattgt ttacttggat ttattcatta caaccaaggt
tcaatactaa tagatggaca acctgtagaa aatcgattac atctccgcaa gattacatcg
ttaatttctt cagaccatca agaacatcta aatttattaa cccccaatga atatttttct
tttttacaag atatttacca actaaaaagt aataataaag acaaaattca aaattactca
gaagatctat atgttactaa agaactcaat actgtatttt catcactttc ttttggaaca
aaaaagaaaa tacaattaat tggtagccta ttatattctc ctaaattatt gatttgcgac
gaaatatttg aagggcttga tacagactca gtaaaatggg ttaaaaactt atttcaacaa
agaaaacaag aaaatctttc tactttattt acaactcata ttactgaaca tataacagat
ataacagaaa aaaattacat acttgaaaat ggaaaattaa ttgtgtaagt ttaaccactt
atatttaaag ctaaaattaa ggagcttaaa atatgaattt taatatatat aagagactat
atgataaatc aacagaagaa aaaagcaaaa caataacaaa acaaatatta tttggaatta
taaatagttc tatattaata ggtatactac tcacatgttt ggagattttc aactttaaaa
tttcaactgt aatgtatggt tatttcacta tatatataat actagaactt ttactattat
tctctgcaaa tcaactatat gaaagtacag aattcataat aaaattcctt aaatatacac
caataaccat aaataaacta tatttctcac attttctaag ttctaaatat tcattttcca
atctttttga aataataact ctcacatcaa ttttattaat atataatgtc gatatcttat
attcatttat tttcataatt agcttacaaa ttattagctt aataagaaca tatttagaat
ttttactatt atattctcaa aaaaaacagg ttaaaatttt tactctaacc cattttgttt
tcataatatc tatggttttt tatattattg ttaaaacaaa atcgatagat ttagtattct
ttgaaaacac aaatatgtta attatatctg ttcttctcat aacattcttg atatcacttt
taacatataa acatattata gaatacttaa tgaaaaataa tgaaattgta tataatgcta
tttttatcaa gttaactttt aacacagcta atttaattag taaattattt aaatttaata
catcaattgc atctttaata aaaatacata taatacgatt attacgtaat caagactata
taagtagatt actaaaaata ggaatattac tatttatttt ttcttctata agctttctat
ttttcgataa atcatcaaca aacaatgaaa tgagtgatat actttacttt tcatttttta
tttccttatt tagtttttct aacatacgat tagactataa cttagtttct aaattaagct
tagaggatta tccaataaca aaattacaat caagattaag cattgatata gcacatggaa
ttttactatt tatactatct ttatttcttt tattaacaca atacttattg aatccaacaa
atattctaac tctaattgat ggtttattat catttatttg tttttatttt ctaagtcttg
gtatagaaaa agcagatatt ataataacac caaaaacaaa atggaaaatg tatccattat
tttttgtgat gggattaata attgaagcaa tatttctatt aaaattcaaa atatggataa
aattaataac tttattcctt tgtatactgt ggtcatattt acgtgtttat tggaaattaa
aaaaacaata aacacaatta aaaagttccc ttcatatttt ttgaagggaa cttttatttt
aaacaaaaat tacaaacaag caaagttatt taaaagtaaa cttttaaaat tattgaatta
ataacaatta gtctaagata tatcagccaa atttaatttt taaacaaacc gaaaaaccct
ttccgttttt gtttctgatt ttggctctgt atttctctaa tgttttcaag caataactga
tctcgttttt caaatttttt ctctataaaa acctctaatt caatattttt atcttctact
tcctttaatt ttctctccgt attagccaaa tgttcttttg tggtaactaa ttcattcgta
atctcttgta atttttgaac aagcgtttga ttgaactgat tttgtaattg ttgattttct
aatacttcat ccaacttctt ttctaattcc gatttttcct ctcttgaaac aaacaaatca
agttctccat tccgccatgc tcgaacttga tcataagacc atttttgcac ctgtattttt
tctttaattg ctatcaattc ttctatattt tcctttgagt acctacgatg ccctccctga
cttcgctccg tttgtatatt aaattcgttt gaccatgctt ttaacaagtc aggggtaatc
cctaaacgat ccgcaacaat tttcggtgta tacatttctg attttaattc caa
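- File 2: the feature table file (.tbl) that describes the gene features
This file can be obtained by annotating file 1 with prokka, but it then has to be edited by hand: add the first few lines of the file and the gene-related information. Columns are separated by tabs.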
>Feature Toyoncin_biosynthesis_gene_cluster
1	8409	source
			organism	Bacillus toyonensis
			mol_type	genomic DNA
			strain	XIN-YC13
585	1	gene
			gene	orf1
585	1	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00001
			product	MarR family transcriptional regulator
1476	811	gene
			gene	orf2
1476	811	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00002
			product	YIP1 family membrane protein
2710	1496	gene
			gene	orf3
2710	1496	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00003
			product	ABC transporter permease
3387	2707	gene
			gene	orf4
3387	2707	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00004
			product	ABC transporter ATP-binding protein
4595	3384	gene
			gene	orf5
4595	3384	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00005
			product	RND family efflux transporter, MFP subunit
4746	4952	gene
			gene	orf6
4746	4952	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00006
			product	Helix-turn-helix transcriptional regulator
5010	5198	gene
			gene	orf7
5010	5198	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00007
			product	Putative membrane protein
5337	5549	gene
			gene	toyA
5337	5549	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00008
			product	Toyoncin precursor
5657	6304	gene
			gene	orf9
5657	6304	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00009
			product	ABC transporter ATP-binding protein
6349	7707	gene
			gene	orf10
6349	7707	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00010
			product	Putative membrane protein
8391	7849	gene
			gene	orf11
8391	7849	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00011
			product	MarR family transcriptional regulator
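The manual step described for file 2, adding a gene feature in front of every CDS, could be scripted along the following lines. This is only a sketch: the function name add_gene_features and the automatic orf1, orf2, ... naming are illustrative assumptions, and it expects a tab-separated table in which each CDS feature line has exactly three columns. Names such as toyA, and the source feature at the top of the file, would still have to be edited by hand.

```shell
#!/bin/sh
# Hypothetical helper: print a feature table with a "gene" feature
# (named orf1, orf2, ...) inserted in front of every CDS feature.
# Assumes tab-separated columns, as the feature table format requires.
add_gene_features() {
    awk -F '\t' -v OFS='\t' '
        # A feature line has three fields: start, end, feature key.
        NF == 3 && $3 == "CDS" {
            n++
            print $1, $2, "gene"               # same coordinates as the CDS
            print "", "", "", "gene", "orf" n  # qualifier line: 3 empty columns
        }
        { print }                              # always keep the original line
    ' "$1"
}
```

A call such as `add_gene_features prokka_output.tbl > edited.tbl` (both file names are placeholders) writes the extended table to a new file. If the input already contains gene features, this sketch would duplicate them.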
- File 3: the template file (.sbt) that describes the author information
This file can be generated on the NCBI website.
Submit-block ::= {
  contact {
    contact {
      name name {
        last "xin",
        first "bingyue",
        middle "",
        initials "",
        suffix "",
        title ""
      },
      affil std {
        affil "Huaibei Normal University",
        div "College of Life Sciences",
        city "Huaibei",
        sub "Anhui",
        country "China",
        street "Dongshan road No.100",
        email "xinbingyuex@163.com",
        postal-code "235000"
      }
    }
  },
  cit {
    authors {
      names std {
        {
          name name {
            last "Xin",
            first "Bingyue",
            middle "",
            initials "",
            suffix "",
            title ""
          }
        }
      },
      affil std {
        affil "Huaibei Normal University",
        div "College of Life Sciences",
        city "Huaibei",
        sub "Anhui",
        country "China",
        street "Dongshan road No.100",
        postal-code "235000"
      }
    }
  },
  subtype new
}
Seqdesc ::= pub {
  pub {
    gen {
      cit "unpublished",
      authors {
        names std {
          {
            name name {
              last "Xin",
              first "Bingyue",
              middle "",
              initials "",
              suffix "",
              title ""
            }
          }
        }
      },
      title "Purification and characterization of a novel leaderless bacteriocin, toyoncin, produced by Bacillus toyonensis XIN-YC13 that specifically active against Bacilus cereus and Listeria monocytogenes"
    }
  }
}
Seqdesc ::= user {
  type str "Submission",
  data {
    {
      label str "AdditionalComment",
      data str "ALT EMAIL:xinbingyuex@163.com"
    }
  }
}
Seqdesc ::= user {
  type str "Submission",
  data {
    {
      label str "AdditionalComment",
      data str "Submission Title:None"
    }
  }
}
Note: the sequence ID must be identical in file 1 and file 2. In this example both use "Toyoncin_biosynthesis_gene_cluster".
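The ID match between file 1 and file 2 can be checked before running tbl2asn. The following is a minimal sketch; the function names fasta_id, tbl_id and ids_match are made up for illustration, and it looks only at the first FASTA record and the first >Feature line.

```shell
#!/bin/sh
# Print the sequence ID of the first FASTA record: the text between ">"
# and the first whitespace on the header line.
fasta_id() {
    awk '/^>/ { sub(/^>/, "", $1); print $1; exit }' "$1"
}

# Print the ID that follows ">Feature" in a feature table.
tbl_id() {
    awk '$1 == ">Feature" { print $2; exit }' "$1"
}

# Succeed only when both files name the same, non-empty sequence ID.
# Usage: ids_match cluster.fna cluster.tbl
ids_match() {
    id=$(fasta_id "$1")
    [ -n "$id" ] && [ "$id" = "$(tbl_id "$2")" ]
}
```

For example, `ids_match cluster.fna cluster.tbl || echo "ID mismatch"` (placeholder file names) reports a problem before any sqn file is generated.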
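# Generating the file
tbl2asn -t template.sbt -p ./ -V vb -x .fna
- -t: the template file
- -p: the directory that contains the input files
- -V: verification options; the two letters used here are:
  - v: generate a validation file that records the error messages
  - b: generate the gbf file
- -x: the suffix of file 1 (the FASTA file); set it to match your own file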
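The command and the validation step can be wrapped in a small script so that validator messages are shown right after each run. This is a sketch under assumptions: the function name run_tbl2asn is hypothetical, and it assumes that the v option writes its messages to files ending in .val in the input directory.

```shell
#!/bin/sh
# Hypothetical wrapper: run tbl2asn with the options listed above, then
# print every non-empty validation file found in the input directory.
# Usage: run_tbl2asn template.sbt ./
run_tbl2asn() {
    template=$1   # .sbt template file
    indir=$2      # directory holding the .fna and .tbl files
    tbl2asn -t "$template" -p "$indir" -V vb -x .fna || return 1
    # Assumption: validator output is written as *.val files in $indir.
    for val in "$indir"/*.val; do
        if [ -s "$val" ]; then
            echo "== $val"
            cat "$val"
        fi
    done
    return 0
}
```

If no messages are printed, the validation files are either empty or missing, so it is still worth checking that the .sqn and .gbf files were actually written.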
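Note: if you use the tbl2asn bundled with Prokka, the date in the generated sqn and gbf files is usually 1-JAN-2019 and has to be corrected by hand to the current date, because the tbl2asn shipped with Prokka is a modified version. Using the official tbl2asn is recommended, since it avoids the wrong date.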
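A quick way to spot the placeholder date in a gbf file, and to preview a corrected copy, is sketched below. It rests on assumptions: the date is taken to sit at the end of the LOCUS line in DD-MON-YYYY form, the function names are illustrative, and the sqn file is not handled at all, so this does not replace rerunning with the official tbl2asn.

```shell
#!/bin/sh
# Hypothetical check: succeed if the LOCUS line of a GenBank flat file
# carries the 1-JAN-2019 (or 01-JAN-2019) placeholder date.
has_placeholder_date() {
    grep -q '^LOCUS.* 0\{0,1\}1-JAN-2019' "$1"
}

# Print the flat file with the placeholder on the LOCUS line replaced by
# today's date in upper-case DD-MON-YYYY form. The input is not modified.
fix_gbf_date() {
    today=$(LC_ALL=C date +%d-%b-%Y | tr '[:lower:]' '[:upper:]')
    sed "/^LOCUS/ s/ 0\{0,1\}1-JAN-2019/ $today/" "$1"
}
```

For example, `has_placeholder_date cluster.gbf && fix_gbf_date cluster.gbf > cluster.fixed.gbf` (placeholder file names) writes a corrected copy only when the placeholder is present.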
# References
- tbl2asn
- Uploading genome data to NCBI