Annotation of the Tomato Genome

Rombauts, Stephane

doi:10.1007/978-3-662-53389-5_9

Part of the book series: Compendium of Plant Genomes (CPG)

Abstract

The tomato genome was annotated by the iTAG consortium (international Tomato Annotation Group) using a pipeline that operated as a distributed, worldwide network of resources and experts, with SGN (http://solgenomics.net/) as the central data repository and exchange node. For the iTAG pipeline, used for both tomato and potato, we relied on software that, besides its own ab initio prediction capabilities, offers extensive flexibility to integrate and combine a wide variety of extrinsic data as well as prediction results from other software. Transcript data from numerous sources were mapped onto the genome sequence using several software tools. The detailed procedure is described.

Author information

Authors and Affiliations

Univ Ghent VIB, 9052, Ghent, Belgium
Stephane Rombauts

Corresponding author

Correspondence to Stephane Rombauts.

Editor information

Editors and Affiliations

GAFL, INRA, Montfavet Cedex, France
Mathilde Causse
Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York, USA
Jim Giovannoni
INRA-INP Toulouse, Castanet Tolosan, France
Mondher Bouzayen
INRA-INP Toulouse, Castanet Tolosan, France
Mohamed Zouine

About this chapter

Cite this chapter

Rombauts, S. (2016). Annotation of the Tomato Genome. In: Causse, M., Giovannoni, J., Bouzayen, M., Zouine, M. (eds) The Tomato Genome. Compendium of Plant Genomes. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53389-5_9

DOI: https://doi.org/10.1007/978-3-662-53389-5_9
Published: 24 November 2016
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-53387-1
Online ISBN: 978-3-662-53389-5
eBook Packages: Biomedical and Life Sciences (R0)
