Abstract
The tomato genome was annotated by the iTAG consortium (international Tomato Annotation Group) using a pipeline that operated as a distributed, worldwide network of resources and experts, with SGN (http://solgenomics.net/) serving as the central data repository and exchange node. The iTAG pipeline, used for both tomato and potato, relied on gene-prediction software that, in addition to its own ab initio prediction capabilities, offers great flexibility to integrate and combine a wide variety of extrinsic data and the predictions of other software. Transcript data from numerous sources were mapped onto the genome sequence using several software tools. The detailed procedure is described.
© 2016 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Rombauts, S. (2016). Annotation of the Tomato Genome. In: Causse, M., Giovannoni, J., Bouzayen, M., Zouine, M. (eds) The Tomato Genome. Compendium of Plant Genomes. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53389-5_9
DOI: https://doi.org/10.1007/978-3-662-53389-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-53387-1
Online ISBN: 978-3-662-53389-5
eBook Packages: Biomedical and Life Sciences (R0)