Annotation of the Tomato Genome

The Tomato Genome

Part of the book series: Compendium of Plant Genomes ((CPG))

Abstract

The annotation of the tomato genome, performed by the iTAG consortium (international Tomato Annotation Group), relied on a pipeline operating as a distributed, worldwide network of resources and experts. It used SGN (http://solgenomics.net/) as a central data repository and exchange node. For the iTAG pipeline, used for both tomato and potato, we relied on software that, besides its own ab initio prediction capabilities, offers great flexibility to integrate and combine highly diverse extrinsic data and prediction results from other software. Transcript data of numerous origins were mapped onto the genome sequence using several software tools. The detailed procedure is described.

Author information

Corresponding author

Correspondence to Stephane Rombauts.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Rombauts, S. (2016). Annotation of the Tomato Genome. In: Causse, M., Giovannoni, J., Bouzayen, M., Zouine, M. (eds) The Tomato Genome. Compendium of Plant Genomes. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53389-5_9
