Functional Annotation of Plant Genomes

  • Vindhya Amarasinghe
  • Palitha Dharmawardhana
  • Justin Elser
  • Pankaj Jaiswal


The recent introduction of highly efficient next-generation sequencing platforms (Roche 454, Illumina, PacBio, Life Technologies SOLiD, etc.) has led to an increase in the number of sequenced plant genomes.


Keywords

Enrichment Analysis; Functional Annotation; Enzyme Commission; Ortholog Cluster; Model Organism Database



Acknowledgments

The authors VA, PD, JE, and PJ are supported by the Gramene project award (# IOS:0703908) and the Plant Ontology project (# IOS:0822201) from the National Science Foundation (NSF) of the USA. The Jaiswal lab is also supported by startup funds provided to PJ by Oregon State University (OSU), Corvallis, OR, USA. The authors also thank Rajani Raja of OSU for the InterProScan tabular output (Fig. 7.4); Justin Preece of OSU for editorial comments; and Sarah Hunter, the InterPro team, and the European Bioinformatics Institute for permission to use screenshots of the InterProScan web interface (Figs. 7.2, 7.3).



Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Vindhya Amarasinghe (1)
  • Palitha Dharmawardhana (1)
  • Justin Elser (1)
  • Pankaj Jaiswal (1)

  1. Department of Botany and Plant Pathology, Oregon State University, Corvallis, USA
