Bioinformatics Tools and Databases for Genomics Research

  • B. D. Singh
  • A. K. Singh


Bioinformatics involves the development of statistical tools and techniques and computer software for the acquisition, storage, analysis, and visualization of biological information. The European Molecular Biology Laboratory (EMBL), the National Center for Biotechnology Information (NCBI), and the DNA Data Bank of Japan (DDBJ) have served researchers around the globe for decades, and the databases and tools hosted by these institutes continue to grow rapidly. Analytical tools such as BLAST and CLUSTAL have been the workhorses of sequence data search and analysis, and these programs have been maintained since the 1990s. Many other tools, such as AutoSNP, SNP2CAPS, TASSEL, and STRUCTURE, are useful for analyzing sequence data and for drawing biologically meaningful conclusions from these analyses. Databases such as GenBank, Phytozome, the EMBL Nucleotide Sequence Database, Swiss-Prot, and the UniProt Knowledgebase store huge amounts of nucleotide and protein sequence information that is readily accessible to the public. The Kyoto Encyclopedia of Genes and Genomes (KEGG) goes further, attempting to understand higher-order biological functions by integrating gene, protein, and metabolic pathway information. This chapter describes various bioinformatics tools and databases relevant to plant breeding and discusses their features and applications.
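
The marker-development task that tools such as SNP2CAPS automate can be illustrated with a short sketch. A cleaved amplified polymorphic sequence (CAPS) marker exists when a polymorphism creates or destroys a restriction enzyme recognition site, so that digesting the amplified fragment from each allele gives different band patterns. The Python below is a minimal sketch under stated assumptions, not SNP2CAPS's actual algorithm: the six-enzyme panel is a hypothetical illustrative subset (real tools typically screen far larger enzyme collections), the function names `site_positions` and `caps_candidates` are hypothetical, and only site counts are compared, not fragment sizes.

```python
from __future__ import annotations

# Hypothetical, illustrative enzyme panel (not SNP2CAPS's enzyme list).
# The recognition sequences are the standard ones, and every site here is
# palindromic, so scanning the forward strand also covers the reverse strand.
ENZYMES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
    "TaqI": "TCGA",
    "AluI": "AGCT",
    "MseI": "TTAA",
}


def site_positions(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    seq = seq.upper()
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


def caps_candidates(allele_a: str, allele_b: str) -> dict[str, tuple[int, int]]:
    """Map each enzyme whose site count differs between two alleles to
    (count in allele_a, count in allele_b).

    A differing count means the digested amplicons would give different
    fragment patterns, i.e. a candidate CAPS marker (simplified assumption:
    fragment sizes and gel resolution are ignored).
    """
    result = {}
    for name, site in ENZYMES.items():
        n_a = len(site_positions(allele_a, site))
        n_b = len(site_positions(allele_b, site))
        if n_a != n_b:
            result[name] = (n_a, n_b)
    return result


if __name__ == "__main__":
    # An A/G SNP at position 4 destroys the EcoRI site GAATTC in allele B.
    print(caps_candidates("CCGAATTCAA", "CCGAGTTCAA"))  # {'EcoRI': (1, 0)}
```

Because the panel contains only palindromic sites, a single-strand scan is sufficient here; a panel with non-palindromic recognition sequences would also need a scan of the reverse complement.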



Copyright information

© Author(s) 2015

Authors and Affiliations

  • B. D. Singh, School of Biotechnology, Banaras Hindu University, Varanasi, India
  • A. K. Singh, Division of Genetics, Indian Agricultural Research Institute, New Delhi, India
