Databases and Data Mining

  • Carolyn J. Lawrence
  • Doreen Ware

Over the past decade, the breadth of information available through online resources for plant biology has increased astronomically, as has the interconnectedness among databases, online tools, and methods of data acquisition and analysis. For maize researchers, the number of available resources is both impressive and daunting, in many cases leaving them at a loss regarding where to begin. Described here is a historical perspective on the origin of these resources, as well as how they are expected to change and grow in the future. We outline the current types of resources, how they are connected, and methods for data acquisition, analysis, and interpretation. In addition, we offer guidance to help researchers place data generated by their maize projects into appropriate databases for long-term storage and use.


Keywords: Data Warehousing; Laboratory Information Management System; Arabidopsis Information Resource; Distribute Annotation System; Database Issue
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Science + Business Media, LLC 2009
