Databases and Data Mining

Lawrence, Carolyn J.; Ware, Doreen

doi:10.1007/978-0-387-77863-1_33



Over the past decade, the breadth of information available through online resources for plant biology has increased astronomically, as has the interconnectedness among databases, online tools, and methods of data acquisition and analysis. For maize researchers, the number of available resources is both impressive and daunting, in many cases leaving them at a loss as to where to begin. We present a historical perspective on the origin of these resources and describe how they are expected to change and grow in the future. We outline the current types of resources, how they are connected, and methods for data acquisition, analysis, and interpretation. In addition, we offer guidance to help researchers place data generated by their maize projects into appropriate databases for long-term storage and use.



Authors

Carolyn J. Lawrence
Doreen Ware

Editor information

Editors and Affiliations

Department of Genetics, University of Georgia, Athens, GA, 30602–7223, USA
Jeffrey L. Bennetzen
USDA Plant Gene Expression Center, 800 Buchanan Street, Albany, CA, 94710, USA
Sarah Hake


About this chapter

Cite this chapter

Lawrence, C.J., Ware, D. (2009). Databases and Data Mining. In: Bennetzen, J.L., Hake, S. (eds) Handbook of Maize. Springer, New York, NY. https://doi.org/10.1007/978-0-387-77863-1_33


DOI: https://doi.org/10.1007/978-0-387-77863-1_33
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-77862-4
Online ISBN: 978-0-387-77863-1
eBook Packages: Biomedical and Life Sciences (R0)
