Bioinformatics pp 293-329

Standards for Functional Genomics

  • Stephen A. Chervitz
  • Helen Parkinson
  • Jennifer M. Fostel
  • Helen C. Causton
  • Susanna-Assunta Sansone
  • Eric W. Deutsch
  • Dawn Field
  • Chris F. Taylor
  • Philippe Rocca-Serra
  • Joe White
  • Christian J. Stoeckert


Fuelled by the fruits of the genome sequencing projects, which are defining the complete sets of genes, transcripts, and proteins within an organism, and by the advent of highly multiplexed technologies capable of measuring thousands to millions of biomolecules per sample in a single assay, functional genomics studies are enabling new approaches to studying biological systems. A single experiment can generate very large amounts of raw data as well as summaries in the form of lists of sequences, genes, proteins, metabolites, SNPs, etc., which have been identified by various analytical tests. Managing, reporting, and integrating the results from these experiments present challenges to researchers and bioinformaticians in this relatively young field, because the standards and conventions developed for single-gene or single-protein studies do not accommodate the needs of functional genomics studies (Boguski 1999). Functional genomics technologies and their applications are evolving rapidly, and there is widespread awareness in the life sciences community of the need for, and value of, standards. Widely adopted standards not only help scientists and data analysts make better use of the ever-growing mountain of functional genomics data sets, but are also essential for applying functional genomics approaches in healthcare environments. This chapter introduces the major functional genomics standards initiatives in the domains of genomics, transcriptomics, proteomics, and metabolomics, providing for each a summary of goals, example applications, and references for further information. It also covers the application of standards in healthcare settings, where functional genomics technologies are having an increasing impact. New standards and organizations may emerge in the future that augment or supersede the ones described here.
Interested readers are invited to explore further the standards mentioned in this chapter (as well as others not mentioned) and to keep up with the latest developments by visiting the website





SAC acknowledges financial support received from Affymetrix, Inc. during the preparation of this manuscript. The following people provided useful feedback: Nigel Hardy, Henning Hermjakob, Janet Warrington, and the OBI-developers mailing list.


  1. Allison M (2008) Is personalized medicine finally arriving? Nat Biotechnol 26(5):509–517
  2. Ashburner M, Lewis S (2002) On ontologies for biologists: the Gene Ontology – untangling the web. Novartis Found Symp 247:66–80; discussion 80–83, 84–90, 244–252
  3. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C et al (2007) NCBI GEO: mining tens of millions of expression profiles – database and tools update. Nucleic Acids Res 35(Database issue):D760–D765
  4. BioMed Central Genome Medicine Journal announcement (2008) Personalized medicine: Innovative online journal leads the way. From
  5. Bland PH, Laderach GE, Meyer CR (2007) A web-based interface for communication of data between the clinical and research environments without revealing identifying information. Acad Radiol 14(6):757–764
  6. Boguski MS (1999) Biosequence exegesis. Science 286(5439):453–455
  7. Brazma A (2001) On the importance of standardisation in life sciences. Bioinformatics 17(2):113–114
  8. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C et al (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29(4):365–371
  9. Brazma A, Krestyaninova M, Sarkans U (2006) Standards for systems biology. Nat Rev Genet 7(8):593–605
  10. Brooksbank C, Quackenbush J (2006) Data standards: a call to action. OMICS 10(2):94–99
  11. Day A, Carlson MR, Dong J, O’Connor BD, Nelson SF (2007) Celsius: a community resource for Affymetrix microarray data. Genome Biol 8(6):R112
  12. DeFrancesco L (2002) Journal trio embraces MIAME. News from The Scientist. 3:20021010-05
  13. Deutsch E (2008) mzML: A single, unifying data format for mass spectrometer output. Proteomics 8(14):2776–2777
  14. Deutsch EW, Lam H, Aebersold R (2008) Data analysis and bioinformatics tools for tandem mass spectrometry in proteomics. Physiol Genomics 33(1):18–25
  15. Ferris TA, Garrison GM, Lowe HJ (2002) A proposed key escrow system for secure patient information disclosure in biomedical research databases. Proc AMIA Symp 245–249
  16. Fiehn O, Kristal B, van Ommen B, Sumner LW, Sansone SA, Taylor C et al (2006) Establishing reporting standards for metabolomic and metabonomic studies: a call for participation. OMICS 10(2):158–163
  17. Fiehn O, Robertson D, Griffin J, van der Werf M, Nikolau B, Morrison N et al (2007a) The metabolomics standards initiative (MSI). Metabolomics 3(3):175–178
  18. Fiehn O, Sumner L, Rhee S, Ward J, Dickerson J, Lange B et al (2007b) Minimum reporting standards for plant biology context information in metabolomic studies. Metabolomics 3(3):195–201
  19. Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P et al (2008) The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol 26(5):541–547
  20. Fostel JM (2008) Towards standards for data exchange and integration and their impact on a public database such as CEBS (Chemical Effects in Biological Systems). Toxicol Appl Pharmacol 233(1):54–62
  21. Galperin MY, Cochrane GR (2009) Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2009. Nucleic Acids Res 37:D1–D4
  22. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S et al (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5(10):R80
  23. Gollub J, Ball CA, Binkley G, Demeter J, Finkelstein DB, Hebert JM, Hernandez-Boussard T, Jin H, Kaloper M, Matese JC, Schroeder M, Brown PO, Botstein D, Sherlock G (2003) The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res 31(1):94–96
  24. Goodacre R, Broadhurst D, Smilde A, Kristal B, Baker J, Beger R et al (2007) Proposed minimum reporting standards for data analysis in metabolomics. Metabolomics 3(3):231–241
  25. Hardy N, Taylor C (2007) A roadmap for the establishment of standard data exchange structures for metabolomics. Metabolomics 3(3):243–248
  26. Jenkins H, Hardy N, Beckmann M, Draper J, Smith AR, Taylor J et al (2004) A proposed framework for the description of plant metabolomics experiments and their results. Nat Biotechnol 22(12):1601–1606
  27. Jenkins H, Johnson H, Kular B, Wang T, Hardy N (2005) Toward supportive data collection tools for plant metabolomics. Plant Physiol 138(1):67–77
  28. Jones AR, Lister AL, Hermida L, Wilkinson P, Eisenacher M, Belhajjame K, Gibson F, Lord P, Pocock M, Rosenfelder H, Santoyo-Lopez J, Wipat A, Paton NW (2009) Modelling and managing experimental data using FuGE. OMICS 13(3):239–251
  29. Jones AR, Paton NW (2005) An analysis of extensible modelling for functional genomics data. BMC Bioinformatics 6:235
  30. Jones AR, Miller M, Aebersold R, Apweiler R, Ball CA, Brazma A et al (2007) The Functional Genomics Experiment model (FuGE): an extensible framework for standards in functional genomics. Nat Biotechnol 25(10):1127–1133
  31. Jones P, Côté RG, Cho SY, Kile S, Martens L, Quinn AF, Thorneycroft D, Hermjakob H (2008) PRIDE: New developments and new data sets. Nucleic Acids Res 36(Database issue):D878–D883
  32. Keller A, Eng J, Zhang N, Li XJ, Aebersold R (2005) A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol 1(2005):0017
  33. Kerrien S, Orchard S, Montecchi-Palazzi L, Aranda B, Quinn AF, Vinod N et al (2007) Broadening the horizon – level 2.5 of the HUPO-PSI format for molecular interactions. BMC Biol 5:44
  34. Kile S, Martens L, Vizcaíno JA, Côté R, Jones P, Apweiler R, Hinneburg A, Hermjakob H (2008) Analyzing large-scale proteomics projects with latent semantic indexing. J Proteome Res 7(1):182–191
  35. Kottmann R, Gray T, Murphy S, Kagan L, Kravitz S, Lombardot T, Field D, Glöckner FO (2008) A standard MIGS/MIMS compliant XML schema: toward the development of the Genomic Contextual Data Markup Language (GCDML). OMICS 12(2):115–121
  36. Kumar D (2007) From evidence-based medicine to genomic medicine. Genomic Med 1(3–4):95–104
  37. Manduchi E, Grant GR, He H, Liu J, Mailman MD, Pizarro AD et al (2004) RAD and the RAD Study-Annotator: an approach to collection, organization and exchange of all relevant information for high-throughput gene expression studies. Bioinformatics 20(4):452–459
  38. Meslin EM (2006) Shifting paradigms in health services research ethics. Consent, privacy, and the challenges for IRBs. J Gen Intern Med 21(3):279–280
  39. Nature Cell Biology Editorial (2008) Standardizing data. Nat Cell Biol 10(10):1123–1124
  40. Navarange M, Game L, Fowler D, Wadekar V, Banks H, Cooley N et al (2005) MiMiR: a comprehensive solution for storage, annotation and exchange of microarray data. BMC Bioinformatics 6:268
  41. Ochsner SA, Steffen DL, Stoeckert CJ Jr, McKenna NJ (2008) Much room for improvement in deposition rates of expression microarray datasets. Nat Methods 5(12):991
  42. Orchard S, Hermjakob H (2008) The HUPO proteomics standards initiative – easing communication and minimizing data loss in a changing world. Brief Bioinform 9(2):166–173
  43. Orchard S, Salwinski L, Kerrien S, Montecchi-Palazzi L, Oesterheld M, Stumpflen V et al (2007) The minimum information required for reporting a molecular interaction experiment (MIMIx). Nat Biotechnol 25(8):894–898
  44. Parkinson H, Kapushesky M, Kolesnikov N, Rustici G, Shojatalab M, Abeygunawardena N et al (2009) ArrayExpress update – from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Res 37:D868–D872
  45. Pedrioli PG, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B et al (2004) A common open representation of mass spectrometry data and its application to proteomics research. Nat Biotechnol 22(11):1459–1466
  46. Piwowar HA, Chapman W (2008) Identifying data sharing in biomedical literature. AMIA Annu Symp Proc 6:596–600
  47. Piwowar HA, Becich MJ, Bilofsky H, Crowley RS (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9):e183
  48. Quackenbush J (2006) Standardizing the standards. Mol Syst Biol 2(2006):0010
  49. Rayner T, Rocca-Serra P, Spellman PT, Causton HC, Farne A, Holloway E, Liu J, Maier DS, Miller M, Petersen K, Quackenbush J, Sherlock G, Stoeckert C Jr, White J, Whetzel P, Wymore F, Parkinson H, Sarkans U, Ball C, Brazma A (2006) A simple spreadsheet-based, MIAME-supportive format for microarray data. BMC Bioinformatics 7:489
  50. Rayner TF, Rezwan FI, Lukk M, Bradley XZ, Farne A, Holloway E et al (2009) MAGETabulator, a suite of tools to support the microarray data format MAGE-TAB. Bioinformatics 25(2):279–280
  51. Rogers S, Cambrosio A (2007) Making a new technology work: the standardization and regulation of microarrays. Yale J Biol Med 80(4):165–178
  52. Rubin DL, Lewis SE, Mungall CJ, Misra S, Westerfield M, Ashburner M et al (2006) National Center for Biomedical Ontology: advancing biomedicine through structured organization of scientific knowledge. OMICS 10(2):185–198
  53. Sagotsky JA, Zhang L, Wang Z, Martin S, Deisboeck TS (2008) Life Sciences and the web: a new era for collaboration. Mol Syst Biol 4:201
  54. Salit M (2006) Standards in gene expression microarray experiments. Methods Enzymol 411:63–78
  55. Sansone SA, Rocca-Serra P, Tong W, Fostel J, Morrison N, Jones AR (2006) A strategy capitalizing on synergies: the Reporting Structure for Biological Investigation (RSBI) working group. OMICS 10(2):164–171
  56. Sansone SA, Schober D, Atherton H, Fiehn O, Jenkins H, Rocca-Serra P et al (2007a) Metabolomics standards initiative: ontology working group work in progress. Metabolomics 3(3):249–256
  57. Sansone SA, Fan T, Goodacre R, Griffin JL, Hardy NW, Kaddurah-Daouk R et al (2007b) The metabolomics standards initiative. Nat Biotechnol 25(8):846–848
  58. Sansone SA, Rocca-Serra P, Brandizi M, Brazma A, Field D, Fostel J et al (2008) The first RSBI (ISA-TAB) workshop: “can a simple format work for complex studies?”. OMICS 12(2):143–149
  59. Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC et al (2006) The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24(9):1151–1161
  60. Smith B, Ceusters W, Klagges B, Köhler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector AL, Rosse C (2005) Relations in biomedical ontologies. Genome Biol 6:R46
  61. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W et al (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol 25(11):1251–1255
  62. Spasić I, Dunn WB, Velarde G, Tseng A, Jenkins H, Hardy N et al (2006) MeMo: a hybrid SQL/XML approach to metabolomic data management for functional genomics. BMC Bioinformatics 7:281
  63. Spellman PT, Miller M, Stewart J, Troup C, Sarkans U, Chervitz S et al (2002) Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol 3(9):RESEARCH0046
  64. Stein LD (2008) Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges. Nat Rev Genet 9(9):678–688
  65. Stoeckert CJ Jr, Causton HC, Ball CA (2002) Microarray databases: standards and ontologies. Nat Genet 32(Suppl):469–473
  66. Taylor CF (2006) Minimum reporting requirements for proteomics: a MIAPE primer. Proteomics 6(Suppl 2):39–44
  67. Taylor CF, Paton NW, Lilley KS, Binz PA, Julian RK Jr, Jones AR et al (2007) The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol 25(8):887–893
  68. Taylor CF, Field D, Sansone SA, Aerts J, Apweiler R, Ashburner M et al (2008) Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nat Biotechnol 26(8):889–896
  69. Warrington JA (2008) Standard controls and protocols for microarray based assays in clinical applications. Book of Genes and Medicine. H. Aburatan, Osaka, Medical Do Co
  70. Whetzel PL, Brinkman RR, Causton HC, Fan L, Field D, Fostel J et al (2006a) Development of FuGO: an ontology for functional genomics investigations. OMICS 10(2):199–204
  71. Whetzel PL, Parkinson H, Causton HC, Fan L, Fostel J, Fragoso G et al (2006b) The MGED Ontology: a resource for semantics-based description of microarray experiments. Bioinformatics 22(7):866–873

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Stephen A. Chervitz, Affymetrix, Inc., Santa Clara, USA
