Quality Assessment of MAGE-ML Genomic Datasets Using DescribeX

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6254)

Abstract

The functional genomics and informatics community has made extensive microarray experimental data available online, facilitating independent evaluation of experiment conclusions and enabling researchers to access and reuse a growing body of gene expression knowledge. While there are several data-exchange standards, numerous microarray experiment datasets are published using the MAGE-ML XML schema. Assessing the quality of published experiments is a challenging task, and there is no consensus among microarray users on a framework to measure dataset quality.

In this paper, we develop techniques based on DescribeX (a summary-based visualization tool for XML) to analyze public MAGE-ML collections quantitatively and qualitatively and to gain insight into schema usage. We address specific questions such as the detection of common instance patterns and coverage, the precision of experiment descriptions, and the usage of controlled vocabularies. Our case study shows that DescribeX is a useful tool for evaluating microarray experiment data quality and that it enhances the understanding of the instance-level structure of MAGE-ML datasets.
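
To make the idea of a structural summary concrete, the following minimal Python sketch counts root-to-element label paths in an XML instance and reports what fraction of an expected path set actually occurs. It is not the DescribeX algorithm, only a simple path-count summary in a similar spirit. The element names, the sample document, and the `SCHEMA_PATHS` list are hypothetical and are not taken from the MAGE-ML schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_summary(xml_text):
    """Count how often each root-to-element label path occurs in a document."""
    counts = Counter()

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def coverage(counts, schema_paths):
    """Fraction of the expected paths that actually occur in the instance."""
    used = set(counts) & set(schema_paths)
    return len(used) / len(schema_paths)


# Hypothetical instance document (illustrative element names only).
SAMPLE = """<Experiment>
  <Description><Text>heat shock</Text></Description>
  <BioAssay><Name>a1</Name></BioAssay>
  <BioAssay><Name>a2</Name></BioAssay>
</Experiment>"""

# Hypothetical set of paths that a schema would allow.
SCHEMA_PATHS = [
    "/Experiment",
    "/Experiment/Description",
    "/Experiment/Description/Text",
    "/Experiment/BioAssay",
    "/Experiment/BioAssay/Name",
    "/Experiment/BioAssay/Protocol",
]

summary = path_summary(SAMPLE)
for path, n in sorted(summary.items()):
    print(f"{n:3d}  {path}")
print("coverage:", coverage(summary, SCHEMA_PATHS))
```

Paths that the schema allows but that never appear in the instance (here the hypothetical `/Experiment/BioAssay/Protocol`) lower the coverage figure, which is one simple way to spot unused parts of a schema across a collection.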

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Etcheverry, L., Khatchadourian, S., Consens, M. (2010). Quality Assessment of MAGE-ML Genomic Datasets Using DescribeX. In: Lambrix, P., Kemp, G. (eds) Data Integration in the Life Sciences. DILS 2010. Lecture Notes in Computer Science, vol 6254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15120-0_15

  • DOI: https://doi.org/10.1007/978-3-642-15120-0_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15119-4

  • Online ISBN: 978-3-642-15120-0

  • eBook Packages: Computer Science, Computer Science (R0)
