Abstract
The functional genomics and informatics community has made extensive microarray experimental data available online, facilitating independent evaluation of experiment conclusions and enabling researchers to access and reuse a growing body of gene expression knowledge. While there are several data-exchange standards, numerous microarray experiment datasets are published using the MAGE-ML XML schema. Assessing the quality of published experiments is a challenging task, and there is no consensus among microarray users on a framework to measure dataset quality.
In this paper, we develop techniques based on DescribeX (a summary-based visualization tool for XML) to analyze public MAGE-ML collections quantitatively and qualitatively and to gain insight into schema usage. We address specific questions such as the detection of common instance patterns and their coverage, the precision of experiment descriptions, and the usage of controlled vocabularies. Our case study shows that DescribeX is a useful tool for evaluating the quality of microarray experiment data and that it enhances the understanding of the instance-level structure of MAGE-ML datasets.
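To make the idea of instance patterns and coverage concrete, here is a minimal sketch in Python. It is not DescribeX and does not reproduce the paper's summaries. It only assumes that a pattern can be approximated by a root-to-element label path and that coverage can be approximated by the fraction of documents in a collection that contain that path. The function names `label_paths` and `path_coverage` and the toy element names are hypothetical, chosen for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def label_paths(xml_text):
    """Return the set of distinct root-to-element label paths in one document."""
    root = ET.fromstring(xml_text)
    paths = set()
    # Iterative depth-first traversal; each stack entry pairs an element
    # with the label path leading to it.
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def path_coverage(documents):
    """Map each label path to the fraction of documents that contain it."""
    total = len(documents)
    if total == 0:
        return {}
    counts = Counter()
    for doc in documents:
        # Each document contributes at most once per path.
        counts.update(label_paths(doc))
    return {path: n / total for path, n in counts.items()}


if __name__ == "__main__":
    # Toy documents with made-up element names (not real MAGE-ML).
    collection = [
        "<exp><design/><protocol/></exp>",
        "<exp><design/></exp>",
    ]
    for path, share in sorted(path_coverage(collection).items()):
        print(f"{path}\t{share:.2f}")
```

Paths with low coverage in such a table point to schema elements that few submissions use, which is the kind of schema-usage question the abstract raises. A real analysis would also have to handle XML namespaces, attributes, and large files.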
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Etcheverry, L., Khatchadourian, S., Consens, M. (2010). Quality Assessment of MAGE-ML Genomic Datasets Using DescribeX. In: Lambrix, P., Kemp, G. (eds) Data Integration in the Life Sciences. DILS 2010. Lecture Notes in Computer Science(), vol 6254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15120-0_15
DOI: https://doi.org/10.1007/978-3-642-15120-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15119-4
Online ISBN: 978-3-642-15120-0
eBook Packages: Computer Science, Computer Science (R0)