Abstract
With the widespread use of microarray technology as a potential diagnostic tool, comparing results obtained on different platforms is of interest. Inference methods designed using data collected on one platform are unlikely to work directly on measurements taken from a different type of array. We report on this cross-platform transfer problem and show that working with transcriptome representations at binary numerical precision, similar to the gene expression bar code method, helps circumvent the variability across platforms in several cancer classification tasks. We compare our approach with a recent machine learning method specifically designed for shifting distributions, i.e., problems in which the training and testing data are not drawn from identical probability distributions, and show superior performance in three of the four problems in which we could directly compare.
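To illustrate why a binary representation can reduce platform dependence, the following minimal sketch thresholds each sample at its own median and compares two simulated "platforms" that report the same underlying expression on different scales. The median threshold, the function name `binarize`, and the simulated data are assumptions made for illustration only; the abstract does not specify the binarization rule used, and real platforms differ by more than a monotone rescaling.

```python
import numpy as np


def binarize(expr):
    """Threshold each sample (row) at its own median: 1 if above, else 0.

    Hypothetical rule chosen for illustration; not necessarily the
    thresholding used in the paper or in the bar code method.
    """
    expr = np.asarray(expr, dtype=float)
    thresholds = np.median(expr, axis=1, keepdims=True)
    return (expr > thresholds).astype(np.uint8)


# Simulated latent expression: 5 samples x 201 genes (odd count so the
# median is an actual data point and ties at the threshold are avoided).
rng = np.random.default_rng(0)
latent = rng.normal(size=(5, 201))

# Two toy "platforms" that distort the same signal with different
# strictly increasing transformations (an assumption of this sketch).
platform_a = 2.0 * latent + 8.0      # affine rescaling
platform_b = np.exp(0.5 * latent)    # nonlinear but monotone response

bits_a = binarize(platform_a)
bits_b = binarize(platform_b)

# Continuous values differ widely between platforms, while the binary
# calls depend only on each gene's rank within the sample.
print("max abs difference (continuous):", np.abs(platform_a - platform_b).max())
print("binary representations identical:", np.array_equal(bits_a, bits_b))
```

In this toy setting the binary calls agree because per-sample median thresholding depends only on rank order, which any strictly increasing distortion preserves. Measurement noise and probe-level differences on real arrays would break this exact agreement, so the sketch indicates the intuition rather than the reported results.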
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tuna, S., Niranjan, M. (2009). Cross-Platform Analysis with Binarized Gene Expression Data. In: Kadirkamanathan, V., Sanguinetti, G., Girolami, M., Niranjan, M., Noirel, J. (eds) Pattern Recognition in Bioinformatics. PRIB 2009. Lecture Notes in Computer Science(), vol 5780. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04031-3_38
DOI: https://doi.org/10.1007/978-3-642-04031-3_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04030-6
Online ISBN: 978-3-642-04031-3
eBook Packages: Computer Science (R0)