Abstract
This paper presents an empirical evaluation of different generative scores for expression microarray data classification. Score spaces are a fairly recent trend in the machine learning community that combines the strengths of the generative and discriminative classification paradigms. The scores are extracted from topic models, a class of highly interpretable probabilistic tools whose utility for microarray classification has recently been assessed. The experimental evaluation, performed on 3 datasets from the literature with 7 score spaces, demonstrates the viability of the proposed scheme and, for the first time, compares the pros and cons of each space.
Keywords
- Topic Model
- Latent Dirichlet Allocation
- Neural Information Processing System
- Expression Microarray Data
- Probabilistic Latent Semantic Analysis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Perina, A., Lovato, P., Cristani, M., Bicego, M. (2011). A Comparison on Score Spaces for Expression Microarray Data Classification. In: Loog, M., Wessels, L., Reinders, M.J.T., de Ridder, D. (eds) Pattern Recognition in Bioinformatics. PRIB 2011. Lecture Notes in Computer Science, vol 7036. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24855-9_18
DOI: https://doi.org/10.1007/978-3-642-24855-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24854-2
Online ISBN: 978-3-642-24855-9
eBook Packages: Computer Science, Computer Science (R0)