A Comparison on Score Spaces for Expression Microarray Data Classification

Perina, Alessandro; Lovato, Pietro; Cristani, Marco; Bicego, Manuele

doi:10.1007/978-3-642-24855-9_18


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7036)

Included in the following conference series:

IAPR International Conference on Pattern Recognition in Bioinformatics


Abstract

This paper presents an empirical evaluation of different generative scores for expression microarray data classification. Score spaces are a fairly recent trend in the machine learning community that combines the strengths of the generative and discriminative classification paradigms. The scores are extracted from topic models, a class of highly interpretable probabilistic tools whose utility for microarray classification has recently been assessed. The experimental evaluation, performed on 3 datasets from the literature with 7 score spaces, demonstrates the viability of the proposed scheme and, for the first time, compares the pros and cons of each space.



Author information

Authors and Affiliations

Microsoft Research, Redmond, USA
Alessandro Perina
Department of Computer Science, University of Verona, Verona, Italy
Pietro Lovato, Marco Cristani & Manuele Bicego
Italian Institute of Technology, Genoa, Italy
Marco Cristani


Editor information

Editors and Affiliations

Pattern Recognition Laboratory, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
Marco Loog, Marcel J. T. Reinders & Dick de Ridder
Netherlands Cancer Institute, Bioinformatics and Statistics, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
Lodewyk Wessels


About this paper

Cite this paper

Perina, A., Lovato, P., Cristani, M., Bicego, M. (2011). A Comparison on Score Spaces for Expression Microarray Data Classification. In: Loog, M., Wessels, L., Reinders, M.J.T., de Ridder, D. (eds) Pattern Recognition in Bioinformatics. PRIB 2011. Lecture Notes in Computer Science, vol 7036. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24855-9_18


DOI: https://doi.org/10.1007/978-3-642-24855-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24854-2
Online ISBN: 978-3-642-24855-9
eBook Packages: Computer Science, Computer Science (R0)
