Inferring Meta-covariates in Classification

  • Keith Harris
  • Lisa McMillan
  • Mark Girolami
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5780)


This paper develops an alternative method for gene selection that combines model-based clustering and binary classification. By averaging the covariates within the clusters obtained from model-based clustering, we define “meta-covariates” and use them to build a probit regression model. The model thereby selects clusters of similarly behaving genes, which aids interpretation. This simultaneous learning task is accomplished by an EM algorithm that optimises a single likelihood function, one that rewards good performance at both classification and clustering. We explore the performance of our methodology on a well-known leukaemia dataset and use the Gene Ontology to interpret our results.
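
The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes hard cluster assignments rather than the soft responsibilities a model-based clustering would give, it uses fixed, hand-picked probit weights rather than weights learned by the joint EM algorithm, and the names `meta_covariates` and `probit_probability` are hypothetical.

```python
import math


def meta_covariates(X, labels, n_clusters):
    """Average gene-expression rows within each cluster.

    X: list of genes, each a list of per-sample expression values.
    labels: cluster index for each gene (hard assignment; an assumption,
            the paper obtains clusters from model-based clustering).
    Returns n_clusters rows, each holding the per-sample cluster mean.
    """
    n_samples = len(X[0])
    sums = [[0.0] * n_samples for _ in range(n_clusters)]
    counts = [0] * n_clusters
    for row, k in zip(X, labels):
        counts[k] += 1
        for j, value in enumerate(row):
            sums[k][j] += value
    # Empty clusters get a zero meta-covariate (an arbitrary choice here).
    return [
        [s / counts[k] if counts[k] else 0.0 for s in sums[k]]
        for k in range(n_clusters)
    ]


def probit_probability(theta_sample, weights, bias=0.0):
    """P(y = 1) under a probit link: Phi(bias + w . theta)."""
    z = bias + sum(w * t for w, t in zip(weights, theta_sample))
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Toy data: 3 genes measured on 2 samples, grouped into 2 clusters.
X = [[1.0, 3.0], [3.0, 5.0], [-2.0, 0.0]]
labels = [0, 0, 1]
theta = meta_covariates(X, labels, n_clusters=2)

# Meta-covariates of the first sample, one value per cluster.
sample0 = [row[0] for row in theta]
p = probit_probability(sample0, weights=[0.5, 0.5])
```

Each sample is then described by one value per cluster instead of one value per gene, so a non-zero regression weight points at a whole cluster of similarly behaving genes rather than at a single gene.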


Keywords: gene selection, clustering, classification, EM algorithm, Gene Ontology



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Keith Harris (1)
  • Lisa McMillan (1)
  • Mark Girolami (1)
  1. Inference Group, Department of Computing Science, University of Glasgow, UK
