Feature Selection Using Counting Grids: Application to Microarray Data

  • Pietro Lovato
  • Manuele Bicego
  • Marco Cristani
  • Nebojsa Jojic
  • Alessandro Perina
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7626)

Abstract

This paper proposes a novel feature selection scheme that exploits a recent probabilistic generative model, the Counting Grid. The model clusters similar observations together, highlighting the compactness of a class and its underlying structure. The scheme is applied to expression microarray data, a challenging setting with very few samples and a very large number of features. Experiments on benchmark datasets show that the approach is effective and stable, achieving state-of-the-art classification accuracy.

Keywords

feature selection, gene selection, generative models

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Pietro Lovato (1)
  • Manuele Bicego (1)
  • Marco Cristani (1)
  • Nebojsa Jojic (2)
  • Alessandro Perina (2)
  1. Computer Science Department, University of Verona, Italy
  2. Microsoft Research, USA
