
A Filter Feature Selection Method Based on MFA Score and Redundancy Excluding and Its Application to Tumor Gene Expression Data Analysis

  • Jiangeng Li
  • Lei Su
  • Zenan Pang
Original Research Article

Abstract

Feature selection techniques have been widely applied to tumor gene expression data analysis in recent years. The marginal Fisher analysis score (MFA score), a filter feature selection method based on graph embedding, has been widely adopted, largely because it outperforms the Fisher score. Because gene expression data contain heavy redundancy, we propose a new filter feature selection technique in this paper, named MFA score+, which combines the MFA score with redundancy excluding. We applied it to an artificial dataset and eight tumor gene expression datasets to select important features, and then used a support vector machine as the classifier. Compared with the MFA score, the t test and the Fisher score, MFA score+ achieved higher classification accuracy.
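The pipeline described above (score-based ranking, redundancy excluding, then SVM classification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-feature Fisher score stands in for the MFA score, whose graph-embedding construction is not reproduced here, and the function names (mfa_score_plus, fisher_score), the correlation threshold and the synthetic data are illustrative choices of ours.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def fisher_score(X, y):
    # Per-feature Fisher score: between-class variance over within-class variance.
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - mean_all) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)

def mfa_score_plus(X, y, n_features=20, corr_threshold=0.9):
    # Greedy filter with redundancy excluding: walk features in decreasing
    # score order and keep a feature only if its absolute Pearson correlation
    # with every feature already kept stays below corr_threshold.
    # (fisher_score is a stand-in for the true MFA score here.)
    scores = fisher_score(X, y)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = []
    for j in np.argsort(scores)[::-1]:
        if all(corr[j, s] < corr_threshold for s in selected):
            selected.append(j)
        if len(selected) == n_features:
            break
    return np.array(selected)

# Synthetic two-class data: 100 samples, 2000 "genes", the first 20 informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2000))
y = rng.integers(0, 2, size=100)
X[y == 1, :20] += 1.0
idx = mfa_score_plus(X, y, n_features=20)
acc = cross_val_score(SVC(kernel="linear"), X[:, idx], y, cv=5).mean()
print(f"kept {len(idx)} features; 5-fold CV accuracy: {acc:.3f}")

The greedy correlation test mirrors the redundancy-excluding step of MFA score+; replacing fisher_score with an MFA score built from the intra-class and inter-class neighborhood graphs of graph embedding would recover the ranking the paper actually uses.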

Keywords

Filter feature selection · MFA score+ · Redundant features · Tumor gene expression data

Abbreviations

MFA: Marginal Fisher analysis
FDA: Fisher discriminant analysis
SVM: Support vector machine
MFA score+: Marginal Fisher analysis score and redundancy excluding


Acknowledgments

This work was supported by the Project for the National Key Technology R&D Program under Grant No. 2011BAC12B0304 and the Scientific Plan of the Beijing Municipal Commission of Education under Grant No. JC002011200903.


Copyright information

© International Association of Scientists in the Interdisciplinary Areas and Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Institute of Artificial Intelligence and Robotics, College of Electronic Information & Control Engineering, Beijing University of Technology, Beijing, China
  2. Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing, China
