Abstract
In this paper, we address the problem of semi-supervised feature selection from high-dimensional data. The goal is to select the most discriminative and informative features for data analysis. This challenge has only recently been addressed in feature selection research; it arises when a small amount of labeled data is available alongside a large amount of unlabeled data in the same data set. We present a filter-based approach that adds constraints to the well-known Laplacian score. We evaluate the relevance of a feature according to its locality-preserving and constraint-preserving ability. The problem is then formulated in the spectral graph theory framework, together with a study of the complexity of the proposed algorithm. Finally, experimental results are provided to validate our proposal in comparison with other known feature selection methods.
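For orientation, the following is a minimal sketch of the standard unsupervised Laplacian score that the abstract takes as its starting point. It is not the constrained score proposed in the paper, whose definition is not given in the abstract. The function name `laplacian_score`, the neighbourhood size `k`, the heat-kernel width `t`, and the toy data are illustrative assumptions.

```python
import numpy as np


def laplacian_score(X, k=5, t=1.0):
    """Return one Laplacian score per column of X (lower = better locality preservation).

    Illustrative sketch of the unsupervised score only; assumes k < number of samples.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances between samples.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=2)

    # Indices of the k nearest neighbours of each sample, excluding itself.
    d2_no_self = d2.copy()
    np.fill_diagonal(d2_no_self, np.inf)
    nn = np.argsort(d2_no_self, axis=1)[:, :k]

    # Symmetric k-NN graph with heat-kernel edge weights.
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), nn.shape[1]), nn.ravel()] = True
    mask |= mask.T
    S = np.where(mask, np.exp(-d2 / t), 0.0)

    d = S.sum(axis=1)      # node degrees
    L = np.diag(d) - S     # graph Laplacian

    # Remove the degree-weighted mean from every feature.
    F = X - (d @ X) / d.sum()

    num = np.sum(F * (L @ F), axis=0)            # f^T L f per feature
    den = np.sum(F * (d[:, None] * F), axis=0)   # f^T D f per feature
    # Guard against division by zero for features with no weighted variance.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, np.inf)


# Toy data: feature 0 separates two groups, feature 1 alternates within each group.
X = np.array([[0.0, 0.0], [0.1, 1.0], [0.2, 0.0], [0.3, 1.0],
              [10.0, 0.0], [10.1, 1.0], [10.2, 0.0], [10.3, 1.0]])
scores = laplacian_score(X, k=3, t=1.0)
```

On this toy data, feature 0 varies little between neighbouring samples relative to its overall variance, so it receives the lower (better) score. A semi-supervised variant along the lines of the abstract would additionally account for constraints derived from the labeled samples.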
© 2011 Springer-Verlag Berlin Heidelberg
Benabdeslem, K., Hindawi, M. (2011). Constrained Laplacian Score for Semi-supervised Feature Selection. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol 6911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23780-5_23
DOI: https://doi.org/10.1007/978-3-642-23780-5_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23779-9
Online ISBN: 978-3-642-23780-5
eBook Packages: Computer Science (R0)