Abstract
In this paper, we propose a new spectral semi-supervised feature selection criterion called the s-Laplacian score. It identifies discriminative features by measuring their capability to preserve both local and global geometrical structure. Spectral feature selection cannot handle redundant features; to address this limitation, we define the Classification Information Gain degree (CIG) to measure feature redundancy. Based on the s-Laplacian score and CIG, we propose a graph-based semi-supervised feature selection algorithm (GSFS). Experimental results on a real-world image dataset for the automatic spam image identification problem show that GSFS performs well in using a small number of labeled samples together with a large amount of unlabeled data to select discriminative features.
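The abstract does not define the s-Laplacian score or CIG, so neither is reproduced here. As background only, the following is a minimal sketch of the standard unsupervised Laplacian score of He, Cai, and Niyogi (2005), the kind of graph-based criterion that spectral feature selection builds on. The function name `laplacian_score`, the parameters `k` and `t`, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def laplacian_score(X, k=5, t=1.0):
    """Standard (unsupervised) Laplacian score per feature; lower is better.

    Illustrative sketch only: this is NOT the s-Laplacian score of the paper.
    X has shape (n_samples, n_features); k and t are assumed hyperparameters
    (number of neighbours and heat-kernel width).
    """
    n, p = X.shape
    # Pairwise squared Euclidean distances, clipped at zero for stability.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbour

    # k-nearest-neighbour graph with heat-kernel weights.
    idx = np.argsort(d2, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-d2[rows, cols] / t)
    S = np.maximum(S, S.T)  # symmetrise: keep an edge if either end chose it

    d = S.sum(axis=1)        # node degrees (diagonal of D)
    L = np.diag(d) - S       # graph Laplacian L = D - S

    # Remove the degree-weighted mean from every feature.
    Xc = X - (d @ X) / d.sum()

    num = np.sum(Xc * (L @ Xc), axis=0)          # f^T L f for each feature
    den = np.sum(Xc * (d[:, None] * Xc), axis=0)  # f^T D f for each feature

    # Constant features have zero weighted variance; rank them last.
    scores = np.full(p, np.inf)
    ok = den > 1e-12
    scores[ok] = num[ok] / den[ok]
    return scores

# Toy data (hypothetical): feature 0 separates two clusters, feature 1 is noise.
rng = np.random.default_rng(0)
f0 = np.concatenate([rng.normal(0.0, 0.1, 20), rng.normal(5.0, 0.1, 20)])
f1 = rng.normal(0.0, 0.1, 40)
X = np.column_stack([f0, f1])

scores = laplacian_score(X, k=5, t=1.0)
ranking = np.argsort(scores)  # feature indices, best (lowest score) first
print(ranking)
```

On this toy data the cluster-separating feature should receive the lower score, because its values vary little along graph edges relative to its overall variance. A semi-supervised variant would additionally use the available labels when building the graph or the score; how GSFS does this is not stated in the abstract.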
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cheng, H., Deng, W., Fu, C., Wang, Y., Qin, Z. (2011). Graph-Based Semi-supervised Feature Selection with Application to Automatic Spam Image Identification. In: Yu, Y., Yu, Z., Zhao, J. (eds) Computer Science for Environmental Engineering and EcoInformatics. CSEEE 2011. Communications in Computer and Information Science, vol 159. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22691-5_45
DOI: https://doi.org/10.1007/978-3-642-22691-5_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22690-8
Online ISBN: 978-3-642-22691-5
eBook Packages: Computer Science, Computer Science (R0)