Graph-Based Semi-supervised Feature Selection with Application to Automatic Spam Image Identification

  • Conference paper
Computer Science for Environmental Engineering and EcoInformatics (CSEEE 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 159)

Abstract

In this paper, we propose a new spectral semi-supervised feature selection criterion called the s-Laplacian score. It identifies discriminative features by measuring how well they preserve both local and global geometrical structure. Because spectral feature selection cannot handle redundant features, we define the Classification Information Gain degree (CIG) to measure feature redundancy. Based on the s-Laplacian score and CIG, we propose a graph-based semi-supervised feature selection algorithm (GSFS). Experimental results on a real-world image dataset for the automatic spam image identification problem show that GSFS makes effective use of a small number of labeled samples and a large amount of unlabeled data to select discriminative features.
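
The abstract does not define the s-Laplacian score or CIG, so neither is reproduced here. As background only, the following is a minimal Python sketch of the classic unsupervised Laplacian score (He, Cai, and Niyogi, 2005), which the name of the proposed criterion suggests it extends. It is not the authors' method. The function name `laplacian_score`, the parameters `n_neighbors` and `t`, and the synthetic demo data are all assumptions made for illustration.

```python
import numpy as np

def laplacian_score(X, n_neighbors=5, t=1.0):
    """Classic unsupervised Laplacian score, one value per feature.

    Lower scores mean the feature varies little between neighbouring
    samples relative to its overall (degree-weighted) variance.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Symmetric k-nearest-neighbour adjacency, excluding self-loops.
    masked = dist2.copy()
    np.fill_diagonal(masked, np.inf)
    nn = np.argsort(masked, axis=1)[:, :n_neighbors]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), n_neighbors), nn.ravel()] = True
    A = A | A.T

    # Heat-kernel edge weights, degree vector, and graph Laplacian L = D - S.
    S = np.where(A, np.exp(-dist2 / t), 0.0)
    deg = S.sum(axis=1)
    L = np.diag(deg) - S

    # Remove the degree-weighted mean from every feature column.
    F = X - (deg @ X) / deg.sum()

    num = np.einsum("ir,ij,jr->r", F, L, F)    # f^T L f for each feature
    den = np.einsum("ir,i,ir->r", F, deg, F)   # f^T D f for each feature

    # Constant features have zero weighted variance; rank them last.
    scores = np.full(d, np.inf)
    valid = den > 1e-12
    scores[valid] = num[valid] / den[valid]
    return scores

# Hypothetical demo: feature 0 separates two clusters, feature 1 is noise.
rng = np.random.default_rng(0)
informative = np.concatenate([rng.normal(0.0, 0.1, 20), rng.normal(10.0, 0.1, 20)])
noise = rng.normal(0.0, 1.0, 40)
X_demo = np.column_stack([informative, noise])
scores = laplacian_score(X_demo, n_neighbors=5, t=1.0)
ranking = np.argsort(scores)  # ascending: most structure-preserving feature first
```

In this sketch the graph is built from the features alone. A semi-supervised variant would additionally use the available labels when constructing the graph or the score, but how GSFS does so is not stated in the abstract.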

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cheng, H., Deng, W., Fu, C., Wang, Y., Qin, Z. (2011). Graph-Based Semi-supervised Feature Selection with Application to Automatic Spam Image Identification. In: Yu, Y., Yu, Z., Zhao, J. (eds) Computer Science for Environmental Engineering and EcoInformatics. CSEEE 2011. Communications in Computer and Information Science, vol 159. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22691-5_45

  • DOI: https://doi.org/10.1007/978-3-642-22691-5_45

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22690-8

  • Online ISBN: 978-3-642-22691-5

  • eBook Packages: Computer Science, Computer Science (R0)
