Unsupervised Feature Selection for Spherical Data Modeling: Application to Image-Based Spam Filtering

  • Ola Amayri
  • Nizar Bouguila
Part of the Communications in Computer and Information Science book series (CCIS, volume 287)


Understanding the relevance of extracted features in a domain-specific sense is at the heart of image classification. In this paper, we propose a feature selection framework that yields a more compact statistical model while retaining good generalization to unseen data. Both feature selection and clustering are based on well-established statistical models that provide a natural choice when the data to be modeled are spherical. Moreover, we develop a probabilistic kernel based on the Fisher score and a mixture of von Mises (moVM) model to feed Support Vector Machines (SVM). The selection process evaluates the relevance of features through a principled feature saliency approach. Unsupervised learning is carried out using Expectation Maximization (EM) for parameter estimation, along with Minimum Message Length (MML) to determine the optimal number of mixture components. We argue that the proposed framework is well justified and can be adapted to different problems. Experimental results on the challenging problem of image-based spam filtering show the merits of the proposed approach.
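
As a rough illustration of the EM step mentioned in the abstract, the following minimal sketch fits a mixture of von Mises distributions to angles on the circle. It is not the authors' implementation: it covers only the one-dimensional circular case, and it omits feature saliency, MML-based selection of the number of components, and the Fisher kernel. The function names (`bessel_i0`, `fit_movm`), the evenly spaced initialization, the cap `kappa_max`, and the closed-form approximation used for the concentration update are all assumptions made for this example.

```python
import math
import random


def bessel_i0(x, tol=1e-12):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    term, total, k = 1.0, 1.0, 0
    q = (x * x) / 4.0
    while term > tol * total:
        k += 1
        term *= q / (k * k)  # term_k = q**k / (k!)**2
        total += term
    return total


def fit_movm(angles, n_components, n_iter=50, kappa_max=200.0):
    """EM for a von Mises mixture on angles (radians). Illustrative sketch only."""
    n = len(angles)
    weights = [1.0 / n_components] * n_components
    # Assumed initialization: means evenly spaced around the circle.
    means = [2.0 * math.pi * j / n_components for j in range(n_components)]
    kappas = [1.0] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each angle.
        norms = [2.0 * math.pi * bessel_i0(k) for k in kappas]
        resp = []
        for t in angles:
            p = [w * math.exp(k * math.cos(t - m)) / z
                 for w, m, k, z in zip(weights, means, kappas, norms)]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: update mixing weight, mean direction, and concentration.
        for j in range(n_components):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            c = sum(r[j] * math.cos(t) for r, t in zip(resp, angles))
            s = sum(r[j] * math.sin(t) for r, t in zip(resp, angles))
            weights[j] = nj / n
            means[j] = math.atan2(s, c) % (2.0 * math.pi)
            rbar = min(math.hypot(c, s) / nj, 1.0 - 1e-9)
            # Approximate concentration estimate from the mean resultant length
            # (an assumed shortcut; the exact update inverts I1(k)/I0(k) = rbar).
            kappas[j] = min(rbar * (2.0 - rbar * rbar) / (1.0 - rbar * rbar),
                            kappa_max)
    return weights, means, kappas


if __name__ == "__main__":
    random.seed(0)
    data = ([random.vonmisesvariate(0.5, 8.0) for _ in range(300)]
            + [random.vonmisesvariate(3.5, 8.0) for _ in range(300)])
    print(fit_movm(data, n_components=2))
```

The number of components is fixed by the caller here; in the framework described above it would instead be chosen by the MML criterion.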


Von Mises mixture; feature selection; minimum message length; Support Vector Machines; Fisher score; image-based spam





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ola Amayri (1)
  • Nizar Bouguila (1)
  1. Faculty of Engineering and Computer Science, Concordia University, Montreal, Canada
