
EP-Based Infinite Inverted Dirichlet Mixture Learning: Application to Image Spam Detection

  • Wentao Fan
  • Sami Bourouis
  • Nizar Bouguila (corresponding author)
  • Fahd Aldosari
  • Hassen Sallay
  • K. M. Jamil Khayyat
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10868)

Abstract

We propose in this paper a new fully unsupervised model, based on a Dirichlet process prior and the inverted Dirichlet distribution, that allows the number of clusters to be inferred automatically from the data. The main idea is to let the number of mixture components grow as new vectors arrive. This addresses the model selection problem in an elegant way, since the resulting model can be viewed as an infinite inverted Dirichlet mixture. An expectation propagation (EP) inference methodology is developed to learn this model by obtaining a full posterior distribution over its parameters. We validate the model on a challenging application, namely image spam filtering, to demonstrate the merits of the framework.
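To make the ingredients above concrete, the following standard forms sketch the model; the notation (in particular the Dirichlet process concentration $\eta$ and the stick-breaking weights $\pi_j$) is generic and may differ from the authors' exact parameterization. For a $D$-dimensional positive vector $\vec{X} = (X_1, \dots, X_D)$ and parameters $\vec{\alpha} = (\alpha_1, \dots, \alpha_{D+1})$ with $\alpha_d > 0$, the inverted Dirichlet density is

$$ p(\vec{X} \mid \vec{\alpha}) = \frac{\Gamma\!\big(\sum_{d=1}^{D+1} \alpha_d\big)}{\prod_{d=1}^{D+1} \Gamma(\alpha_d)} \, \prod_{d=1}^{D} X_d^{\alpha_d - 1} \, \Big(1 + \sum_{d=1}^{D} X_d\Big)^{-\sum_{d=1}^{D+1} \alpha_d}. $$

Under the stick-breaking construction of the Dirichlet process, $v_j \sim \mathrm{Beta}(1, \eta)$ and $\pi_j = v_j \prod_{l=1}^{j-1} (1 - v_l)$, giving the infinite inverted Dirichlet mixture

$$ p(\vec{X}) = \sum_{j=1}^{\infty} \pi_j \, \mathrm{ID}(\vec{X} \mid \vec{\alpha}_j), $$

so new components can be activated as new vectors arrive, while EP approximates the joint posterior over the $v_j$ and $\vec{\alpha}_j$ by iteratively refining factorized exponential-family terms.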

Acknowledgements

The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for its continuous support. This work was supported financially by the Deanship of Scientific Research at Umm Al-Qura University under grant number 15-COM-3-1-0006. The first author was supported by the National Natural Science Foundation of China (61502183).

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Wentao Fan (1)
  • Sami Bourouis (2)
  • Nizar Bouguila (3), corresponding author
  • Fahd Aldosari (4)
  • Hassen Sallay (4)
  • K. M. Jamil Khayyat (4)

  1. Huaqiao University, Xiamen, China
  2. Taif University, Taif, Kingdom of Saudi Arabia
  3. Concordia University, Montreal, Canada
  4. Umm Al-Qura University, Makkah, Saudi Arabia