Variational Learning of Dirichlet Process Mixtures of Generalized Dirichlet Distributions and Its Applications

  • Conference paper
Advanced Data Mining and Applications (ADMA 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7713)

Abstract

In this paper, we introduce a nonparametric Bayesian approach to clustering based on Dirichlet processes and generalized Dirichlet (GD) distributions. The proposed approach sidesteps the obstacle of estimating the correct number of clusters by assuming an infinite number of components. Overfitting and underfitting the data are also prevented by the nature of the nonparametric Bayesian framework. The proposed model is learned through a variational method in which the whole inference process is analytically tractable with closed-form solutions. The effectiveness and merits of the proposed clustering approach are investigated on two challenging real-world applications, namely anomaly intrusion detection and image spam filtering.
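As a rough illustration of how an infinite number of components can be handled in practice, the sketch below draws mixing weights from a truncated stick-breaking construction of a Dirichlet process. The abstract does not state which representation the paper uses, so the stick-breaking form, the function name `stick_breaking_weights`, the concentration parameter `alpha`, and the truncation level are all assumptions made for illustration only.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw mixing weights from a truncated stick-breaking construction.

    alpha      -- concentration parameter (assumed name, not from the paper)
    truncation -- number of components kept (assumed truncation level)
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(v * remaining)
        remaining *= 1.0 - v
    # The last component takes whatever is left so the weights sum to one.
    weights.append(remaining)
    return weights


weights = stick_breaking_weights(alpha=1.0, truncation=20)
```

With a small `alpha`, most of the mass tends to fall on the first few components, which is the sense in which the effective number of clusters is driven by the data rather than fixed in advance.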

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fan, W., Bouguila, N. (2012). Variational Learning of Dirichlet Process Mixtures of Generalized Dirichlet Distributions and Its Applications. In: Zhou, S., Zhang, S., Karypis, G. (eds) Advanced Data Mining and Applications. ADMA 2012. Lecture Notes in Computer Science, vol 7713. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35527-1_17

  • DOI: https://doi.org/10.1007/978-3-642-35527-1_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35526-4

  • Online ISBN: 978-3-642-35527-1

  • eBook Packages: Computer Science, Computer Science (R0)
