
Variational learning for Dirichlet process mixtures of Dirichlet distributions and applications

Multimedia Tools and Applications

Abstract

In this paper, we propose a Bayesian nonparametric approach for modeling and model selection based on a Dirichlet process mixture of Dirichlet distributions, which can also be viewed as an infinite Dirichlet mixture model. The proposed model uses a stick-breaking representation and is learned by variational inference. Owing to the nature of the Bayesian nonparametric approach, the problems of overfitting and underfitting are avoided. Moreover, the difficulty of estimating the correct number of clusters is sidestepped by assuming an infinite number of clusters. Unlike other approximation techniques such as Markov chain Monte Carlo (MCMC), which are computationally expensive and whose convergence is difficult to diagnose, the entire inference process in the proposed variational learning framework is analytically tractable with closed-form solutions. Additionally, the proposed infinite Dirichlet mixture model with variational learning requires only a modest amount of computation, which makes it suitable for large-scale applications. The effectiveness of our model is investigated experimentally on both synthetic data sets and challenging real-life multimedia applications, namely image spam filtering and human action video categorization.
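The stick-breaking representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the truncation level, the concentration parameter `alpha`, the shifted Gamma prior on the Dirichlet parameters, and both function names are assumptions chosen only for illustration. The sketch draws mixing weights by breaking a unit-length stick with Beta(1, alpha) proportions and then samples data from the resulting mixture of Dirichlet distributions.

```python
import numpy as np


def stick_breaking_weights(alpha, truncation, rng):
    """Draw mixing weights from a truncated stick-breaking construction."""
    # Break proportions v_j ~ Beta(1, alpha).
    v = rng.beta(1.0, alpha, size=truncation)
    # Truncation (an assumption of this sketch): the last component takes
    # whatever is left of the stick, so the weights sum to one.
    v[-1] = 1.0
    # Stick length remaining before piece j is broken off: prod_{s<j} (1 - v_s).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_infinite_dirichlet_mixture(n, alpha, truncation, dim, rng):
    """Sample n points from a truncated Dirichlet process mixture of Dirichlets."""
    weights = stick_breaking_weights(alpha, truncation, rng)
    # Hypothetical prior on each component's Dirichlet parameters
    # (shifted Gamma, chosen for illustration only).
    params = 1.0 + rng.gamma(shape=2.0, scale=2.0, size=(truncation, dim))
    # Assign each point to a component, then draw it from that Dirichlet.
    labels = rng.choice(truncation, size=n, p=weights)
    data = np.vstack([rng.dirichlet(params[z]) for z in labels])
    return weights, labels, data


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights, labels, data = sample_infinite_dirichlet_mixture(
        n=200, alpha=1.0, truncation=20, dim=3, rng=rng
    )
    print("components actually used:", np.unique(labels).size)
```

With a small `alpha`, most of the stick is consumed by the first few pieces, so only a handful of components receive data even though the truncation level is much larger. This is the sense in which the number of clusters need not be fixed in advance.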

Notes

  1. The complete source code is available upon request.

  2. http://www.cs.jhu.edu/~mdredze/datasets/image_spam

Acknowledgements

The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors would like to thank the anonymous referees and the associate editor for their helpful comments. The complete source code of this work is available upon request.

Author information

Corresponding author

Correspondence to Nizar Bouguila.

About this article

Cite this article

Fan, W., Bouguila, N. Variational learning for Dirichlet process mixtures of Dirichlet distributions and applications. Multimed Tools Appl 70, 1685–1702 (2014). https://doi.org/10.1007/s11042-012-1191-0

  • DOI: https://doi.org/10.1007/s11042-012-1191-0
