Abstract
In this paper, we propose a Bayesian nonparametric approach to modeling and selection based on a mixture of Dirichlet processes with Dirichlet distributions, which can also be viewed as an infinite Dirichlet mixture model. The proposed model uses a stick-breaking representation and is learned by variational inference. Owing to the nature of the Bayesian nonparametric approach, the problems of overfitting and underfitting are prevented. Moreover, the obstacle of estimating the correct number of clusters is sidestepped by assuming an infinite number of clusters. In contrast to other approximation techniques such as Markov chain Monte Carlo (MCMC), which incur high computational cost and whose convergence is difficult to diagnose, the whole inference process in the proposed variational learning framework is analytically tractable with closed-form solutions. Additionally, the proposed infinite Dirichlet mixture model with variational learning requires only a modest amount of computational power, which makes it suitable for large applications. The effectiveness of our model is investigated experimentally on both synthetic data sets and challenging real-life multimedia applications, namely image spam filtering and human action video categorization.
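As an illustration of the stick-breaking representation mentioned above, the following minimal sketch draws data from a truncated Dirichlet process mixture of Dirichlet distributions. It is not the authors' implementation and it does not perform variational learning; it only shows the generative construction. The truncation level, the concentration parameter `alpha`, the Gamma base measure over the Dirichlet parameters, and all function names are assumptions made for this example.

```python
import numpy as np

def stick_breaking_weights(alpha, truncation, rng):
    """Draw mixing weights from a truncated stick-breaking construction."""
    # Each beta_k is the fraction broken off the remaining stick.
    betas = rng.beta(1.0, alpha, size=truncation)
    betas[-1] = 1.0  # close the stick so the truncated weights sum to 1
    # Length of stick remaining before the k-th break: prod_{j<k} (1 - beta_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

def sample_dirichlet_mixture(n, dim, alpha=1.0, truncation=20, seed=0):
    """Sample n points on the simplex from a truncated DP mixture of Dirichlets."""
    rng = np.random.default_rng(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)
    # Hypothetical base measure: independent Gamma draws for each
    # Dirichlet parameter of each component (an assumption, not from the paper).
    params = rng.gamma(shape=2.0, scale=2.0, size=(truncation, dim))
    # Assign each observation to a component, then draw it from that Dirichlet.
    labels = rng.choice(truncation, size=n, p=weights)
    data = np.vstack([rng.dirichlet(params[k]) for k in labels])
    return data, labels, weights

if __name__ == "__main__":
    data, labels, weights = sample_dirichlet_mixture(n=200, dim=3)
    print("components actually used:", np.unique(labels).size)
    print("largest mixing weights:", np.sort(weights)[::-1][:5])
```

Although the sketch truncates the mixture at a fixed number of components, only a few of them typically receive appreciable weight, which is the sense in which the number of clusters need not be fixed in advance.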
Notes
The complete source code is available upon request.
Acknowledgements
The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors would like to thank the anonymous referees and the associate editor for their helpful comments. The complete source code of this work is available upon request.
Cite this article
Fan, W., Bouguila, N. Variational learning for Dirichlet process mixtures of Dirichlet distributions and applications. Multimed Tools Appl 70, 1685–1702 (2014). https://doi.org/10.1007/s11042-012-1191-0
DOI: https://doi.org/10.1007/s11042-012-1191-0