Abstract
In this paper, we introduce a nonparametric Bayesian approach to clustering based on Dirichlet processes and the generalized Dirichlet (GD) distribution. By assuming an infinite number of mixture components, the approach sidesteps the difficulty of estimating the correct number of clusters, and the nonparametric Bayesian framework also avoids overfitting and underfitting the data. The model is learned with a variational method in which the entire inference process is analytically tractable, with closed-form solutions. The effectiveness of the proposed clustering approach is investigated on two challenging real-world applications: anomaly intrusion detection and image spam filtering.
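As a rough illustration of the kind of model the abstract describes, the sketch below evaluates the log-density of a mixture of GD distributions whose weights come from a stick-breaking prior truncated at K components. Truncation is a common device in variational treatments of Dirichlet process mixtures, but the names `gd_logpdf`, `truncated_stick_breaking`, `mixture_logpdf`, the truncation level `K`, and the concentration value are illustrative assumptions, not the authors' implementation. The GD density is written in the standard Connor-Mosimann parameterization (the paper may use an equivalent form), and the variational updates themselves are not implemented here.

```python
import math
import random


def gd_logpdf(x, alpha, beta):
    """Log-density of a generalized Dirichlet (Connor-Mosimann) distribution.

    x: D proportions with every x_d > 0 and sum(x) < 1.
    alpha, beta: D positive shape parameters each.
    """
    D = len(x)
    if not (len(alpha) == len(beta) == D):
        raise ValueError("x, alpha and beta must have the same length")
    if min(x) <= 0 or sum(x) >= 1:
        raise ValueError("x must lie strictly inside the simplex")
    logp = 0.0
    cum = 0.0  # running sum x_1 + ... + x_d
    for d in range(D):
        cum += x[d]
        # gamma_d = beta_d - alpha_{d+1} - beta_{d+1} for d < D, gamma_D = beta_D - 1
        if d < D - 1:
            gamma_d = beta[d] - alpha[d + 1] - beta[d + 1]
        else:
            gamma_d = beta[d] - 1.0
        logp += (math.lgamma(alpha[d] + beta[d])
                 - math.lgamma(alpha[d]) - math.lgamma(beta[d]))
        logp += (alpha[d] - 1.0) * math.log(x[d]) + gamma_d * math.log(1.0 - cum)
    return logp


def truncated_stick_breaking(K, concentration, rng=random):
    """Draw K weights: V_j ~ Beta(1, concentration) for j < K, V_K = 1.

    pi_j = V_j * prod_{l<j} (1 - V_l); setting V_K = 1 makes the weights sum to 1.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for j in range(K):
        v = 1.0 if j == K - 1 else rng.betavariate(1.0, concentration)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def mixture_logpdf(x, weights, alphas, betas):
    """log sum_j pi_j GD(x | alpha_j, beta_j), computed with log-sum-exp."""
    terms = [math.log(w) + gd_logpdf(x, a, b)
             for w, a, b in zip(weights, alphas, betas) if w > 0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


if __name__ == "__main__":
    rng = random.Random(0)
    w = truncated_stick_breaking(5, 1.0, rng)
    alphas = [[1.0, 1.0]] * 5
    betas = [[2.0, 1.0]] * 5
    print(mixture_logpdf([0.2, 0.3], w, alphas, betas))
```

A useful sanity check on the parameterization: when beta_d = alpha_{d+1} + beta_{d+1} for every d < D, the GD density reduces to a Dirichlet with parameters (alpha_1, ..., alpha_D, beta_D), so with alpha = (1, 1) and beta = (2, 1) it is the uniform density 2 on the two-dimensional simplex.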
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fan, W., Bouguila, N. (2012). Variational Learning of Dirichlet Process Mixtures of Generalized Dirichlet Distributions and Its Applications. In: Zhou, S., Zhang, S., Karypis, G. (eds) Advanced Data Mining and Applications. ADMA 2012. Lecture Notes in Computer Science, vol 7713. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35527-1_17
DOI: https://doi.org/10.1007/978-3-642-35527-1_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35526-4
Online ISBN: 978-3-642-35527-1
eBook Packages: Computer Science, Computer Science (R0)