Data Mining and Knowledge Discovery, Volume 32, Issue 3, pp 675–707

Kernel mixture model for probability density estimation in Bayesian classifiers

  • Wenyu Zhang
  • Zhenjiang Zhang
  • Han-Chieh Chao
  • Fan-Hsun Tseng


Estimating reliable class-conditional probabilities is a prerequisite for implementing Bayesian classifiers, and estimating probability density functions (PDFs) is also a fundamental problem for other probabilistic induction algorithms. The finite mixture model (FMM) can represent arbitrarily complex PDFs by using a mixture of multimodal distributions, but it assumes that the mixture components follow a given distribution, an assumption that may not hold for real-world data. This paper presents a non-parametric kernel mixture model (KMM) based probability density estimation approach, in which the data sample of a class is assumed to be drawn from several unknown, independent hidden subclasses. Unlike traditional FMM schemes, we simply use the k-means clustering algorithm to partition the data sample into several independent components, and the regional density diversities of the components are combined using Bayes' theorem. On the basis of the proposed kernel mixture model, we present a three-step Bayesian classifier, which includes partitioning, structure learning, and PDF estimation. Experimental results show that KMM improves the quality of the PDFs estimated by the conventional kernel density estimation (KDE) method, and that KMM-based Bayesian classifiers outperform existing Gaussian, GMM, and KDE-based Bayesian classifiers.
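The idea summarised in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: it omits the structure learning step, works on a single feature, and assumes a plain Lloyd's k-means, a Gaussian kernel, and a normal-reference bandwidth. The function names (`kmeans_1d`, `gaussian_kde`, `kernel_mixture`, `classify`), the choice of k, and the synthetic data are all hypothetical. Each cluster's KDE is weighted by the cluster's share of the class sample, so the class-conditional density is a weighted sum of per-component densities.

```python
import math
import random
import statistics


def kmeans_1d(xs, k, iters=50, seed=0):
    """Plain Lloyd's k-means on 1-D data; returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous center.
        new_centers = [statistics.fmean(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return [c for c in clusters if c]


def gaussian_kde(sample):
    """Gaussian KDE with a normal-reference bandwidth (an assumption here)."""
    n = len(sample)
    spread = statistics.stdev(sample) if n > 1 else 0.0
    h = 1.06 * spread * n ** (-0.2) if spread > 0 else 1.0
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return pdf


def kernel_mixture(sample, k):
    """Partition one class sample, fit a KDE per part, mix by part size."""
    clusters = kmeans_1d(sample, k)
    n = len(sample)
    parts = [(len(c) / n, gaussian_kde(c)) for c in clusters]

    def pdf(x):
        return sum(weight * density(x) for weight, density in parts)

    return pdf


def classify(x, class_pdfs, priors):
    """Pick the class maximising prior times class-conditional density."""
    return max(class_pdfs, key=lambda c: priors[c] * class_pdfs[c](x))


# Synthetic data: class "a" has two hidden subclasses, class "b" has one.
rng = random.Random(42)
class_a = ([rng.gauss(-3, 0.5) for _ in range(100)]
           + [rng.gauss(3, 0.5) for _ in range(100)])
class_b = [rng.gauss(0, 0.5) for _ in range(200)]

pdfs = {"a": kernel_mixture(class_a, k=2), "b": kernel_mixture(class_b, k=1)}
priors = {"a": 0.5, "b": 0.5}
print(classify(3.0, pdfs, priors), classify(0.0, pdfs, priors))
```

Fitting the bandwidth per cluster rather than over the whole class sample is the point of the sketch: a single bandwidth computed from the bimodal class would be dominated by the distance between the two modes and would oversmooth both.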


Kernel mixture model · Probability density estimation · Bayesian classifier · Clustering



This work is supported by the National Natural Science Foundation of China under Grant 61772064, the Academic Discipline and Post-Graduate Education Project of the Beijing Municipal Commission of Education, and the Fundamental Research Funds for the Central Universities under Grant 2017YJS026. The authors also thank the anonymous reviewers for their valuable comments and suggestions for improving the quality of this paper.



Copyright information

© The Author(s) 2018

Authors and Affiliations

  • Wenyu Zhang (1)
  • Zhenjiang Zhang (1)
  • Han-Chieh Chao (2, 3, 4, 5)
  • Fan-Hsun Tseng (6)

  1. School of Electronic and Information Engineering, Key Laboratory of Communication and Information Systems, Beijing Municipal Commission of Education, Beijing Jiaotong University, Beijing, China
  2. School of Information Science and Engineering, Fujian University of Technology, Fuzhou, China
  3. School of Mathematics and Computer Science, Wuhan Polytechnic University, Wuhan, China
  4. Department of Electrical Engineering, National Dong Hwa University, Hualien, Taiwan
  5. Department of Computer Science and Information Engineering, National Ilan University, Yilan, Taiwan
  6. Department of Technology Application and Human Resource Development, National Taiwan Normal University, Taipei, Taiwan
