Estimating reliable class-conditional probabilities is a prerequisite for implementing Bayesian classifiers, and estimating probability density functions (PDFs) is a fundamental problem for many other probabilistic induction algorithms. The finite mixture model (FMM) can represent arbitrarily complex PDFs as a mixture of multimodal distributions, but it assumes that each mixture component follows a given parametric distribution, which may not hold for real-world data. This paper presents a non-parametric probability density estimation approach based on a kernel mixture model (KMM), in which the data sample of a class is assumed to be drawn from several unknown, independent hidden subclasses. Unlike traditional FMM schemes, we simply use the k-means clustering algorithm to partition the data sample into independent components, and the regional density diversities of the components are combined using Bayes' theorem. On the basis of the proposed kernel mixture model, we present a three-step Bayesian classifier comprising partitioning, structure learning, and PDF estimation. Experimental results show that KMM improves the quality of the PDFs estimated by the conventional kernel density estimation (KDE) method, and that KMM-based Bayesian classifiers outperform existing Gaussian, GMM, and KDE-based Bayesian classifiers.
Keywords: Kernel mixture model · Probability density estimation · Bayesian classifier · Clustering
This work was supported by the National Natural Science Foundation of China under Grant 61772064, the Academic Discipline and Post-Graduate Education Project of the Beijing Municipal Commission of Education, and the Fundamental Research Funds for the Central Universities under Grant 2017YJS026. The authors also thank the anonymous reviewers for their valuable comments and suggestions, which improved the quality of this paper.