Abstract
A fast maximum likelihood estimator based on a linear combination of Gaussian kernels is introduced to represent the square root of a probability density function. It is shown that, if the kernel centres and the kernel width are known, the underlying problem can be formulated as an optimization problem on a Riemannian manifold. The first-order Riemannian geometry of the sphere manifold and the associated vector transport are explored, and the well-known Riemannian conjugate gradient algorithm is then used to estimate the model parameters. For completeness, the k-means clustering algorithm and a grid search are applied to determine the kernel centres and the kernel width, respectively. Illustrative examples demonstrate that the proposed approach is effective in constructing an estimate of the square root of a probability density function.
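To make the construction concrete, the following sketch fits a square-root density model \(\sqrt{p(x)}=\sum_k a_k K_\sigma(x,c_k)\) to 1-D data. It is a simplified stand-in, not the paper's algorithm: quantile-based centres replace k-means, the kernel width is fixed rather than grid-searched, and plain Riemannian gradient ascent replaces the conjugate gradient scheme. The key idea it does illustrate is that the normalisation constraint \(\varvec{a}^\mathrm{T} Q \varvec{a}=1\) (with \(Q\) the Gaussian-kernel Gram integral) becomes the unit sphere \(\Vert \varvec{b}\Vert =1\) after the change of variables \(\varvec{b}=L^\mathrm{T}\varvec{a}\), \(Q=LL^\mathrm{T}\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D sample from a two-component Gaussian mixture
data = np.concatenate([rng.normal(-2.0, 0.6, 300), rng.normal(1.5, 0.9, 700)])

sigma = 0.4                                # fixed width (a grid search in the paper)
M = 8
centres = np.quantile(data, np.linspace(0.05, 0.95, M))   # stand-in for k-means

def kern(x, c, sigma):
    """Normalised 1-D Gaussian kernels evaluated at points x, centres c."""
    return np.exp(-(np.asarray(x)[:, None] - c) ** 2 / (2 * sigma ** 2)) \
        / np.sqrt(2 * np.pi * sigma ** 2)

# Q_ij = int K(x, c_i) K(x, c_j) dx, in closed form for Gaussian kernels
D = centres[:, None] - centres[None, :]
Q = np.exp(-D ** 2 / (4 * sigma ** 2)) / np.sqrt(4 * np.pi * sigma ** 2)

# sqrt(p) = Phi a integrates, squared, to a^T Q a; the change of variables
# b = Lc^T a with Q = Lc Lc^T maps the constraint a^T Q a = 1 to ||b|| = 1.
Lc = np.linalg.cholesky(Q)
Phi = kern(data, centres, sigma)           # n x M design matrix

def mean_loglik(bvec):
    avec = np.linalg.solve(Lc.T, bvec)     # a = Lc^{-T} b, so a^T Q a = ||b||^2
    return np.mean(np.log((Phi @ avec) ** 2)), avec

a0 = np.ones(M)
b = Lc.T @ (a0 / np.sqrt(a0 @ Q @ a0))     # feasible, strictly positive start
ll0, _ = mean_loglik(b)

step = 0.02
for _ in range(1000):
    s = Phi @ np.linalg.solve(Lc.T, b)     # sqrt-density at the sample points
    egrad = np.linalg.solve(Lc, (2.0 / len(data)) * (Phi.T @ (1.0 / s)))
    rgrad = egrad - (egrad @ b) * b        # project onto the sphere's tangent space
    b = b + step * rgrad                   # ascent step ...
    b /= np.linalg.norm(b)                 # ... then retract back onto the sphere

ll1, a = mean_loglik(b)
print(f"mean log-likelihood: {ll0:.3f} -> {ll1:.3f}")
```

The projection-plus-renormalisation pair is the standard Riemannian gradient and retraction for the sphere; swapping the loop body for a conjugate gradient update with vector transport (e.g. via the Manopt toolbox cited below) recovers the scheme the paper actually uses.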
Appendix A
To evaluate \(q_{i,j}=\int K_{\sigma }\big (\varvec{x},\varvec{c}_i \big ) K_{\sigma }\big (\varvec{x},\varvec{c}_j \big ) d\varvec{x}\), we let \(\varvec{x}=[x_1,\ldots ,x_m]^\mathrm{T}\). Since the Gaussian kernel factorises over the \(m\) coordinates, we have
\[ q_{i,j}=\prod _{l=1}^{m}\int \frac{1}{2\pi \sigma ^2}\exp \Big (-\frac{(x_l-c_{i,l})^2+(x_l-c_{j,l})^2}{2\sigma ^2}\Big )dx_l, \]
in which completing the square gives
\[ (x_l-c_{i,l})^2+(x_l-c_{j,l})^2=2\Big (x_l-\frac{c_{i,l}+c_{j,l}}{2}\Big )^2+\frac{(c_{i,l}-c_{j,l})^2}{2}. \]
By making use of \(\int \frac{1}{\sqrt{2\pi \sigma ^2}} \exp \Big (-\frac{(x_l-c)^2}{2\sigma ^2} \Big )dx_l =1\), i.e. a Gaussian density integrates to one (here applied with variance \(\sigma ^2/2\), whose normalising constant is \(\sqrt{\pi \sigma ^2}\)), we have
\[ \int \frac{1}{2\pi \sigma ^2}\exp \Big (-\frac{1}{\sigma ^2}\Big (x_l-\frac{c_{i,l}+c_{j,l}}{2}\Big )^2\Big )dx_l=\frac{1}{\sqrt{4\pi \sigma ^2}}. \]
Hence
\[ q_{i,j}=\big (4\pi \sigma ^2\big )^{-m/2}\exp \Big (-\frac{\Vert \varvec{c}_i-\varvec{c}_j\Vert ^2}{4\sigma ^2}\Big ). \]
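As a quick numerical sanity check of this closed form in the one-dimensional case (a sketch using dense-grid quadrature in NumPy, assuming the standard normalised Gaussian kernel \(K_\sigma(x,c)=(2\pi \sigma ^2)^{-1/2}\exp (-(x-c)^2/2\sigma ^2)\)):

```python
import numpy as np

def kern(x, c, sigma):
    """Normalised 1-D Gaussian kernel K_sigma(x, c)."""
    return np.exp(-(x - c) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

sigma, ci, cj = 0.7, -0.3, 1.1
x = np.linspace(-25.0, 25.0, 500001)      # dense grid; kernel tails are negligible
dx = x[1] - x[0]

# Quadrature approximation of q_ij = int K(x, c_i) K(x, c_j) dx
q_num = np.sum(kern(x, ci, sigma) * kern(x, cj, sigma)) * dx

# Closed form with m = 1: (4 pi sigma^2)^(-1/2) exp(-(ci - cj)^2 / (4 sigma^2))
q_closed = np.exp(-(ci - cj) ** 2 / (4 * sigma ** 2)) / np.sqrt(4 * np.pi * sigma ** 2)

print(q_num, q_closed)
```

The two values agree to quadrature precision, confirming that the product of two equal-width Gaussian kernels integrates to a Gaussian kernel of width \(\sqrt{2}\sigma\) evaluated at the centre pair.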
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
Hong, X., Gao, J. (2016). A Fast Algorithm to Estimate the Square Root of Probability Density Function. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXXIII. SGAI 2016. Springer, Cham. https://doi.org/10.1007/978-3-319-47175-4_11
Print ISBN: 978-3-319-47174-7
Online ISBN: 978-3-319-47175-4