Positive Data Kernel Density Estimation via the LogKDE Package for R

  • Hien D. Nguyen
  • Andrew T. Jones
  • Geoffrey J. McLachlan
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 996)

Abstract

Kernel density estimators (KDEs) are ubiquitous tools for nonparametric estimation of probability density functions (PDFs) when data are obtained from unknown data generating processes. The KDEs that are typically available in software packages are defined and designed to estimate the densities of real-valued data. When applied to positive data, these typical KDEs do not yield bona fide PDFs over the positive support, because they can place probability mass on negative values. A log-transformation methodology can be applied to produce a nonparametric estimator that is appropriate for positive data and yields proper PDFs over positive supports. We call the KDEs obtained via this transformation log-KDEs. We derive expressions for the pointwise biases, variances, and mean-squared errors of the log-KDEs that are obtained via various kernel functions. Mean integrated squared error (MISE) and asymptotic MISE results are also provided, and a plug-in rule for log-KDE bandwidths is derived. We demonstrate the log-KDE methodology via our R package, logKDE. Real data case studies are provided to illustrate the log-KDE approach.
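The contrast described in the abstract can be illustrated with a small numerical sketch. It is not the logKDE package's implementation: it is written in plain Python rather than R, it assumes a Gaussian kernel, and the data values, the bandwidth h = 0.5, and all function names are hypothetical choices made for illustration. The log-KDE is obtained here by the standard change of variables: a KDE is fitted to the log-transformed data and mapped back to the original scale, giving (1 / (n h x)) Σ K((log x − log X_i) / h) for x > 0.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def standard_kde(x, data, h):
    """Usual KDE on the real line: (1 / (n h)) * sum K((x - X_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


def log_kde(x, data, h):
    """Log-transformed KDE: (1 / (n h x)) * sum K((log x - log X_i) / h), x > 0."""
    if x <= 0:
        return 0.0  # no mass is placed outside the positive support
    log_x = math.log(x)
    total = sum(gaussian_kernel((log_x - math.log(xi)) / h) for xi in data)
    return total / (len(data) * h * x)


def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))


# Hypothetical positive sample and bandwidth, chosen only for illustration.
data = [0.2, 0.5, 0.9, 1.3, 2.1, 3.4]
h = 0.5

# Mass that the standard KDE assigns to negative values.
mass_negative_standard = integrate(lambda x: standard_kde(x, data, h), -10.0, 0.0)

# Total mass of the log-KDE over (0, inf), computed with the substitution
# x = exp(t), so that dx = exp(t) dt.
mass_positive_log = integrate(
    lambda t: log_kde(math.exp(t), data, h) * math.exp(t), -15.0, 15.0
)

print("standard KDE mass below zero:", mass_negative_standard)
print("log-KDE mass over positive support:", mass_positive_log)
```

Under these assumptions, the standard estimator leaks a noticeable share of its mass below zero, while the log-transformed estimator is zero for non-positive arguments and integrates to approximately one over the positive half-line.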

Keywords

Kernel density estimator · Log-transformation · Nonparametric · Plug-in rule · Positive data

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Hien D. Nguyen (1), corresponding author
  • Andrew T. Jones (2)
  • Geoffrey J. McLachlan (2)
  1. La Trobe University, Bundoora, Australia
  2. University of Queensland, St. Lucia, Australia
