Positive Data Kernel Density Estimation via the LogKDE Package for R

  • Conference paper
Data Mining (AusDM 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 996)

Abstract

Kernel density estimators (KDEs) are ubiquitous tools for nonparametric estimation of probability density functions (PDFs) when data are obtained from unknown data generating processes. The KDEs that are typically available in software packages are defined, and designed, for estimating the densities of real-valued data. When applied to positive data, these typical KDEs do not yield bona fide PDFs. A log-transformation methodology can be applied to produce a nonparametric estimator that is appropriate for such data and yields proper PDFs over positive supports. We call the KDEs obtained via this transformation log-KDEs. We derive expressions for the pointwise biases, variances, and mean-squared errors of the log-KDEs that are obtained via various kernel functions. Mean integrated squared error (MISE) and asymptotic MISE results are also provided, and a plug-in rule for log-KDE bandwidths is derived. We demonstrate the log-KDE methodology via our R package, logKDE. Real data case studies are provided to demonstrate the log-KDE approach.
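
To illustrate the log-transformation idea described in the abstract, the following is a minimal sketch in plain Python rather than R, and it does not use or reproduce the logKDE package API. It assumes a Gaussian kernel and, as a stand-in for the plug-in rule derived in the paper (which is not given on this page), a normal-reference rule of thumb computed on the log scale. The names `log_kde`, `density`, `sample`, and `f_hat` are hypothetical. The estimator applies an ordinary KDE to the log-transformed data and maps it back with the change-of-variables factor 1/x, so the result is zero for non-positive x and integrates to one over the positive half-line.

```python
import math


def log_kde(data, bandwidth=None):
    """Return a function x -> log-transformed Gaussian KDE evaluated at x."""
    if any(x <= 0 for x in data):
        raise ValueError("log-KDE requires strictly positive data")
    logs = [math.log(x) for x in data]
    n = len(logs)
    if bandwidth is None:
        # Assumption: normal-reference rule of thumb on the log scale,
        # used here only as a placeholder for the paper's plug-in rule.
        mean = sum(logs) / n
        sd = math.sqrt(sum((y - mean) ** 2 for y in logs) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-0.2)
    h = bandwidth

    def density(x):
        if x <= 0:
            return 0.0  # no mass outside the positive support
        z = math.log(x)
        total = sum(math.exp(-0.5 * ((z - y) / h) ** 2) for y in logs)
        # Gaussian KDE on the log scale, times the Jacobian 1/x.
        return total / (n * h * math.sqrt(2.0 * math.pi) * x)

    return density


# Hypothetical positive-valued sample, for illustration only.
sample = [0.3, 0.7, 1.1, 1.8, 2.5, 4.0, 7.2]
f_hat = log_kde(sample)
```

Calling `f_hat(x)` for any `x > 0` returns the estimated density at that point. Because the smoothing is done on the log scale, the effective kernel on the original scale is asymmetric and narrows as x approaches zero.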

Author information

Corresponding author

Correspondence to Hien D. Nguyen.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Nguyen, H.D., Jones, A.T., McLachlan, G.J. (2019). Positive Data Kernel Density Estimation via the LogKDE Package for R. In: Islam, R., et al. Data Mining. AusDM 2018. Communications in Computer and Information Science, vol 996. Springer, Singapore. https://doi.org/10.1007/978-981-13-6661-1_21

  • DOI: https://doi.org/10.1007/978-981-13-6661-1_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-6660-4

  • Online ISBN: 978-981-13-6661-1

  • eBook Packages: Computer Science, Computer Science (R0)
