Comparison of missing value estimation techniques in rainfall data of Bangladesh

Abstract

Missing values in daily rainfall data can hamper analyses aimed at solving hydrological, agricultural, and climatological problems. This study attempts to select an appropriate method for estimating missing values in daily rainfall data of Bangladesh. For this purpose, eight estimation methods and seven comparison criteria are employed. For each of five regions of the country, three sets of daily rainfall observations (1, 5, and 10% of the data) are selected at random, with 1000 repetitions. The selected observations are artificially set as missing and then imputed using each of the methods. The relative performance of the methods is examined using the comparison criteria. The following observations can be made regarding the choice of an appropriate missing value estimation technique for daily rainfall data: the arithmetic average method is found to be the best method for rainfall stations Chittagong and Rajshahi, in the south-east and north-west regions, respectively. Further, the single best estimator method is identified as the best for rainfall stations Sylhet and Dhaka, in the north-east region and the mid-region, respectively, and the EM-MCMC method for rainfall station Khulna in the south-west region, in respect of the Kolmogorov-Smirnov test, the lowest bias of estimate, the value of the S index, etc.
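The evaluation design described in the abstract (hide a random share of known observations, impute them, and score the estimates over many repetitions) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rainfall series are synthetic stand-ins for the BMD data, the function names `arithmetic_average` and `evaluate` are hypothetical, only the arithmetic average method of the eight is shown, and bias and RMSE are used here as two illustrative criteria rather than the paper's full set of seven.

```python
import numpy as np


def arithmetic_average(neighbours):
    """Estimate the target station's rainfall as the plain mean of the
    neighbouring stations' rainfall on the same day (one row per day)."""
    return neighbours.mean(axis=1)


def evaluate(target, neighbours, missing_frac, n_reps, rng):
    """Repeatedly hide a random fraction of the target series, impute the
    hidden days, and return the bias and RMSE averaged over repetitions."""
    n = target.size
    n_missing = max(1, int(round(missing_frac * n)))
    # The arithmetic average uses only the neighbours, so the estimates
    # can be computed once and reused for every repetition.
    estimates = arithmetic_average(neighbours)
    biases, rmses = [], []
    for _ in range(n_reps):
        # Days artificially treated as missing in this repetition.
        idx = rng.choice(n, size=n_missing, replace=False)
        err = estimates[idx] - target[idx]
        biases.append(err.mean())
        rmses.append(np.sqrt((err ** 2).mean()))
    return float(np.mean(biases)), float(np.mean(rmses))


# Synthetic stand-in data (assumption): a shared regional signal plus
# station-level noise, clipped at zero because rainfall is non-negative.
rng = np.random.default_rng(0)
n_days = 3650
regional = rng.gamma(shape=0.5, scale=10.0, size=n_days)
target = np.clip(regional + rng.normal(0.0, 1.0, n_days), 0.0, None)
neighbours = np.clip(
    regional[:, None] + rng.normal(0.0, 1.0, (n_days, 3)), 0.0, None
)

# Missing-value shares and repetition count taken from the abstract.
results = {
    frac: evaluate(target, neighbours, frac, n_reps=1000, rng=rng)
    for frac in (0.01, 0.05, 0.10)
}
for frac, (bias, rmse) in results.items():
    print(f"{frac:.0%} missing: bias={bias:.3f}, RMSE={rmse:.3f}")
```

Comparing methods under this design would amount to swapping `arithmetic_average` for another estimator and comparing the resulting scores for each station and each share of missing values.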

Acknowledgements

This study was supported under the HEQEP sub-project CP-3293 in the Department of Applied Statistics, East West University, funded by the World Bank and implemented by the University Grants Commission of Bangladesh (UGC). The authors are also grateful to the Bangladesh Meteorological Department (BMD) for providing the data. We acknowledge the critical comments from the anonymous reviewers and the editor.

Author information

Corresponding author

Correspondence to Farzana Jahan.

About this article

Cite this article

Jahan, F., Sinha, N.C., Rahman, M.M. et al. Comparison of missing value estimation techniques in rainfall data of Bangladesh. Theor Appl Climatol 136, 1115–1131 (2019). https://doi.org/10.1007/s00704-018-2537-y
