Comparison of missing value estimation techniques in rainfall data of Bangladesh

  • Farzana Jahan
  • Narayan Chandra Sinha
  • Md. Mahfuzur Rahman
  • Md. Morshadur Rahman
  • Md. Sanaul Haque Mondal
  • M. Ataharul Islam
Original Paper

Abstract

Missing values in daily rainfall data can hamper analyses aimed at solving hydrological, agricultural, and climatological problems. This study attempts to select an appropriate method for estimating missing values in the daily rainfall data of Bangladesh. For this purpose, eight estimation methods and seven comparison techniques are employed. For each of five regions of the country, three sets of daily rainfall data (with 1, 5, and 10% missing values) are drawn at random with 1000 repetitions. The sampled values are artificially set as missing and then imputed using each of the selected methods. The relative performance of the methods is examined using the comparison criteria. The following observations can be made regarding the choice of an appropriate missing value estimation technique: the arithmetic average method is found to be the best method for the rainfall stations Chittagong and Rajshahi, in the south-east and north-west regions, respectively. Further, the single best estimator method for the rainfall stations Sylhet and Dhaka, in the north-east region and the mid-region, respectively, and the EM-MCMC method for the rainfall station Khulna, in the south-west region, are identified as the best methods with respect to the Kolmogorov-Smirnov test, the lowest bias of estimate, the value of the S index, etc.
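The evaluation design described in the abstract (mask a fraction of observed values, impute them, compare the estimates with the withheld truth, repeat) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the data are synthetic, the three neighbouring stations are hypothetical, only the arithmetic average method is shown, and RMSE is used as a stand-in for the paper's full set of comparison criteria, which the abstract does not list in full.

```python
import random


def mask_at_random(series, fraction, rng):
    """Pick indices, uniformly at random, to be treated as missing."""
    n_missing = max(1, round(fraction * len(series)))
    return sorted(rng.sample(range(len(series)), n_missing))


def arithmetic_average(neighbours, day):
    """Estimate the target station's value on `day` as the plain mean
    of the values recorded at the neighbouring stations."""
    values = [station[day] for station in neighbours]
    return sum(values) / len(values)


def evaluate(target, neighbours, fraction, repetitions, seed=0):
    """Return (mean bias, mean RMSE) of the arithmetic-average estimate
    over repeated random masking of the target series."""
    rng = random.Random(seed)
    bias_sum = 0.0
    rmse_sum = 0.0
    for _ in range(repetitions):
        missing = mask_at_random(target, fraction, rng)
        # Error = imputed value minus the withheld true value.
        errors = [arithmetic_average(neighbours, d) - target[d] for d in missing]
        bias_sum += sum(errors) / len(errors)
        rmse_sum += (sum(e * e for e in errors) / len(errors)) ** 0.5
    return bias_sum / repetitions, rmse_sum / repetitions


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic daily rainfall (mm); real station records are NOT used here.
    target = [max(0.0, rng.gauss(5, 8)) for _ in range(365)]
    # Three hypothetical neighbours: the target series plus random noise.
    neighbours = [
        [max(0.0, value + rng.gauss(0, 2)) for value in target]
        for _ in range(3)
    ]
    for fraction in (0.01, 0.05, 0.10):
        bias, rmse = evaluate(target, neighbours, fraction, repetitions=1000)
        print(f"{fraction:.0%} missing: bias={bias:.3f} mm, RMSE={rmse:.3f} mm")
```

Comparing several imputation methods under this design would amount to swapping `arithmetic_average` for another estimator and ranking the resulting scores per station.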

Notes

Acknowledgements

This study is supported under the HEQEP sub-project, CP-3293, in the Department of Applied Statistics, East West University, funded by the World Bank and implemented by the University Grants Commission of Bangladesh (UGC). The authors are also grateful to the Bangladesh Meteorological Department (BMD) for providing the data. We acknowledge the critical comments from the anonymous reviewers and the editor.

Copyright information

© Springer-Verlag GmbH Austria, part of Springer Nature 2018

Authors and Affiliations

  1. School of Mathematical Sciences, Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia
  2. Dhaka School of Economics, Dhaka, Bangladesh
  3. Green Business School, Green University of Bangladesh, Dhaka, Bangladesh
  4. Department of Statistics, University of Dhaka, Dhaka, Bangladesh
  5. Tokyo Institute of Technology, Tokyo, Japan
  6. Institute of Statistical Research and Training (ISRT), University of Dhaka, Dhaka, Bangladesh