The Effectiveness of a Probabilistic Principal Component Analysis Model and Expectation Maximisation Algorithm in Treating Missing Daily Rainfall Data

  • Zun Liang Chuan
  • Sayang Mohd Deni
  • Soo-Fen Fam
  • Noriszura Ismail
Original Article


The reliability and accuracy of risk assessments of extreme hydro-meteorological events depend heavily on the quality of historical rainfall time series data. Missing values in such series, however, degrade data quality. This paper therefore proposes a multiple-imputation algorithm for treating missing data that does not require information from adjoining monitoring stations. The proposed algorithm is based on the M-component probabilistic principal component analysis model and an expectation-maximisation algorithm (MPPCA-EM). To evaluate the effectiveness of the MPPCA-EM imputation algorithm, six historical daily rainfall time series recorded at six monitoring stations were used. These stations are located in the coastal and inland regions of the East-Coast Economic Region (ECER), Malaysia. The results show that, for missing historical daily rainfall data recorded at coastal monitoring stations, the 2-component probabilistic principal component analysis model with the expectation-maximisation algorithm (2PPCA-EM) was superior to the single- and multiple-imputation algorithms proposed in previous studies. In contrast, the single-imputation algorithms proposed in previous studies were superior to the MPPCA-EM imputation algorithms for missing historical daily rainfall data recorded at inland monitoring stations.
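To make the idea of PPCA-based imputation concrete, the following is a minimal sketch and not the authors' MPPCA-EM algorithm. It assumes a single PPCA component, a data matrix whose rows are observations and whose columns are variables, and at least one observed value per column. It alternates between a closed-form maximum-likelihood PPCA fit on the currently filled data and replacing each missing entry with its conditional mean given the observed entries in the same row. This "fill and refit" loop is a simplification of a full EM treatment, and the function names `fit_ppca` and `ppca_impute` are hypothetical.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA fit on complete data (requires q < n_columns)."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)           # sample covariance, d x d
    eigvals, eigvecs = np.linalg.eigh(S)             # ascending order
    order = np.argsort(eigvals)[::-1]                # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance: mean of the discarded eigenvalues (floored for stability).
    sigma2 = max(eigvals[q:].mean(), 1e-8)
    # Loading matrix: leading eigenvectors scaled by sqrt(eigenvalue - sigma2).
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2


def ppca_impute(X, q=1, n_iter=50, tol=1e-6):
    """Iteratively impute NaN entries using the PPCA conditional mean (a sketch)."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    # Start from column-mean imputation (assumes no column is entirely missing).
    filled = np.where(miss, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu, W, sigma2 = fit_ppca(filled, q)
        C = W @ W.T + sigma2 * np.eye(X.shape[1])    # model covariance
        new = filled.copy()
        for i in np.flatnonzero(miss.any(axis=1)):
            m, o = miss[i], ~miss[i]
            if not o.any():                          # nothing observed in this row
                new[i, m] = mu[m]
                continue
            # E[x_m | x_o] = mu_m + C_mo C_oo^{-1} (x_o - mu_o)
            resid = filled[i, o] - mu[o]
            new[i, m] = mu[m] + C[np.ix_(m, o)] @ np.linalg.solve(C[np.ix_(o, o)], resid)
        change = np.max(np.abs(new - filled))
        filled = new
        if change < tol:
            break
    return filled
```

Observed entries are never modified; only positions that were `NaN` in the input are filled. Because the Gaussian model can produce negative values, a rainfall application would presumably need an extra step such as clipping imputed values at zero, which this sketch leaves out.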


Expectation maximization algorithms; Missing daily rainfall; Probabilistic principal component analysis model; VIKOR technique



The authors would like to thank the Department of Irrigation and Drainage (DID), Malaysia, for providing the historical rainfall time series data used in this study. The authors also extend their appreciation to the Ministry of Education Malaysia and Universiti Malaysia Pahang (UMP) for providing the FRGS grant RDU190134, the flagship research grant RDU150393, and the internal research grant RDU1703184 to conduct this study.



Copyright information

© Korean Meteorological Society and Springer Nature B.V. 2019

Authors and Affiliations

  1. Faculty of Industrial Sciences and Technology, Universiti Malaysia Pahang, Pahang DM, Malaysia
  2. Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam, Malaysia
  3. Faculty of Technology Management and Technopreneurship, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
  4. School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
