The Effectiveness of a Probabilistic Principal Component Analysis Model and Expectation Maximisation Algorithm in Treating Missing Daily Rainfall Data
The reliability and accuracy of a risk assessment of extreme hydro-meteorological events depend heavily on the quality of the historical rainfall time series data, and missing values in such a series degrade that quality. This paper therefore proposes multiple-imputation algorithms for treating missing data without requiring information from adjoining monitoring stations. The proposed algorithms are based on the M-component probabilistic principal component analysis model and an expectation-maximisation algorithm (MPPCA-EM). To evaluate the effectiveness of the MPPCA-EM imputation algorithms, six distinct historical daily rainfall time series recorded at six monitoring stations were used. These stations are located in the coastal and inland regions of the East-Coast Economic Region (ECER), Malaysia. The results show that, for missing historical daily rainfall data recorded at coastal monitoring stations, the 2-component probabilistic principal component analysis model and expectation-maximisation algorithm (2PPCA-EM) was superior to the single- and multiple-imputation algorithms proposed in previous studies. In contrast, the single-imputation algorithms proposed in previous studies were superior to the MPPCA-EM imputation algorithms for missing data recorded at inland monitoring stations.
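To make the idea of PPCA-based imputation concrete, the following is a minimal sketch and not the authors' implementation. It rests on several assumptions that the abstract does not state: the single-station series has been reshaped into a matrix (for example, rows as years and columns as days), "M-component" is read as the number of retained latent dimensions, and the fit uses a simplified EM-style loop that alternates a closed-form PPCA fit with re-estimation of the missing entries from the posterior mean of the latent variables. The function name `ppca_impute` and its parameters are hypothetical.

```python
import numpy as np


def ppca_impute(X, n_components=2, n_iter=200, tol=1e-6):
    """Fill NaN entries of X with a simplified EM-style PPCA loop.

    Assumptions (not from the paper): X is a 2-D array with more columns
    than n_components, and every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    q = n_components

    # Initial guess: replace each missing entry by its column mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        # Fit step: closed-form maximum-likelihood PPCA on the filled data.
        mu = filled.mean(axis=0)
        centered = filled - mu
        cov = centered.T @ centered / filled.shape[0]
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1]
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        sigma2 = max(eigvals[q:].mean(), 1e-12)  # noise variance estimate
        W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))

        # Expectation step: posterior mean of the latent variables.
        M = W.T @ W + sigma2 * np.eye(q)
        Z = np.linalg.solve(M, W.T @ centered.T).T

        # Update only the missing entries; observed values stay untouched.
        new_filled = np.where(missing, Z @ W.T + mu, X)
        change = np.max(np.abs(new_filled - filled))
        filled = new_filled
        if change < tol:
            break

    return filled
```

Because the model is Gaussian, imputed values can be negative, which is not physically meaningful for rainfall; this sketch does not address that, nor does it draw multiple imputations as the proposed algorithms do.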
Keywords: Expectation maximization algorithms; Missing daily rainfall; Probabilistic principal component analysis model; VIKOR technique
The authors would like to thank the Department of Irrigation and Drainage (DID), Malaysia, for providing the historical rainfall time series data used in this study. This appreciation is also extended to the Ministry of Education Malaysia and Universiti Malaysia Pahang (UMP) for providing the FRGS grant RDU190134, the flagship research grant RDU150393, and the internal research grant RDU1703184 to conduct this study.