A Hybrid Missing Data Imputation Method for Constructing City Mobility Indices
An effective missing data imputation method is essential for data mining and knowledge discovery from a comprehensive database with missing values. This paper proposes a new hybrid imputation method to handle the missing values in the Mobility in Cities Database (MCD) so that city mobility indices can be constructed from it. The hybrid method integrates the advantages of decision trees and fuzzy clustering into an iterative algorithm for missing data imputation. Extensive experiments conducted on the MCD and three commonly used datasets demonstrate that the hybrid method outperforms existing imputation methods. With the MCD’s missing values imputed by the hybrid method, and using factor analysis and principal component analysis, this paper constructs city mobility indices for 63 cities in the MCD based on the novel concept of city mobility supply and demand. The city mobility indices, constructed under a hierarchical structure of mobility supply and demand indicators, represent substantial city mobility knowledge discovered from mining the MCD. The proposed hybrid method represents a significant contribution to missing data imputation research.
Keywords: Missing data imputation; City mobility index; Factor analysis; Principal component analysis; Decision tree; Iterative fuzzy clustering
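To make the idea of iterative fuzzy clustering imputation concrete, the following is a minimal sketch in plain Python, not the authors' hybrid algorithm. It covers only a generic fuzzy c-means loop and omits the decision tree component entirely. The function name `fuzzy_cmeans_impute`, its parameters, and the use of `None` to mark missing values are all assumptions made for illustration.

```python
import math
import random


def fuzzy_cmeans_impute(rows, n_clusters=2, m=2.0, n_iter=50, seed=0):
    """Fill None entries by alternating fuzzy c-means updates and re-imputation.

    Illustrative sketch only: assumes numeric data, m > 1, and at least one
    observed value in every column.
    """
    n, p = len(rows), len(rows[0])
    missing = [[v is None for v in row] for row in rows]

    # Initial guess: column means over the observed entries.
    col_means = []
    for j in range(p):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(sum(observed) / len(observed))
    filled = [[col_means[j] if missing[i][j] else rows[i][j] for j in range(p)]
              for i in range(n)]

    # Start the cluster centres at randomly chosen records.
    rng = random.Random(seed)
    centers = [list(filled[i]) for i in rng.sample(range(n), n_clusters)]

    for _ in range(n_iter):
        # Membership of each record in each cluster (standard fuzzy c-means update).
        u = []
        for x in filled:
            inv = [max(math.dist(x, c), 1e-12) ** (-2.0 / (m - 1.0))
                   for c in centers]
            total = sum(inv)
            u.append([v / total for v in inv])

        # Each centre is the mean of all records weighted by membership ** m.
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers[k] = [sum(w[i] * filled[i][j] for i in range(n)) / w_sum
                          for j in range(p)]

        # Re-impute each missing cell as the membership-weighted centre value.
        for i in range(n):
            for j in range(p):
                if missing[i][j]:
                    filled[i][j] = sum(u[i][k] * centers[k][j]
                                       for k in range(n_clusters))
    return filled
```

Observed values are never overwritten; only the cells originally marked `None` are updated on each pass. A hybrid scheme in the spirit of the abstract might first partition the records (for example with a decision tree) and run a loop like this within each partition, but that arrangement is an assumption here rather than a description of the paper's procedure.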