Multiple-Side Multiple-Learner for Incomplete Data Classification
Selective classifiers can improve classification accuracy and algorithm efficiency by removing irrelevant attributes from the data. However, most of them handle only complete data, whereas real datasets are often incomplete for various reasons. Incomplete datasets also contain irrelevant attributes that degrade algorithm performance. Based on an analysis of the main classification methods for incomplete data, this paper proposes a Multiple-side Multiple-learner algorithm for incomplete data (MSML). MSML first selects a feature subset of the original incomplete dataset using the chi-square statistic. Then, according to the missing attribute values in the selected feature subset, MSML derives a group of data subsets. Each data subset is used to train a sub-classifier with the bagging algorithm. Finally, the outputs of the sub-classifiers are combined by weighted majority voting. Experimental results on incomplete UCI datasets show that MSML effectively reduces the number of attributes and thus improves execution efficiency, while also improving classification accuracy and stability.
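The three data-handling steps named in the abstract (chi-square feature ranking, splitting by missing-value pattern, and weighted majority voting) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes categorical attributes, `None` as the missing-value marker, one data subset per distinct pattern of observed attributes, and externally supplied voting weights, none of which the abstract specifies. The function names are hypothetical, and the bagging step that trains each sub-classifier is omitted.

```python
from collections import Counter, defaultdict

MISSING = None  # assumed marker for a missing attribute value


def chi_square(column, labels):
    """Chi-square statistic of one categorical attribute against the class,
    computed only over the rows where the attribute is observed."""
    pairs = [(v, y) for v, y in zip(column, labels) if v is not MISSING]
    n = len(pairs)
    if n == 0:
        return 0.0
    value_counts = Counter(v for v, _ in pairs)
    class_counts = Counter(y for _, y in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for v, nv in value_counts.items():
        for y, ny in class_counts.items():
            expected = nv * ny / n
            stat += (observed[(v, y)] - expected) ** 2 / expected
    return stat


def select_features(rows, labels, k):
    """Return the indices of the k attributes with the largest statistic."""
    n_attrs = len(rows[0])
    scores = [chi_square([r[j] for r in rows], labels) for j in range(n_attrs)]
    ranked = sorted(range(n_attrs), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def group_by_missing_pattern(rows, features):
    """Group row indices by which of the selected attributes are observed.
    Each key is a tuple of observed attribute indices (an assumed reading
    of how the data subsets are formed)."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        pattern = tuple(j for j in features if row[j] is not MISSING)
        groups[pattern].append(i)
    return dict(groups)


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight among sub-classifier
    predictions. How the weights are set is an assumption left to the caller."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)
```

In a fuller pipeline, a sub-classifier would be trained on each group returned by `group_by_missing_pattern`, using only the attributes in that group's key, and `weighted_vote` would combine the predictions of the sub-classifiers that apply to a given test sample.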
Keywords: Incomplete data, Multiple-side, Feature subset, Multiple-learner
This work was supported by the National Natural Science Foundation of China (Nos. 61175046 and 61203290).
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.