An Efficient Classification Method of Uncertain Data with Sampling
Current research on the classification of uncertain data mainly focuses on structural changes to classification algorithms. Existing methods have achieved encouraging results; however, they do not strike an effective trade-off between accuracy and running time, and they lack portability. This paper proposes a new framework that addresses the classification of uncertain data from a data-processing perspective. The proposed algorithm represents the distribution of the raw data by a sampling method, which means that the uncertain data are converted into deterministic data. The framework can be combined with any classifier; XGBoost is adopted as the specific classifier in this paper. The experimental results show that the proposed method is an effective way of handling the classification problem for uncertain data.
Keywords: Classification, Uncertain data, Sampling, XGBoost
This research work is funded by the National Key Research and Development Project of China (2016YFB0801003).
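The sampling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the uncertainty model or the sampling procedure, so the following are assumptions of this example only: each uncertain attribute is modelled as a Gaussian given by a (mean, standard deviation) pair, each record is sampled a fixed number of times, and every sample inherits the label of its record. The names `sample_uncertain_records` and `NearestCentroid` are hypothetical, and the nearest-centroid classifier is only a dependency-free stand-in for the XGBoost classifier used in the paper.

```python
import random
from collections import defaultdict


def sample_uncertain_records(records, labels, n_samples=5, seed=0):
    """Convert uncertain records into deterministic points by sampling.

    Each record is a list of (mean, std) pairs, one per attribute.
    The Gaussian model is an assumption of this sketch, not the paper's.
    Every record yields n_samples points that inherit the record's label.
    """
    rng = random.Random(seed)
    X, y = [], []
    for record, label in zip(records, labels):
        for _ in range(n_samples):
            X.append([rng.gauss(mean, std) for mean, std in record])
            y.append(label)
    return X, y


class NearestCentroid:
    """Stand-in classifier; any classifier with fit/predict could replace it."""

    def fit(self, X, y):
        # Group the sampled points by label and average each attribute.
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def closest(row):
            # Label whose centroid has the smallest squared distance to row.
            return min(
                self.centroids_,
                key=lambda lab: sum(
                    (a - b) ** 2 for a, b in zip(row, self.centroids_[lab])
                ),
            )

        return [closest(row) for row in X]


# Toy uncertain data set: two attributes per record, each as (mean, std).
records = [
    [(0.0, 0.1), (0.0, 0.1)],
    [(0.2, 0.1), (-0.1, 0.1)],
    [(5.0, 0.1), (5.0, 0.1)],
    [(4.8, 0.1), (5.1, 0.1)],
]
labels = [0, 0, 1, 1]

# Step 1: sampling turns 4 uncertain records into 20 deterministic points.
X, y = sample_uncertain_records(records, labels, n_samples=5)

# Step 2: an ordinary classifier is trained on the sampled points.
model = NearestCentroid().fit(X, y)
predictions = model.predict([[0.1, 0.0], [4.9, 5.0]])
```

Because the uncertainty is handled entirely in the sampling step, the classifier itself needs no modification, which is the portability argument made in the abstract. In this sketch, increasing `n_samples` gives a closer representation of each record's distribution at the cost of a larger training set.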