An Efficient Classification Method of Uncertain Data with Sampling

  • Jinchao Huang
  • Yulin Li
  • Kaiyue Qi
  • Fangqi Li
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 516)


Current research on classifying uncertain data focuses mainly on modifying the structure of classification algorithms. Existing methods have achieved encouraging results; however, they do not strike an effective trade-off between accuracy and running time, and they are not easily portable. This paper proposes a new framework that addresses the classification of uncertain data from the data-processing side. The proposed algorithm represents the distribution of the raw data by sampling, which converts the uncertain data into deterministic data. The framework is suitable for any classifier; XGBoost is adopted as the specific classifier in this paper. The experimental results show that the proposed method is an effective way of handling the classification problem for uncertain data.
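The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: each uncertain attribute is modelled as an independent Gaussian given as a (mean, standard deviation) pair, every record is replaced by a fixed number of sampled points that inherit its label, and an uncertain test record is classified by majority vote over its own samples. The names `sample_uncertain_record`, `expand_dataset`, `predict_uncertain`, and the `NearestCentroid` class are hypothetical. To keep the sketch dependency-free, a simple nearest-centroid classifier stands in for XGBoost, which is the classifier the paper adopts.

```python
import random
from collections import Counter


def sample_uncertain_record(record, n_samples, rng):
    """Draw n_samples deterministic points from one uncertain record.

    Assumption: each attribute is an independent Gaussian given as (mean, std).
    """
    return [[rng.gauss(mu, sigma) for mu, sigma in record]
            for _ in range(n_samples)]


def expand_dataset(records, labels, n_samples, rng):
    """Convert uncertain records into a larger deterministic training set."""
    X, y = [], []
    for record, label in zip(records, labels):
        for point in sample_uncertain_record(record, n_samples, rng):
            X.append(point)
            y.append(label)  # every sample inherits the record's label
    return X, y


class NearestCentroid:
    """Stand-in classifier (stdlib only); the paper uses XGBoost at this step."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for point, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(point))
            for i, value in enumerate(point):
                acc[i] += value
        # One centroid per class: the mean of that class's sampled points.
        self.centroids = {c: [s / counts[c] for s in acc]
                          for c, acc in sums.items()}
        return self

    def predict_one(self, point):
        def squared_distance(label):
            centroid = self.centroids[label]
            return sum((a - b) ** 2 for a, b in zip(point, centroid))
        return min(self.centroids, key=squared_distance)


def predict_uncertain(model, record, n_samples, rng):
    """Classify an uncertain record by majority vote over its samples (assumed rule)."""
    samples = sample_uncertain_record(record, n_samples, rng)
    votes = Counter(model.predict_one(p) for p in samples)
    return votes.most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Toy uncertain data: two attributes per record, each a (mean, std) pair.
train_records = [
    [(0.0, 0.5), (0.0, 0.5)],
    [(0.5, 0.5), (-0.5, 0.5)],
    [(5.0, 0.5), (5.0, 0.5)],
    [(4.5, 0.5), (5.5, 0.5)],
]
train_labels = ["a", "a", "b", "b"]

X, y = expand_dataset(train_records, train_labels, n_samples=50, rng=rng)
model = NearestCentroid().fit(X, y)
prediction = predict_uncertain(model, [(4.8, 0.5), (5.1, 0.5)],
                               n_samples=50, rng=rng)
```

Because the uncertainty is handled entirely before training, the classifier only ever sees ordinary feature vectors. Under these assumptions, swapping the stand-in for another classifier means changing only the `fit` and `predict_one` calls.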


Keywords: Classification; Uncertain data; Sampling; XGBoost



This research work is funded by the National Key Research and Development Project of China (2016YFB0801003).



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. School of Cyber Security, Shanghai Jiao Tong University, Shanghai, China
  2. School of Computer Engineering, University of Illinois at Urbana-Champaign, Champaign, USA
  3. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
