Abstract
Discovering knowledge from noisy data is a significant challenge. Most previous work has focused on data cleansing and correction to benefit the subsequent mining process. When the training data contain noise, classification accuracy can be affected dramatically. In this paper, we present a novel classification algorithm named ESC (Error-Sensitive Classification) to address this problem. We realize our main idea by constructing an Attribute-Decision tree and measuring the correlation among attributes. Experimental results show that our algorithm can significantly improve the quality of data mining results.
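The abstract does not say how ESC measures the correlation among attributes. As an illustration only, the sketch below assumes one common choice for discrete (categorical) attributes: symmetric uncertainty, a normalized form of mutual information that ranges from 0 (independent) to 1 (each attribute fully determines the other). The function names `entropy`, `symmetric_uncertainty`, and `correlation_matrix` are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """Normalized correlation between two discrete attributes, in [0, 1].

    SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)),
    where I(X; Y) = H(X) + H(Y) - H(X, Y) is the mutual information.
    """
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both attributes are constant: nothing to correlate
    hxy = entropy(list(zip(x, y)))  # joint entropy of the value pairs
    mutual_info = hx + hy - hxy
    return 2.0 * mutual_info / (hx + hy)


def correlation_matrix(rows, attribute_names):
    """Symmetric uncertainty for every unordered pair of attribute columns."""
    columns = {
        name: [row[i] for row in rows] for i, name in enumerate(attribute_names)
    }
    return {
        (a, b): symmetric_uncertainty(columns[a], columns[b])
        for i, a in enumerate(attribute_names)
        for b in attribute_names[i + 1:]
    }


# Usage on a hypothetical toy dataset (one tuple per training instance).
rows = [
    ("sunny", "hot", "high"),
    ("sunny", "hot", "high"),
    ("rain", "cool", "normal"),
    ("rain", "cool", "normal"),
]
print(correlation_matrix(rows, ["outlook", "temp", "humidity"]))
```

In this toy data every attribute pair is perfectly correlated, so each score is 1.0. Under the stated assumption, an attribute that is strongly correlated with others could in principle be predicted from them, which is one plausible way such a measure might help flag suspicious values in noisy data.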
This work was supported by the National Natural Science Foundation of China (Grant No. 60775037), the Natural Science Foundation of Anhui Province (Grant Nos. KJ2011Z321 and KJ2012274), the Nature Science Research of Anhui (Grant No. 1208085MF95), and the Key Natural Science Foundation of Hefei University, China (Grant No. 01KY03ZD).
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, H., Zong, Y., Wang, K., Wu, B. (2012). A Novel Classification Algorithm to Noise Data. In: Tan, Y., Shi, Y., Ji, Z. (eds) Advances in Swarm Intelligence. ICSI 2012. Lecture Notes in Computer Science, vol 7332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31020-1_23
DOI: https://doi.org/10.1007/978-3-642-31020-1_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31019-5
Online ISBN: 978-3-642-31020-1
eBook Packages: Computer Science, Computer Science (R0)