
Dealing with Noisy Data

  • Chapter
Data Preprocessing in Data Mining

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 72)

Abstract

This chapter focuses on the noise imperfections of data. The presence of noise in data is a common problem that produces several negative consequences in classification problems. Noise is an unavoidable problem that affects the data collection and data preparation processes in Data Mining applications, where errors commonly occur. The performance of models built under such circumstances depends heavily on the quality of the training data, but also on the robustness to noise of the learning algorithm itself. Hence, problems containing noise are complex, and accurate solutions are often difficult to achieve without specialized techniques, particularly if the learners are noise-sensitive. Identifying the noise is a complex task that is developed in Sect. 5.1. Once the noise has been identified, the different kinds of this imperfection are described in Sect. 5.2. From this point on, the two main approaches found in the literature are described: on the one hand, modifying and cleaning the data is studied in Sect. 5.3; on the other, designing noise-robust Machine Learning algorithms is tackled in Sect. 5.4. An empirical comparison between the latest approaches in the specialized literature is made in Sect. 5.5.
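As a flavor of the data-cleaning approach outlined above (Sect. 5.3), a common family of methods is the classification filter: instances misclassified under cross-validation are flagged as likely noise and removed before training. The sketch below is not the chapter's own implementation; the toy dataset, the 10% label-flip rate, and the choice of a 5-NN classifier as the filtering model are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Toy binary dataset; inject 10% class noise by flipping labels.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
rng = np.random.default_rng(0)
flipped = rng.choice(len(y), size=len(y) // 10, replace=False)
y_noisy = y.copy()
y_noisy[flipped] = 1 - y_noisy[flipped]

# Classification filter: flag instances whose cross-validated
# prediction disagrees with their (possibly noisy) label.
pred = cross_val_predict(KNeighborsClassifier(n_neighbors=5), X, y_noisy, cv=10)
suspect = pred != y_noisy

# Train on the retained (cleaner) subset only.
X_clean, y_clean = X[~suspect], y_noisy[~suspect]
print(f"removed {suspect.sum()} suspect instances out of {len(y)}")
```

In practice the filtering model is often an ensemble voting scheme rather than a single classifier, and removal can be replaced by relabeling; those trade-offs are exactly what Sects. 5.3 and 5.5 examine.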



Author information


Corresponding author

Correspondence to Salvador García.


Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

García, S., Luengo, J., Herrera, F. (2015). Dealing with Noisy Data. In: Data Preprocessing in Data Mining. Intelligent Systems Reference Library, vol 72. Springer, Cham. https://doi.org/10.1007/978-3-319-10247-4_5


  • Print ISBN: 978-3-319-10246-7

  • Online ISBN: 978-3-319-10247-4

  • eBook Packages: Engineering (R0)
