Abstract
Databases commonly contain noisy data, and an important source of noise is mislabeled training instances. We present a new approach that improves classification accuracy in this setting by applying a preliminary filtering procedure. An example is considered suspect when, in its neighborhood defined by a geometrical graph, the proportion of examples of the same class is not significantly greater than in the whole database. Such suspect examples in the training data can be removed or relabeled, and the filtered training set is then provided as input to the learning algorithm. Our experiments on ten benchmarks from the UCI Machine Learning Repository, using 1-NN as the final algorithm, show that removing gives better results than relabeling. Removal maintains the generalization error rate when between 0 and 20% class noise is introduced, especially when the classes are well separable.
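The filtering idea above can be sketched in a few lines. This is a simplified illustration, not the paper's method: it substitutes a plain k-nearest-neighbor neighborhood for the geometrical graph, and uses a direct comparison of the local same-class proportion against the global class proportion instead of a significance test. The function name `filter_suspects` and all parameters are illustrative assumptions.

```python
import numpy as np

def filter_suspects(X, y, k=5):
    """Flag training examples whose neighborhood (here: k nearest
    neighbors, a simple stand-in for the geometrical graph of the
    paper) does not contain a higher proportion of same-class
    examples than the whole dataset. Returns a boolean mask of
    examples to keep."""
    n = len(X)
    # global proportion of each class in the whole training set
    global_prop = {c: np.mean(y == c) for c in np.unique(y)}
    # pairwise Euclidean distances; a point is never its own neighbor
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        nbrs = np.argsort(d[i])[:k]
        local_prop = np.mean(y[nbrs] == y[i])
        # suspect: local same-class proportion not greater than global
        # (the paper uses a statistical test; this is a crude proxy)
        if local_prop <= global_prop[y[i]]:
            keep[i] = False
    return keep

# usage: two well-separated clusters with one mislabeled point
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
y[0] = 1  # inject class noise
keep = filter_suspects(X, y, k=5)
print(keep[0])  # False: the mislabeled point is flagged as suspect
```

The "remove" strategy the abstract favors then amounts to training the final 1-NN classifier on `X[keep], y[keep]`; the "relabel" alternative would instead assign each suspect example the majority class of its neighborhood.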
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Lallich, S., Muhlenbach, F., Zighed, D.A. (2002). Improving Classification by Removing or Relabeling Mislabeled Instances. In: Hacid, MS., Raś, Z.W., Zighed, D.A., Kodratoff, Y. (eds) Foundations of Intelligent Systems. ISMIS 2002. Lecture Notes in Computer Science, vol 2366. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48050-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43785-7
Online ISBN: 978-3-540-48050-1