Abstract
While classical approaches treat prototype selection (PS) as an accuracy-maximization problem, in this paper we investigate PS as an information-preserving problem. We use information theory to build a statistical criterion from the nearest-neighbor topology, and we embed this criterion in a backward prototype selection algorithm (PSRCG) that identifies and eliminates uninformative instances, thereby reducing the global uncertainty of the learning set. From experimental results and rigorous comparisons we draw two main conclusions: (i) our approach offers a good compromise, retaining a small number of prototypes without compromising classification accuracy; (ii) PSRCG appears to be robust in the presence of noise. Results on several benchmarks show the relevance and effectiveness of our method in comparison with classic accuracy-based PS algorithms.
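The abstract only summarizes the method, and the paper's exact statistical criterion and stopping rule are not reproduced here. As a rough illustration of the general idea, the sketch below implements a generic backward selection loop in the same spirit: it scores the learning set by the average quadratic (Gini) entropy of each instance's k-nearest-neighbor labels, and greedily removes the instance whose deletion most reduces that score. All names (gini, mean_neighborhood_uncertainty, backward_prototype_selection), the choice of Gini entropy as the uncertainty measure, and the greedy stopping condition are our own illustrative assumptions, not the published PSRCG procedure, whose stopping criterion is statistical.

```python
import numpy as np

def gini(labels):
    """Quadratic (Gini) entropy of a label multiset."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def mean_neighborhood_uncertainty(X, y, k):
    """Average local uncertainty over each instance's k-NN labels;
    an illustrative stand-in for the paper's global criterion."""
    n = len(X)
    total = 0.0
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                       # exclude the point itself
        nn = np.argsort(d)[:k]
        total += gini(y[nn])
    return total / n

def backward_prototype_selection(X, y, k=3):
    """Greedy backward pass: tentatively remove each instance, keep the
    removal that most reduces the criterion, and stop when no single
    removal reduces it further (a simplification of PSRCG's test)."""
    keep = np.arange(len(X))
    current = mean_neighborhood_uncertainty(X[keep], y[keep], k)
    while len(keep) > k + 1:
        best_j, best_val = None, current
        for j in range(len(keep)):
            trial = np.delete(keep, j)
            val = mean_neighborhood_uncertainty(X[trial], y[trial], k)
            if val < best_val:
                best_j, best_val = j, val
        if best_j is None:                  # no removal helps: stop
            break
        keep = np.delete(keep, best_j)
        current = best_val
    return keep

# Toy usage: two Gaussian blobs with one injected mislabeled point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),
               rng.normal(3.0, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
y[5] = 1                                    # label noise
kept = backward_prototype_selection(X, y, k=3)
print(f"kept {len(kept)} of {len(X)} instances")
```

On this toy data the mislabeled point is a natural first candidate for removal, since it inflates the local uncertainty of its neighbors; this mirrors, in miniature, the noise robustness the abstract claims for the information-preserving view of PS.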
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sebban, M., Nock, R. (2000). Identifying and Eliminating Irrelevant Instances Using Information Theory. In: Hamilton, H.J. (ed.) Advances in Artificial Intelligence. Canadian AI 2000. Lecture Notes in Computer Science, vol. 1822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45486-1_8
DOI: https://doi.org/10.1007/3-540-45486-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67557-0
Online ISBN: 978-3-540-45486-1