
Identifying and Eliminating Irrelevant Instances Using Information Theory

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1822)

Abstract

While classical approaches treat prototype selection (PS) as an accuracy-maximization problem, in this paper we investigate PS as an information-preserving problem. We use information theory to build a statistical criterion from the nearest-neighbor topology. This statistical framework drives a backward prototype selection algorithm (PSRCG), which identifies and eliminates uninformative instances, thereby reducing the global uncertainty of the learning set. From experimental results and rigorous comparisons we draw two main conclusions: (i) our approach offers a good compromise, keeping a small number of prototypes without compromising classification accuracy; (ii) our PSRCG algorithm appears to be robust in the presence of noise. Performance on several benchmarks tends to confirm the relevance and effectiveness of our method in comparison with classic accuracy-based PS algorithms.
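The abstract describes the general scheme only: score instances with an information-theoretic criterion built on the nearest-neighbor topology, then remove uninformative ones in a backward pass. The sketch below is a minimal illustration of that idea under explicit assumptions, not the paper's actual PSRCG criterion: it uses plain Shannon entropy of the class labels in each instance's k-nearest-neighbor neighborhood as a stand-in uncertainty measure, and a simple stopping rule based on the average uncertainty of the remaining set. Function names and parameters (neighborhood_entropy, backward_prototype_selection, k) are illustrative.

```python
# Hypothetical sketch of backward, information-driven prototype selection.
# The neighborhood-entropy criterion below is a generic stand-in; the exact
# statistic used by PSRCG is defined in the full paper.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def neighborhood_entropy(X, y, k=5):
    """Per-instance Shannon entropy of class labels among the k nearest neighbors.

    X: 2-D feature array; y: 1-D array of non-negative integer class labels.
    """
    nn = NearestNeighbors(n_neighbors=min(k + 1, len(X))).fit(X)
    _, idx = nn.kneighbors(X)
    idx = idx[:, 1:]  # drop each instance itself (first neighbor)
    ent = np.empty(len(X))
    for i, neigh in enumerate(idx):
        counts = np.bincount(y[neigh])
        p = counts / counts.sum()
        p = p[p > 0]
        ent[i] = -(p * np.log2(p)).sum()
    return ent


def backward_prototype_selection(X, y, k=5, max_removals=None):
    """Greedily remove the most locally uncertain instance while the average
    uncertainty of the remaining set keeps decreasing (illustrative stopping rule)."""
    keep = np.arange(len(X))
    ent = neighborhood_entropy(X[keep], y[keep], k)
    budget = max_removals if max_removals is not None else len(X) - k - 1
    for _ in range(budget):
        worst = np.argmax(ent)
        if ent[worst] == 0:  # every neighborhood is already class-pure
            break
        candidate = np.delete(keep, worst)
        new_ent = neighborhood_entropy(X[candidate], y[candidate], k)
        if new_ent.mean() >= ent.mean():  # removal no longer reduces uncertainty
            break
        keep, ent = candidate, new_ent
    return keep  # indices of retained prototypes
```

A toy usage, assuming integer-encoded labels: keep = backward_prototype_selection(X, y, k=5), then classify with a 1-NN rule built on X[keep], y[keep] only.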




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sebban, M., Nock, R. (2000). Identifying and Eliminating Irrelevant Instances Using Information Theory. In: Hamilton, H.J. (eds) Advances in Artificial Intelligence. Canadian AI 2000. Lecture Notes in Computer Science, vol. 1822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45486-1_8


  • DOI: https://doi.org/10.1007/3-540-45486-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67557-0

  • Online ISBN: 978-3-540-45486-1

  • eBook Packages: Springer Book Archive
