Identifying Competence-Critical Instances for Instance-Based Learners

Chapter in Instance Selection and Construction for Data Mining

Abstract

The basic nearest neighbour classifier suffers from the indiscriminate storage of all presented training instances. With a large database of instances, classification response time can be slow; when noisy instances are present, classification accuracy can suffer. Drawing on the large body of relevant work carried out in the past 30 years, we review the principal approaches to solving these problems. Both problems can be alleviated by deleting instances, but the deletion criterion is typically assumed to be all-encompassing and effective over many domains. We argue against this position and introduce an algorithm that rivals the most successful existing algorithm. When evaluated on 30 different problems, neither algorithm consistently outperforms the other: consistency across domains is very hard to achieve. To achieve the best results, we need to develop mechanisms that provide insights into the structure of class definitions. We discuss the possibility of such mechanisms and propose some initial measures that could be useful for the data miner.
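The deletion-based approach described above can be illustrated with a minimal sketch of one classical noise-filtering criterion, Wilson's edited nearest neighbour rule: an instance is deleted when its class label disagrees with the majority vote of its k nearest neighbours among the remaining instances. This is an illustrative example, not the chapter's own algorithm; the toy dataset, k value, and helper names are assumptions made for the sketch.

```python
from collections import Counter

def neighbours(x, data, k):
    # indices of the k stored instances nearest to x (squared Euclidean distance)
    order = sorted(range(len(data)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(x, data[i][0])))
    return order[:k]

def knn_predict(x, data, k=3):
    # majority vote over the labels of the k nearest neighbours
    votes = Counter(data[i][1] for i in neighbours(x, data, k))
    return votes.most_common(1)[0][0]

def wilson_edit(data, k=3):
    # delete each instance whose label disagrees with the majority of its
    # k nearest neighbours among the *other* instances (noise filtering)
    kept = []
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(x, rest, k) == y:
            kept.append((x, y))
    return kept

# toy two-class problem with one mislabelled (noisy) point
train = [((0.0, 0.0), 'a'), ((0.1, 0.1), 'a'), ((0.2, 0.0), 'a'),
         ((1.0, 1.0), 'b'), ((1.1, 0.9), 'b'), ((0.9, 1.1), 'b'),
         ((0.05, 0.05), 'b')]  # noisy: sits inside the 'a' cluster
edited = wilson_edit(train, k=3)
# the mislabelled point is removed; the six cluster points survive
```

Note that this criterion smooths class boundaries by design, which is exactly the kind of single, supposedly all-encompassing deletion criterion the chapter argues cannot be effective over every domain.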






Copyright information

© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Brighton, H., Mellish, C. (2001). Identifying Competence-Critical Instances for Instance-Based Learners. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_5


  • Print ISBN: 978-1-4419-4861-8

  • Online ISBN: 978-1-4757-3359-4
