Abstract
The basic nearest neighbour classifier suffers from the indiscriminate storage of all presented training instances. With a large database of instances, classification response time can be slow. When noisy instances are present, classification accuracy can suffer. Drawing on the large body of relevant work carried out over the past 30 years, we review the principal approaches to solving these problems. Deleting instances can alleviate both problems, but the deletion criterion is typically assumed to be all-encompassing and effective over many domains. We argue against this position and introduce an algorithm that rivals the most successful existing algorithm. When evaluated on 30 different problems, neither algorithm consistently outperforms the other: consistency is very hard to achieve. To achieve the best results, we need to develop mechanisms that provide insight into the structure of class definitions. We discuss the possibility of such mechanisms and propose some initial measures that could be useful for the data miner.
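The abstract contrasts storing every training instance with editing the training set by deletion. As one concrete illustration of the editing idea (this is a generic sketch of Wilson-style editing, not the chapter's own algorithm, and all names in it are ours), the code below implements plain k-NN classification alongside an edited nearest-neighbour rule that deletes any instance misclassified by its k nearest neighbours among the remaining instances:

```python
import math
from collections import Counter

def nearest_labels(x, data, k):
    """Labels of the k training instances closest to x (Euclidean distance)."""
    ranked = sorted(data, key=lambda item: math.dist(x, item[0]))
    return [label for _, label in ranked[:k]]

def classify(x, data, k=1):
    """Basic k-NN: majority vote among the k nearest stored instances."""
    return Counter(nearest_labels(x, data, k)).most_common(1)[0][0]

def wilson_edit(data, k=3):
    """Wilson-style editing: drop any instance that is misclassified by its
    k nearest neighbours among the *other* instances, removing likely noise."""
    kept = []
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if classify(x, rest, k) == label:
            kept.append((x, label))
    return kept
```

For example, a training set of two well-separated clusters plus one mislabelled point inside the first cluster: editing removes the mislabelled point, so the reduced set is both smaller and less noisy, addressing the two problems the abstract names.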
© 2001 Springer Science+Business Media Dordrecht
Cite this chapter
Brighton, H., Mellish, C. (2001). Identifying Competence-Critical Instances for Instance-Based Learners. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_5
DOI: https://doi.org/10.1007/978-1-4757-3359-4_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4861-8
Online ISBN: 978-1-4757-3359-4