Abstract
The basic nearest neighbour classifier suffers from the indiscriminate storage of all presented training instances. With a large database of instances, classification response time can be slow. When noisy instances are present, classification accuracy can suffer. Drawing on the large body of relevant work carried out over the past 30 years, we review the principal approaches to solving these problems. Deleting instances can alleviate both problems, but the deletion criterion is typically assumed to be all-encompassing and effective over many domains. We argue against this position and introduce an algorithm that rivals the most successful existing algorithm. When evaluated on 30 different problems, neither algorithm consistently outperforms the other: consistency is very hard to achieve. To achieve the best results, we need to develop mechanisms that provide insight into the structure of class definitions. We discuss the possibility of such mechanisms and propose some initial measures that could be useful for the data miner.
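The abstract contrasts storing every training instance with editing the training set by deletion. As one concrete illustration of the editing idea (this is a generic sketch of Wilson-style editing, not the chapter's own algorithm, and all names in it are ours), the code below implements plain k-NN classification alongside an edited nearest-neighbour rule that deletes any instance misclassified by its k nearest neighbours among the remaining instances:

```python
import math
from collections import Counter

def nearest_labels(x, data, k):
    """Labels of the k training instances closest to x (Euclidean distance)."""
    ranked = sorted(data, key=lambda item: math.dist(x, item[0]))
    return [label for _, label in ranked[:k]]

def classify(x, data, k=1):
    """Basic k-NN: majority vote among the k nearest stored instances."""
    return Counter(nearest_labels(x, data, k)).most_common(1)[0][0]

def wilson_edit(data, k=3):
    """Wilson-style editing: drop any instance that is misclassified by its
    k nearest neighbours among the *other* instances, removing likely noise."""
    kept = []
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if classify(x, rest, k) == label:
            kept.append((x, label))
    return kept
```

For example, a training set of two well-separated clusters plus one mislabelled point inside the first cluster: editing removes the mislabelled point, so the reduced set is both smaller and less noisy, addressing the two problems the abstract names.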
© 2001 Springer Science+Business Media Dordrecht
Cite this chapter
Brighton, H., Mellish, C. (2001). Identifying Competence-Critical Instances for Instance-Based Learners. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_5
DOI: https://doi.org/10.1007/978-1-4757-3359-4_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4861-8
Online ISBN: 978-1-4757-3359-4