
Prototype Selection Using Boosted Nearest-Neighbors

  • Chapter in: Instance Selection and Construction for Data Mining

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 608)


Abstract

We present a new approach to Prototype Selection (PS), that is, the search for relevant subsets of instances. It is inspired by a recent classification technique known as Boosting, whose ideas had not previously been applied in that field. Three interesting properties emerge from our adaptation. First, accuracy, which has been the standard criterion in PS since Hart and Gates, is no longer the reliability criterion. Second, PS interacts with a prototype weighting scheme, i.e., each prototype periodically receives a real-valued confidence, its significance, with respect to the currently selected set. Finally, using Boosting for PS yields an algorithm whose time complexity compares favorably with that of classical PS algorithms.

Three types of experiments lead to the following conclusions. First, the optimally reachable prototype subset is almost always more accurate than the whole set. Second, and more practically, the output of the algorithm in two series of experiments, with fourteen and twenty benchmarks respectively, compares favorably with that of five recent or state-of-the-art PS algorithms. Third, visual investigation of a particular simulated dataset gives evidence, in that case, of the relevance of the selected prototypes.
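The abstract only sketches how Boosting is adapted to PS, so the following Python fragment is offered purely as an illustration of the interplay it describes between boosting weights and prototype significances. It is an assumption-laden sketch, not the chapter's algorithm: the greedy candidate search, the nearest-prototype weak rule, the n_rounds parameter, and all function names are invented for the example.

# Illustrative sketch only: an AdaBoost-flavoured greedy prototype selector.
# NOT the authors' algorithm; select_prototypes, n_rounds, and the
# nearest-prototype weak rule are assumptions made for this example.

import numpy as np

def nearest_prototype_labels(X, proto_X, proto_y):
    """Label each point with the label of its nearest selected prototype."""
    d = np.linalg.norm(X[:, None, :] - proto_X[None, :, :], axis=2)
    return proto_y[np.argmin(d, axis=1)]

def select_prototypes(X, y, n_rounds=10):
    n = len(X)
    w = np.full(n, 1.0 / n)      # boosting weights over training instances
    selected = []                 # indices of chosen prototypes
    significance = []             # real-valued confidence of each prototype

    for _ in range(n_rounds):
        best_i, best_err = None, np.inf
        for i in range(n):        # greedy search (quadratic, illustration only)
            if i in selected:
                continue
            cand = selected + [i]
            pred = nearest_prototype_labels(X, X[cand], y[cand])
            err = np.sum(w[pred != y])
            if err < best_err:
                best_i, best_err = i, err
        if best_i is None or best_err >= 0.5:
            break                 # weak rule no better than chance: stop
        eps = max(best_err, 1e-10)
        alpha = 0.5 * np.log((1.0 - eps) / eps)   # AdaBoost-style confidence
        selected.append(best_i)
        significance.append(alpha)
        pred = nearest_prototype_labels(X, X[selected], y[selected])
        # Reweight: misclassified instances gain weight, correct ones lose it
        w *= np.exp(alpha * np.where(pred == y, -1.0, 1.0))
        w /= w.sum()
    return np.array(selected), np.array(significance)

Given a labelled dataset (X, y) as NumPy arrays, select_prototypes would return the chosen prototype indices together with their significances; the algorithm described in the chapter differs in how the weights and significances are actually computed and updated.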


References

  • Blum, A. and Langley, P. (1997). Selection of relevant features and examples in Machine Learning. Artificial Intelligence, pages 245–272.

  • Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth.

  • Brodley, C. (1993). Addressing the selective superiority problem: automatic algorithm/model class selection. In Proc. of the 10th International Conference on Machine Learning, pages 17–24.

  • Brodley, C. and Friedl, M. A. (1996). Identifying and eliminating mislabeled training instances. In Proc. of AAAI'96, pages 799–805.

  • Buntine, W. and Niblett, T. (1992). A further comparison of splitting rules for Decision-Tree induction. Machine Learning, pages 75–85.

  • Cover, T. M. and Hart, P. E. (1967). Nearest Neighbor pattern classification. IEEE Transactions on Information Theory, pages 21–27.

  • Freund, Y. and Schapire, R. E. (1997). A Decision-Theoretic generalization of on-line learning and an application to Boosting. Journal of Computer and System Sciences, 55:119–139.

  • Gates, G. W. (1972). The Reduced Nearest Neighbor rule. IEEE Transactions on Information Theory, pages 431–433.

  • Hart, P. E. (1968). The Condensed Nearest Neighbor rule. IEEE Transactions on Information Theory, pages 515–516.

  • John, G. H., Kohavi, R., and Pfleger, K. (1994). Irrelevant features and the subset selection problem. In Proc. of the 11th International Conference on Machine Learning, pages 121–129.

  • Quinlan, J. R. (1996). Bagging, Boosting and C4.5. In Proc. of AAAI'96, pages 725–730.

  • Schapire, R. E. and Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. In Proc. of the 11th Annual ACM Conference on Computational Learning Theory, pages 80–91.

  • Sebban, M. and Nock, R. (2000). Prototype selection as an information-preserving problem. In Proc. of the 17th International Conference on Machine Learning, to appear.

  • Skalak, D. B. (1994). Prototype and feature selection by sampling and random mutation hill-climbing algorithms. In Proc. of the 11th International Conference on Machine Learning, pages 293–301.

  • Wilson, D. and Martinez, T. (1997). Instance pruning techniques. In Proc. of the 14th International Conference on Machine Learning, pages 404–411.

  • Zhang, J. (1992). Selecting typical instances in instance-based learning. In Proc. of the 9th International Conference on Machine Learning, pages 470–479.


Copyright information

© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Nock, R., Sebban, M. (2001). Prototype Selection Using Boosted Nearest-Neighbors. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_17

Download citation

  • DOI: https://doi.org/10.1007/978-1-4757-3359-4_17

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-4861-8

  • Online ISBN: 978-1-4757-3359-4

  • eBook Packages: Springer Book Archive
