Abstract
We present a new approach to Prototype Selection (PS), that is, the search for relevant subsets of instances. It is inspired by a recent classification technique known as Boosting, whose ideas had previously gone unused in this field. Three interesting properties emerge from our adaptation. First, accuracy, which had been the standard criterion in PS since Hart and Gates, is no longer the reliability criterion. Second, PS interacts with a prototype weighting scheme, i.e., each prototype periodically receives a real-valued confidence, its significance, with respect to the currently selected set. Finally, Boosting as used in PS yields an algorithm whose time complexity compares favorably with classical PS algorithms.
Three types of experiments lead to the following conclusions. First, the optimally reachable prototype subset is almost always more accurate than the whole set. Second, more practically, the output of the algorithm on two types of experiments with fourteen and twenty benchmarks compares favorably to those of five recent or state-of-the-art PS algorithms. Third, visual investigations on a particular simulated dataset give evidence, in that case, of the relevance of the selected prototypes.
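The interaction between selection and weighting described above can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the authors' algorithm: it greedily grows a prototype set by treating each candidate, together with the already-selected prototypes, as a 1-NN weak classifier; the selected prototype's "significance" is an AdaBoost-style confidence computed from its weighted error, and instance weights are updated accordingly (in the spirit of Freund and Schapire, 1997). The function names and the greedy candidate scan are illustrative choices.

```python
import math

def nn_predict(S, X, y, x):
    """Label x by 1-NN over the prototype index set S."""
    best = min(S, key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
    return y[best]

def boosted_ps(X, y, T=3):
    """Toy boosting-style prototype selection (labels must be -1/+1)."""
    n = len(X)
    w = [1.0 / n] * n            # AdaBoost-style distribution over instances
    S = []                        # indices of selected prototypes
    alphas = {}                   # per-prototype "significance"
    for _ in range(T):
        # Weak-learner step: pick the candidate whose addition to S
        # minimizes the weighted 1-NN training error.
        best_p, best_err = None, float("inf")
        for p in range(n):
            if p in S:
                continue
            err = sum(w[i] for i in range(n)
                      if nn_predict(S + [p], X, y, X[i]) != y[i])
            if err < best_err:
                best_p, best_err = p, err
        eps = min(max(best_err, 1e-10), 1 - 1e-10)   # clamp away from 0 and 1
        alpha = 0.5 * math.log((1 - eps) / eps)       # confidence of the pick
        S.append(best_p)
        alphas[best_p] = alpha
        # Reweighting step: boost instances the current set misclassifies.
        for i in range(n):
            correct = nn_predict(S, X, y, X[i]) == y[i]
            w[i] *= math.exp(-alpha if correct else alpha)
        z = sum(w)
        w = [wi / z for wi in w]
    return S, alphas
```

On a small two-cluster dataset, two rounds suffice to select one prototype per class, and the prototype that actually separates the clusters receives a much larger significance than the first (uninformative) pick. The O(n) scan per round is only for clarity; it does not reflect the complexity claims of the chapter.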
References
Blum, A. and Langley, P. (1997). Selection of relevant features and examples in Machine Learning. Artificial Intelligence, pages 245–272.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth.
Brodley, C. (1993). Addressing the selective superiority problem: automatic algorithm/model class selection. In Proc. of the 10th International Conference on Machine Learning, pages 17–24.
Brodley, C. and Friedl, M. A. (1996). Identifying and eliminating mislabeled training instances. In Proc. of AAAI’96, pages 799–805.
Buntine, W. and Niblett, T. (1992). A further comparison of splitting rules for Decision-Tree induction. Machine Learning, pages 75–85.
Cover, T. M. and Hart, P. E. (1967). Nearest Neighbor pattern classification. IEEE Transactions on Information Theory, pages 21–27.
Freund, Y. and Schapire, R. E. (1997). A Decision-Theoretic generalization of on-line learning and an application to Boosting. Journal of Computer and System Sciences, 55:119–139.
Gates, G. W. (1972). The Reduced Nearest Neighbor rule. IEEE Transactions on Information Theory, pages 431–433.
Hart, P. E. (1968). The Condensed Nearest Neighbor rule. IEEE Transactions on Information Theory, pages 515–516.
John, G. H., Kohavi, R., and Pfleger, K. (1994). Irrelevant features and the subset selection problem. In Proc. of the 11th International Conference on Machine Learning, pages 121–129.
Quinlan, J. R. (1996). Bagging, Boosting and C4.5. In Proc. of AAAI'96, pages 725–730.
Schapire, R. E. and Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. In Proc. of the 11th Annual ACM Conference on Computational Learning Theory, pages 80–91.
Sebban, M. and Nock, R. (2000). Prototype selection as an information-preserving problem. In Proc. of the 17th International Conference on Machine Learning. To appear.
Skalak, D. B. (1994). Prototype and feature selection by sampling and random mutation hill-climbing algorithms. In Proc. of the 11th International Conference on Machine Learning, pages 293–301.
Wilson, D. and Martinez, T. (1997). Instance pruning techniques. In Proc. of the 14th International Conference on Machine Learning, pages 404–411.
Zhang, J. (1992). Selecting typical instances in instance-based learning. In Proc. of the 9th International Conference on Machine Learning, pages 470–479.
© 2001 Springer Science+Business Media Dordrecht
Cite this chapter
Nock, R., Sebban, M. (2001). Prototype Selection Using Boosted Nearest-Neighbors. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_17
Print ISBN: 978-1-4419-4861-8
Online ISBN: 978-1-4757-3359-4
eBook Packages: Springer Book Archive