
Prototype Selection Using Boosted Nearest-Neighbors

  • Chapter in: Instance Selection and Construction for Data Mining

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 608)


Abstract

We present a new approach to Prototype Selection (PS), that is, the search for relevant subsets of instances. It is inspired by a recent classification technique known as Boosting, whose ideas had not previously been applied in that field. Three interesting properties emerge from our adaptation. First, accuracy, which has been the standard criterion in PS since Hart and Gates, is no longer the reliability criterion. Second, PS interacts with a prototype weighting scheme, i.e., each prototype periodically receives a real-valued confidence, its significance, with respect to the currently selected set. Finally, using Boosting for PS yields an algorithm whose time complexity compares favorably with that of classical PS algorithms.

Three types of experiments lead to the following conclusions. First, the optimally reachable prototype subset is almost always more accurate than the whole set. Second, and more practically, the output of the algorithm in two series of experiments, with fourteen and twenty benchmarks respectively, compares favorably with that of five recent or state-of-the-art PS algorithms. Third, visual investigation of a particular simulated dataset gives evidence, in that case, of the relevance of the selected prototypes.
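The abstract only sketches how Boosting is adapted to PS, so the following Python fragment is offered purely as an illustration of the interplay it describes between boosting weights and prototype significances. It is an assumption-laden sketch, not the chapter's algorithm: the greedy candidate search, the nearest-prototype weak rule, the n_rounds parameter, and all function names are invented for the example.

# Illustrative sketch only: an AdaBoost-flavoured greedy prototype selector.
# NOT the authors' algorithm; select_prototypes, n_rounds, and the
# nearest-prototype weak rule are assumptions made for this example.

import numpy as np

def nearest_prototype_labels(X, proto_X, proto_y):
    """Label each point with the label of its nearest selected prototype."""
    d = np.linalg.norm(X[:, None, :] - proto_X[None, :, :], axis=2)
    return proto_y[np.argmin(d, axis=1)]

def select_prototypes(X, y, n_rounds=10):
    n = len(X)
    w = np.full(n, 1.0 / n)      # boosting weights over training instances
    selected = []                 # indices of chosen prototypes
    significance = []             # real-valued confidence of each prototype

    for _ in range(n_rounds):
        best_i, best_err = None, np.inf
        for i in range(n):        # greedy search (quadratic, illustration only)
            if i in selected:
                continue
            cand = selected + [i]
            pred = nearest_prototype_labels(X, X[cand], y[cand])
            err = np.sum(w[pred != y])
            if err < best_err:
                best_i, best_err = i, err
        if best_i is None or best_err >= 0.5:
            break                 # weak rule no better than chance: stop
        eps = max(best_err, 1e-10)
        alpha = 0.5 * np.log((1.0 - eps) / eps)   # AdaBoost-style confidence
        selected.append(best_i)
        significance.append(alpha)
        pred = nearest_prototype_labels(X, X[selected], y[selected])
        # Reweight: misclassified instances gain weight, correct ones lose it
        w *= np.exp(alpha * np.where(pred == y, -1.0, 1.0))
        w /= w.sum()
    return np.array(selected), np.array(significance)

Given a labelled dataset (X, y) as NumPy arrays, select_prototypes would return the chosen prototype indices together with their significances; the algorithm described in the chapter differs in how the weights and significances are actually computed and updated.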


References

  • Blum, A. and Langley, P. (1997). Selection of relevant features and examples in Machine Learning. Artificial Intelligence, pages 245–272.

  • Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth.

  • Brodley, C. (1993). Addressing the selective superiority problem: automatic algorithm/model class selection. In Proc. of the 10th International Conference on Machine Learning, pages 17–24.

  • Brodley, C. and Friedl, M. A. (1996). Identifying and eliminating mislabeled training instances. In Proc. of AAAI'96, pages 799–805.

  • Buntine, W. and Niblett, T. (1992). A further comparison of splitting rules for Decision-Tree induction. Machine Learning, pages 75–85.

  • Cover, T. M. and Hart, P. E. (1967). Nearest Neighbor pattern classification. IEEE Transactions on Information Theory, pages 21–27.

  • Freund, Y. and Schapire, R. E. (1997). A Decision-Theoretic generalization of on-line learning and an application to Boosting. Journal of Computer and System Sciences, 55:119–139.

  • Gates, G. W. (1972). The Reduced Nearest Neighbor rule. IEEE Transactions on Information Theory, pages 431–433.

  • Hart, P. E. (1968). The Condensed Nearest Neighbor rule. IEEE Transactions on Information Theory, pages 515–516.

  • John, G. H., Kohavi, R., and Pfleger, K. (1994). Irrelevant features and the subset selection problem. In Proc. of the 11th International Conference on Machine Learning, pages 121–129.

  • Quinlan, J. R. (1996). Bagging, Boosting and C4.5. In Proc. of AAAI'96, pages 725–730.

  • Schapire, R. E. and Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. In Proc. of the 11th Annual ACM Conference on Computational Learning Theory, pages 80–91.

  • Sebban, M. and Nock, R. (2000). Prototype selection as an information-preserving problem. In Proc. of the 17th International Conference on Machine Learning, to appear.

  • Skalak, D. B. (1994). Prototype and feature selection by sampling and random mutation hill-climbing algorithms. In Proc. of the 11th International Conference on Machine Learning, pages 293–301.

  • Wilson, D. and Martinez, T. (1997). Instance pruning techniques. In Proc. of the 14th International Conference on Machine Learning, pages 404–411.

  • Zhang, J. (1992). Selecting typical instances in instance-based learning. In Proc. of the 9th International Conference on Machine Learning, pages 470–479.


Copyright information

© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Nock, R., Sebban, M. (2001). Prototype Selection Using Boosted Nearest-Neighbors. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_17

Download citation

  • DOI: https://doi.org/10.1007/978-1-4757-3359-4_17

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-4861-8

  • Online ISBN: 978-1-4757-3359-4

  • eBook Packages: Springer Book Archive
