Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 608)

Abstract

Selection pressures are pervasive. As data grows, so does the demand for data reduction to keep data mining effective. Instance selection is one of the effective means of data reduction. This chapter expounds the basic concepts of instance selection, its context, necessity, and functionality. It briefly introduces state-of-the-art methods for instance selection and presents an overview of the field as well as a summary of the contributing chapters in this collection. Its coverage also includes evaluation issues, related work, and future directions.

Copyright information

© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Liu, H., Motoda, H. (2001). Data Reduction via Instance Selection. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_1

  • DOI: https://doi.org/10.1007/978-1-4757-3359-4_1

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-4861-8

  • Online ISBN: 978-1-4757-3359-4

  • eBook Packages: Springer Book Archive
