Data Reduction via Instance Selection

Liu, Huan; Motoda, Hiroshi

doi:10.1007/978-1-4757-3359-4_1

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 608)

Abstract

Selection pressures are pervasive. As data volumes grow, so does the demand for data reduction to keep data mining effective. Instance selection is one effective means of data reduction. This chapter expounds the basic concepts of instance selection, its context, necessity, and functionality. It briefly introduces state-of-the-art methods for instance selection and presents an overview of the field, as well as a summary of the contributing chapters in this collection. Its coverage also includes evaluation issues, related work, and future directions.

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Arizona State University, Tempe, AZ, 85287-5406, USA
Huan Liu
Institute of Scientific & Industrial Research, Osaka University, Ibaraki, Osaka, 567-0047, Japan
Hiroshi Motoda

Editor information

Editors and Affiliations

Arizona State University, USA
Huan Liu
Osaka University, Japan
Hiroshi Motoda

About this chapter

Cite this chapter

Liu, H., Motoda, H. (2001). Data Reduction via Instance Selection. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_1

DOI: https://doi.org/10.1007/978-1-4757-3359-4_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4861-8
Online ISBN: 978-1-4757-3359-4
eBook Packages: Springer Book Archive
