Abstract
Selection pressures are pervasive. As data grows, so does the demand for data reduction to keep data mining effective. Instance selection is one of the effective means of data reduction. This chapter expounds the basic concepts of instance selection, its context, necessity, and functionality. It briefly introduces state-of-the-art methods for instance selection and presents an overview of the field as well as a summary of the contributing chapters in this collection. Its coverage also includes evaluation issues, related work, and future directions.
Copyright information
© 2001 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Liu, H., Motoda, H. (2001). Data Reduction via Instance Selection. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_1
DOI: https://doi.org/10.1007/978-1-4757-3359-4_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4861-8
Online ISBN: 978-1-4757-3359-4
eBook Packages: Springer Book Archive