Abstract
We present ICE, an efficient method for incremental classification that employs tree-based sampling and is independent of the data distribution. The basic idea is to represent the class distribution of the dataset by weighted samples, which are extracted from the nodes of intermediate decision trees using a clustering technique. As the data grows, an intermediate classifier is built only on the incremental portion of the data, and its weighted samples are combined with the previously generated samples to obtain an up-to-date classifier for the entire current dataset in an efficient, incremental fashion.
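The chapter itself gives the full algorithm; as a rough illustration of the idea described above, the following sketch extracts weighted samples from the leaves of a decision tree via k-means clustering and rebuilds the classifier from the union of old and new samples. All names, parameter choices (tree depth, cluster count), and the use of scikit-learn are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

def weighted_samples_from_tree(X, y, n_clusters=3):
    """Fit a decision tree on one data increment, then summarize each
    (leaf, class) group of points by cluster centroids.  Each centroid
    becomes a weighted sample whose weight is its cluster's point count."""
    tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
    leaves = tree.apply(X)                 # leaf index of each training point
    samples, weights, labels = [], [], []
    for leaf in np.unique(leaves):
        for cls in np.unique(y):
            pts = X[(leaves == leaf) & (y == cls)]
            if len(pts) == 0:
                continue
            k = min(n_clusters, len(pts))
            km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pts)
            samples.append(km.cluster_centers_)
            weights.append(np.bincount(km.labels_, minlength=k))
            labels.append(np.full(k, cls))
    return np.vstack(samples), np.concatenate(weights), np.concatenate(labels)

def incremental_update(prev, X_new, y_new):
    """Combine previously kept weighted samples with those extracted from
    the new increment, then rebuild the classifier on the combined set."""
    S, w, l = weighted_samples_from_tree(X_new, y_new)
    if prev is not None:
        S = np.vstack([prev[0], S])
        w = np.concatenate([prev[1], w])
        l = np.concatenate([prev[2], l])
    clf = DecisionTreeClassifier(max_depth=4, random_state=0)
    clf.fit(S, l, sample_weight=w)         # weights stand in for raw points
    return (S, w, l), clf

# Toy stream: two well-separated Gaussian classes arriving in two increments.
rng = np.random.default_rng(0)
state, clf = None, None
for _ in range(2):
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
    y = np.array([0] * 100 + [1] * 100)
    state, clf = incremental_update(state, X, y)
print(clf.predict([[0.0, 0.0], [3.0, 3.0]]))
```

Note that each update touches only the new increment plus the compact sample set, which is what makes the approach attractive for growing datasets.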
Copyright information
© 2001 Springer Science+Business Media Dordrecht
Cite this chapter
Yoon, H., Alsabti, K., Ranka, S. (2001). Incremental Classification Using Tree-Based Sampling for Large Data. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_11
Print ISBN: 978-1-4419-4861-8
Online ISBN: 978-1-4757-3359-4