
Incremental Classification Using Tree-Based Sampling for Large Data

Chapter in: Instance Selection and Construction for Data Mining

Abstract

We present ICE, an efficient method for incremental classification that employs tree-based sampling and is independent of the data distribution. The basic idea is to represent the class distribution in the dataset with weighted samples, which are extracted from the nodes of intermediate decision trees using a clustering technique. As the data grows, an intermediate classifier is built only on the incremental portion of the data. The weighted samples from the intermediate classifier are then combined with the previously generated samples to obtain an up-to-date classifier for the current data in an efficient, incremental fashion.






Copyright information

© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Yoon, H., Alsabti, K., Ranka, S. (2001). Incremental Classification Using Tree-Based Sampling for Large Data. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_11


  • DOI: https://doi.org/10.1007/978-1-4757-3359-4_11

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-4861-8

  • Online ISBN: 978-1-4757-3359-4

  • eBook Packages: Springer Book Archive
