Abstract
Incremental learning is well suited to classification when datasets are too large to be processed at once or when new examples can arrive at any time. Forgetting examples once they have been processed, while keeping only the relevant information, reduces memory requirements. The algorithm presented in this paper, called IADEM, has been developed from these ideas together with other concepts such as the Chernoff and Hoeffding bounds. Its most relevant features are its ability to induce accurate trees from datasets of any size and its capacity to keep the error estimate of the tree being induced up to date. This error estimate is fundamental both to satisfying the user's requirements on the desired error of the tree and to detecting noise in the datasets.
This work has been partially supported by the FPI program and the MOISES-TA project, number TIN2005-08832-C03-01, of the MEC, Spain.
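The abstract does not specify IADEM's exact update rule, so the following is only a minimal sketch of how a Hoeffding bound is typically used to drive split decisions in incremental tree induction (the approach popularized by Domingos and Hulten, cited below), not IADEM's own criterion. The gain values, `delta`, and example count are hypothetical.

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability at least 1 - delta, the true mean of a random
    variable with range `value_range` lies within this epsilon of the
    mean observed over n independent samples (Hoeffding, 1963)."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

# Hypothetical heuristic values observed at a leaf: split only when the gap
# between the two best attributes exceeds epsilon, so that the chosen
# attribute is the truly best one with confidence 1 - delta.
best_gain, second_gain = 0.42, 0.31  # e.g. information gain of top attributes
n_examples = 500                     # examples seen at this leaf so far
delta = 1e-6                         # admissible probability of a wrong choice

epsilon = hoeffding_bound(value_range=1.0, delta=delta, n=n_examples)
if best_gain - second_gain > epsilon:
    print(f"split now: gap {best_gain - second_gain:.3f} > eps {epsilon:.3f}")
else:
    print(f"keep reading examples: eps {epsilon:.3f} is still too large")
```

Because epsilon shrinks as n grows, each decision can be deferred until enough examples have arrived to make it with the requested confidence; this is what lets such algorithms process datasets of any size while keeping the error of the induced tree bounded.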
References
Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: Proc. 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 226–235. ACM Press, New York (2003)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
Fisher, D.H., Schlimmer, J.C.: Models of incremental concept learning: A coupled research proposal. Technical Report CS-88-05, Vanderbilt University (1988)
Schlimmer, J.C., Fisher, D.H.: A case study of incremental concept induction. In: Proc. 5th Nat. Conf. on Artificial Intelligence, Philadelphia, pp. 496–501. Morgan Kaufmann, San Francisco (1986)
Chernoff, H.: A measure of asymptotic efficiency for tests of a hypothesis based on the sums of observations. Annals of Mathematical Statistics 23, 493–507 (1952)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58, 13–30 (1963)
Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proc. 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 71–80. ACM Press, New York (2000)
Yang, J., Wang, W., Yu, P.S., Han, J.: Mining long sequential patterns in a noisy environment. In: Proc. ACM SIGMOD Int. Conf. on Management of Data, pp. 406–417. ACM Press, New York (2002)
Gama, J., Rocha, R., Medas, P.: Accurate decision trees for mining high-speed data streams. In: Proc. 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 523–528. ACM Press, New York (2003)
Blake, C., Merz, C.J.: UCI repository of machine learning databases. University of California, Department of Information and Computer Science (2000)
Utgoff, P.E., Berkman, N.C., Clouse, J.A.: Decision tree induction based on efficient tree restructuring. Machine Learning 29(1), 5–44 (1997)