Abstract
This paper sheds light on a strong connection between AdaBoost and several optimization algorithms for data mining. AdaBoost has attracted much interest as an effective methodology for classification tasks. AdaBoost generates one hypothesis in each round, and finally makes a highly accurate prediction by taking a weighted majority vote over the resulting hypotheses. Freund and Schapire have remarked that using simple hypotheses, such as single-test decision trees, instead of huge trees is promising for achieving high accuracy while avoiding overfitting to the training data. One major drawback of this approach, however, is that the accuracies of simple individual hypotheses may not always be high, which demands a way of computing more accurate (or the most accurate) simple hypotheses efficiently. In this paper, we consider several classes of simple but expressive hypotheses, such as ranges and regions for numeric attributes, subsets of categorical values, and conjunctions of Boolean tests. For each class, we develop an efficient algorithm for choosing the optimal hypothesis.
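To make the setting concrete, here is a minimal sketch (Python with NumPy; illustrative code, not the paper's algorithm) of AdaBoost with a single-test decision stump as the weak learner. The exhaustive stump search plays the role of "choosing the optimal simple hypothesis" for one hypothesis class, a numeric threshold test; all function names are hypothetical.

```python
# Minimal AdaBoost sketch with a decision-stump weak learner.
# Labels are in {-1, +1}; the stump search is the naive O(d * n^2)
# scan over all (attribute, threshold, sign) triples -- doing such
# searches efficiently is the concern of the paper.
import numpy as np

def best_stump(X, y, w):
    """Find the stump h(x) = sign(s * (x[j] - t)) minimizing the
    weighted error. X: (n, d), y: (n,) in {-1, +1}, w: (n,) weights."""
    n, d = X.shape
    best = (0, 0.0, 1, np.inf)  # (feature j, threshold t, sign s, error)
    for j in range(d):
        for t in np.unique(X[:, j]):
            for s in (+1, -1):
                pred = np.where(s * (X[:, j] - t) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def adaboost(X, y, rounds=20):
    """Return a list of (alpha, stump) pairs; prediction is the sign
    of the alpha-weighted vote of the stumps."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        j, t, s, err = best_stump(X, y, w)
        err = max(err, 1e-12)               # guard against a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(s * (X[:, j] - t) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)      # upweight misclassified points
        w /= w.sum()
        ensemble.append((alpha, (j, t, s)))
    return ensemble

def predict(ensemble, X):
    vote = sum(a * np.where(s * (X[:, j] - t) >= 0, 1, -1)
               for a, (j, t, s) in ensemble)
    return np.where(vote >= 0, 1, -1)
```

A production stump search would sort each attribute once and sweep the thresholds, rather than rescanning all examples per threshold; finding such efficient optimal-hypothesis searches for richer classes (ranges, regions, value subsets, conjunctions) is exactly what the paper develops.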
References
R. Agrawal, T. Imielinski, and A. N. Swami. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., May 26–28, 1993, pages 207–216. ACM Press, 1993.
R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB’94, Proceedings of 20th International Conference on Very Large Data Bases, September 12–15, 1994, Santiago de Chile, Chile, pages 487–499. Morgan Kaufmann, 1994.
E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(2):105–139, 1999.
J. Bentley. Programming pearls. Communications of the ACM, 27(9):865–871, Sept. 1984.
S. Brin, R. Rastogi, and K. Shim. Mining optimized gain rules for numeric attributes. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 15–18 August 1999, San Diego, CA, USA, pages 135–144. ACM Press, 1999.
C. Domingo and O. Watanabe. A modification of adaboost: A preliminary report. Research Reports, Dept. of Math. and Comp. Sciences, Tokyo Institute of Technology, (C-133), July 1999.
Y. Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256–285, 1995.
Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference, pages 148–156, 1996.
Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, Aug. 1997.
T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Data mining using two-dimensional optimized association rules: Scheme, algorithms, and visualization. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4–6, 1996, pages 13–23. ACM Press, 1996.
S. Khanna, S. Muthukrishnan, and M. Paterson. On approximating rectangle tiling and packing. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 384–393, Jan. 1998.
S. Morishita. On classification and regression. In Proceedings of Discovery Science, First International Conference, DS'98 (Lecture Notes in Artificial Intelligence, volume 1532), pages 40–57, Dec. 1998.
S. Morishita and J. Sese. Traversing itemset lattices with statistical metric pruning. In Proc. of ACM SIGACT-SIGMOD-SIGART Symp. on Database Systems (PODS), pages 226–236, May 2000.
R. E. Schapire. The strength of weak learnability (extended abstract). In FOCS, pages 28–33, 1989.
H. Tamaki and T. Tokuyama. Algorithms for the maximum subarray problem based on matrix multiplication. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 446–452, Jan. 1998.
K. Yoda, T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Computing optimized rectilinear regions for association rules. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pages 96–103, Aug. 1997.
© 2002 Springer-Verlag Berlin Heidelberg
Morishita, S. (2002). Computing Optimal Hypotheses Efficiently for Boosting. In: Arikawa, S., Shinohara, A. (eds) Progress in Discovery Science. Lecture Notes in Computer Science, vol 2281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45884-0_35