Abstract
This paper sheds light on a strong connection between AdaBoost and several optimization algorithms for data mining. AdaBoost has attracted much interest as an effective methodology for classification tasks. AdaBoost generates one hypothesis in each round, and finally makes a highly accurate prediction by taking a weighted majority vote over the resulting hypotheses. Freund and Schapire have remarked that using simple hypotheses, such as single-test decision trees, instead of huge trees is promising for achieving high accuracy while avoiding overfitting to the training data. One major drawback of this approach, however, is that the accuracies of simple individual hypotheses may not always be high, which demands a way of computing more accurate (or the most accurate) simple hypotheses efficiently. In this paper, we consider several classes of simple but expressive hypotheses, such as ranges and regions for numeric attributes, subsets of categorical values, and conjunctions of Boolean tests. For each class, we develop an efficient algorithm for choosing the optimal hypothesis.
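To make the setting concrete, here is a minimal sketch (Python with NumPy; illustrative code, not the paper's algorithm) of AdaBoost with a single-test decision stump as the weak learner. The exhaustive stump search plays the role of "choosing the optimal simple hypothesis" for one hypothesis class, a numeric threshold test; all function names are hypothetical.

```python
# Minimal AdaBoost sketch with a decision-stump weak learner.
# Labels are in {-1, +1}; the stump search is the naive O(d * n^2)
# scan over all (attribute, threshold, sign) triples -- doing such
# searches efficiently is the concern of the paper.
import numpy as np

def best_stump(X, y, w):
    """Find the stump h(x) = sign(s * (x[j] - t)) minimizing the
    weighted error. X: (n, d), y: (n,) in {-1, +1}, w: (n,) weights."""
    n, d = X.shape
    best = (0, 0.0, 1, np.inf)  # (feature j, threshold t, sign s, error)
    for j in range(d):
        for t in np.unique(X[:, j]):
            for s in (+1, -1):
                pred = np.where(s * (X[:, j] - t) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def adaboost(X, y, rounds=20):
    """Return a list of (alpha, stump) pairs; prediction is the sign
    of the alpha-weighted vote of the stumps."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        j, t, s, err = best_stump(X, y, w)
        err = max(err, 1e-12)               # guard against a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(s * (X[:, j] - t) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)      # upweight misclassified points
        w /= w.sum()
        ensemble.append((alpha, (j, t, s)))
    return ensemble

def predict(ensemble, X):
    vote = sum(a * np.where(s * (X[:, j] - t) >= 0, 1, -1)
               for a, (j, t, s) in ensemble)
    return np.where(vote >= 0, 1, -1)
```

A production stump search would sort each attribute once and sweep the thresholds, rather than rescanning all examples per threshold; finding such efficient optimal-hypothesis searches for richer classes (ranges, regions, value subsets, conjunctions) is exactly what the paper develops.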
References
R. Agrawal, T. Imielinski, and A. N. Swami. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., May 26–28, 1993, pages 207–216. ACM Press, 1993.
R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB’94, Proceedings of 20th International Conference on Very Large Data Bases, September 12–15, 1994, Santiago de Chile, Chile, pages 487–499. Morgan Kaufmann, 1994.
E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(2):105–139, 1999.
J. Bentley. Programming pearls. Communications of the ACM, 27(9):865–871, Sept. 1984.
S. Brin, R. Rastogi, and K. Shim. Mining optimized gain rules for numeric attributes. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 15–18 August 1999, San Diego, CA, USA, pages 135–144. ACM Press, 1999.
C. Domingo and O. Watanabe. A modification of adaboost: A preliminary report. Research Reports, Dept. of Math. and Comp. Sciences, Tokyo Institute of Technology, (C-133), July 1999.
Y. Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256–285, 1995.
Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference, pages 148–156, 1996.
Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, Aug. 1997.
T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Data mining using two-dimensional optimized association rules: Scheme, algorithms, and visualization. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4–6, 1996, pages 13–23. ACM Press, 1996.
S. Khanna, S. Muthukrishnan, and M. Paterson. On approximating rectangle tiling and packing. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 384–393, Jan. 1998.
S. Morishita. On classification and regression. In Proceedings of Discovery Science, First International Conference, DS'98 (Lecture Notes in Artificial Intelligence, volume 1532), pages 40–57, Dec. 1998.
S. Morishita and J. Sese. Traversing itemset lattices with statistical metric pruning. In Proc. of ACM SIGACT-SIGMOD-SIGART Symp. on Database Systems (PODS), pages 226–236, May 2000.
R. E. Schapire. The strength of weak learnability (extended abstract). In FOCS, pages 28–33, 1989.
H. Tamaki and T. Tokuyama. Algorithms for the maximum subarray problem based on matrix multiplication. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 446–452, Jan. 1998.
K. Yoda, T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Computing optimized rectilinear regions for association rules. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pages 96–103, Aug. 1997.
© 2002 Springer-Verlag Berlin Heidelberg
Morishita, S. (2002). Computing Optimal Hypotheses Efficiently for Boosting. In: Arikawa, S., Shinohara, A. (eds) Progress in Discovery Science. Lecture Notes in Computer Science, vol 2281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45884-0_35