Computing Optimal Hypotheses Efficiently for Boosting

Chapter in Progress in Discovery Science

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2281)

Abstract

This paper sheds light on a strong connection between AdaBoost and several optimization algorithms for data mining. AdaBoost has attracted much interest as an effective methodology for classification tasks. AdaBoost repeatedly generates one hypothesis in each round, and finally it makes a highly accurate prediction by taking a weighted majority vote over the resulting hypotheses. Freund and Schapire have remarked that using simple hypotheses, such as single-test decision trees, instead of huge trees is promising for achieving high accuracy while avoiding overfitting to the training data. One major drawback of this approach, however, is that the accuracies of simple individual hypotheses may not always be high, which demands a way of computing more accurate (or the most accurate) simple hypotheses efficiently. In this paper, we consider several classes of simple but expressive hypotheses, such as ranges and regions for numeric attributes, subsets of categorical values, and conjunctions of Boolean tests. For each class, we develop an efficient algorithm for choosing the optimal hypothesis.
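
The abstract sketches the overall recipe: in each round, search a class of simple hypotheses for the one with the smallest weighted error on the current distribution, then reweight the training examples and repeat. As a minimal illustration only (not the paper's own algorithm or code; the function and variable names below are invented for this sketch), the following Python program runs AdaBoost with range hypotheses on a numeric attribute as the weak learner. For a hypothesis that predicts +1 inside an interval and -1 outside, minimizing the weighted error is equivalent to maximizing the sum of the signed weights w_i * y_i of the points falling inside the interval, so the optimal range on one attribute can be found by a maximum-subarray scan over the sorted values, in the spirit of Bentley's algorithm cited in the references below.

    # Illustrative sketch only -- not the algorithm from the paper.
    # AdaBoost whose weak learner is the optimal "range" hypothesis on a
    # single numeric attribute: predict +1 inside [lo, hi], -1 outside.
    import numpy as np

    def best_range_stump(x, y, w):
        # Minimizing the weighted error of a range hypothesis is equivalent to
        # maximizing sum(w_i * y_i) over the points falling inside the range,
        # so a maximum-subarray scan over the sorted values finds the optimum.
        order = np.argsort(x)
        xs, signed = x[order], (w * y)[order]
        vals, starts = np.unique(xs, return_index=True)   # group equal values
        sums = np.add.reduceat(signed, starts)

        best, cur, cur_start, best_lo, best_hi = -np.inf, 0.0, 0, 0, 0
        for i, g in enumerate(sums):                      # Kadane's scan
            if cur <= 0:
                cur, cur_start = g, i
            else:
                cur += g
            if cur > best:
                best, best_lo, best_hi = cur, cur_start, i
        err = w[y == 1].sum() - best                      # weighted error of best range
        return vals[best_lo], vals[best_hi], err

    def adaboost_ranges(X, y, rounds=20):
        # X: (n, d) numeric array; y: labels in {-1, +1}.
        n, d = X.shape
        w = np.full(n, 1.0 / n)
        ensemble = []                                     # (attribute, lo, hi, alpha)
        for _ in range(rounds):
            # Optimal hypothesis of this class = best range over all attributes.
            att, lo, hi, err = min(
                ((j, *best_range_stump(X[:, j], y, w)) for j in range(d)),
                key=lambda t: t[3])
            err = np.clip(err, 1e-12, 1 - 1e-12)
            alpha = 0.5 * np.log((1 - err) / err)
            h = np.where((X[:, att] >= lo) & (X[:, att] <= hi), 1, -1)
            w *= np.exp(-alpha * y * h)                   # emphasize misclassified points
            w /= w.sum()
            ensemble.append((att, lo, hi, alpha))
        return ensemble

    def predict(ensemble, X):
        # Weighted majority vote of the range hypotheses.
        score = np.zeros(len(X))
        for att, lo, hi, alpha in ensemble:
            score += alpha * np.where((X[:, att] >= lo) & (X[:, att] <= hi), 1, -1)
        return np.sign(score)

The same skeleton accommodates the other hypothesis classes mentioned in the abstract (subsets of categorical values, conjunctions of Boolean tests) by replacing best_range_stump with a search for the minimum-weighted-error hypothesis of that class.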

References

  1. R. Agrawal, T. Imielinski, and A. N. Swami. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., May 26–28, 1993, pages 207–216. ACM Press, 1993.

  2. R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB’94, Proceedings of 20th International Conference on Very Large Data Bases, September 12–15, 1994, Santiago de Chile, Chile, pages 487–499. Morgan Kaufmann, 1994.

  3. E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(2):105–139, 1999.

  4. J. Bentley. Programming pearls. Communications of the ACM, 27(9):865–871, Sept. 1984.

  5. S. Brin, R. Rastogi, and K. Shim. Mining optimized gain rules for numeric attributes. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 15–18 August 1999, San Diego, CA, USA, pages 135–144. ACM Press, 1999.

  6. C. Domingo and O. Watanabe. A modification of AdaBoost: A preliminary report. Research Reports, Dept. of Math. and Comp. Sciences, Tokyo Institute of Technology, (C-133), July 1999.

  7. Y. Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256–285, 1995.

  8. Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference, pages 148–156, 1996.

  9. Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, Aug. 1997.

  10. T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Data mining using two-dimensional optimized association rules: Scheme, algorithms, and visualization. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4–6, 1996, pages 13–23. ACM Press, 1996.

  11. S. Khanna, S. Muthukrishnan, and M. Paterson. On approximating rectangle tiling and packing. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 384–393, Jan. 1998.

  12. S. Morishita. On classification and regression. In Proceedings of Discovery Science, First International Conference, DS’98, Lecture Notes in Artificial Intelligence, volume 1532, pages 40–57, Dec. 1998.

  13. S. Morishita and J. Sese. Traversing itemset lattices with statistical metric pruning. In Proc. of ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems (PODS), pages 226–236, May 2000.

  14. R. E. Schapire. The strength of weak learnability (extended abstract). In FOCS, pages 28–33, 1989.

  15. H. Tamaki and T. Tokuyama. Algorithms for the maximum subarray problem based on matrix multiplication. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 446–452, Jan. 1998.

  16. K. Yoda, T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Computing optimized rectilinear regions for association rules. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pages 96–103, Aug. 1997.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Morishita, S. (2002). Computing Optimal Hypotheses Efficiently for Boosting. In: Arikawa, S., Shinohara, A. (eds) Progress in Discovery Science. Lecture Notes in Computer Science, vol 2281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45884-0_35

  • DOI: https://doi.org/10.1007/3-540-45884-0_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43338-5

  • Online ISBN: 978-3-540-45884-5
