Boosting over Groups and Its Application to Acronym-Expansion Extraction

  • Conference paper
Advanced Data Mining and Applications (ADMA 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5139)

Abstract

In many real-world classification applications, instances are generated from different ‘groups’. Take webpage classification as an example: the webpages used for training and testing can be naturally grouped by network domain, and domains often differ considerably in size or webpage template. These differences between ‘groups’ mean that the distribution of instances also varies from one ‘group’ to another. It is therefore not entirely reasonable to treat the instances as independent, equally weighted elements during training and testing, as conventional classification algorithms do. This paper addresses the classification problem in which all instances can be naturally grouped. Specifically, we formulate this kind of problem and propose a simple but effective boosting approach called AdaBoost.Group. The problem is illustrated with the task of recognizing acronyms and their expansions in text, where all instances are grouped by sentence. The experimental results show that our approach is better suited to this kind of problem than conventional classification approaches.
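The abstract does not spell out the AdaBoost.Group algorithm, so the following is only an illustrative sketch of the general idea of boosting over groups, not the authors' method. It assumes that one weight is kept per group rather than per instance, that a group's loss is the fraction of its instances a weak learner misclassifies, and that weak learners are one-feature threshold stumps. All function names (`fit_group_boost`, `group_error`, and so on) and the toy data are hypothetical.

```python
import math

def stump_predict(stump, x):
    """Predict +1 or -1 from a (feature, threshold, polarity) stump."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

def group_error(stump, group):
    """Fraction of instances in one group that the stump misclassifies."""
    wrong = sum(1 for x, y in group if stump_predict(stump, x) != y)
    return wrong / len(group)

def candidate_stumps(groups):
    """Enumerate stumps at midpoints between observed feature values."""
    n_features = len(groups[0][0][0])
    for f in range(n_features):
        values = sorted({x[f] for g in groups for x, _ in g})
        for lo, hi in zip(values, values[1:]):
            for polarity in (1, -1):
                yield (f, (lo + hi) / 2.0, polarity)

def fit_group_boost(groups, rounds=10):
    """Boosting loop with one weight per group (an assumption of this sketch)."""
    weights = [1.0 / len(groups)] * len(groups)
    stumps = list(candidate_stumps(groups))
    ensemble = []
    if not stumps:
        return ensemble
    for _ in range(rounds):
        # Pick the stump with the lowest group-weighted error.
        best, best_eps, best_errs = None, None, None
        for s in stumps:
            errs = [group_error(s, g) for g in groups]
            eps = sum(w * e for w, e in zip(weights, errs))
            if best_eps is None or eps < best_eps:
                best, best_eps, best_errs = s, eps, errs
        if best_eps >= 0.5:
            break  # no weak learner better than chance on weighted groups
        eps = max(best_eps, 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, best))
        if best_eps == 0.0:
            break  # every group is classified perfectly
        # Groups with a higher error rate gain weight for the next round.
        weights = [w * math.exp(alpha * (2 * e - 1))
                   for w, e in zip(weights, best_errs)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Toy data: two groups of different sizes, one feature, labels in {-1, +1}.
groups = [
    [((0.1,), -1), ((0.9,), 1)],
    [((0.2,), -1), ((0.3,), -1), ((0.8,), 1), ((0.7,), 1), ((0.6,), 1)],
]
model = fit_group_boost(groups)
```

In the acronym-expansion setting described above, each group would correspond to one sentence and each instance to a candidate acronym-expansion pair in it. Because the weights are per group, a large group contributes to the weighted error no more than a small one at initialization, which is the contrast with instance-level AdaBoost that this sketch is meant to show.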


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ni, W., Huang, Y., Li, D., Wang, Y. (2008). Boosting over Groups and Its Application to Acronym-Expansion Extraction. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_5

  • DOI: https://doi.org/10.1007/978-3-540-88192-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88191-9

  • Online ISBN: 978-3-540-88192-6

  • eBook Packages: Computer Science, Computer Science (R0)
