Boosting over Groups and Its Application to Acronym-Expansion Extraction

Ni, Weijian; Huang, Yalou; Li, Dong; Wang, Yang

doi:10.1007/978-3-540-88192-6_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5139)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

In many real-world classification applications, instances are generated from different ‘groups’. Take webpage classification as an example: the webpages used for training and testing can naturally be grouped by network domain, and domains often differ considerably in size and page template. Because of these differences between ‘groups’, the distribution of instances also varies from one ‘group’ to another. It is therefore questionable to treat all instances equally, as independent elements, during training and testing, as conventional classification algorithms do. This paper addresses the classification problem in which all instances can be naturally grouped. Specifically, we formulate this kind of problem and propose a simple but effective boosting approach called AdaBoost.Group. We demonstrate the problem on the task of recognizing acronyms and their expansions in text, where the instances are grouped by sentence. The experimental results show that our approach is better suited to this kind of problem than conventional classification approaches.
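The abstract does not give the update rules of AdaBoost.Group. Purely to illustrate what "boosting over groups" can mean, the Python sketch below keeps one weight per group (for example, per sentence) instead of one per instance, scores each weak learner by its mean margin inside each group, and reweights whole groups using an AdaRank-style coefficient. The decision stumps, the per-group score, the coefficient formula and the names `train`, `predict`, `group_score` and `candidate_stumps` are all assumptions made for this sketch; they are not the paper's definitions.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# ASSUMPTION: a group is (features, labels); labels are +1 / -1.
# In the acronym task a group would correspond to one sentence.
Group = Tuple[List[Sequence[float]], List[int]]
Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    j, thr, pol = stump
    return pol if x[j] > thr else -pol


def group_score(stump: Stump, group: Group) -> float:
    # Hypothetical per-group score: mean margin y*h(x), always in [-1, 1],
    # so a long group does not outweigh a short one.
    xs, ys = group
    return sum(y * stump_predict(stump, x) for x, y in zip(xs, ys)) / len(xs)


def candidate_stumps(groups: List[Group]) -> List[Stump]:
    # Thresholds at every observed feature value, in both polarities.
    values: Dict[int, Set[float]] = {}
    for xs, _ in groups:
        for x in xs:
            for j, v in enumerate(x):
                values.setdefault(j, set()).add(v)
    return [(j, t, p) for j, vs in values.items() for t in sorted(vs) for p in (1, -1)]


def train(groups: List[Group], rounds: int = 10, eps: float = 1e-10):
    n = len(groups)
    weights = [1.0 / n] * n  # one weight per group, not per instance
    stumps = candidate_stumps(groups)
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        table = [[group_score(s, g) for g in groups] for s in stumps]
        # Choose the stump with the best weighted group-level score.
        best = max(range(len(stumps)),
                   key=lambda k: sum(w * e for w, e in zip(weights, table[k])))
        scores = table[best]
        num = sum(w * (1 + e) for w, e in zip(weights, scores))
        den = sum(w * (1 - e) for w, e in zip(weights, scores))
        alpha = 0.5 * math.log((num + eps) / (den + eps))  # AdaRank-style
        ensemble.append((alpha, stumps[best]))
        # Groups with a negative score (handled poorly) gain weight.
        weights = [w * math.exp(-alpha * e) for w, e in zip(weights, scores)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    s = sum(alpha * stump_predict(st, x) for alpha, st in ensemble)
    return 1 if s >= 0 else -1
```

Normalizing each group's score by its size reflects the intuition stated in the abstract: instances from one large group should not dominate training simply because that group contributes more instances.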

Author information

Authors and Affiliations

College of Information Technical Science, Nankai University, No. 94 Weijin Road, Tianjin, China
Weijian Ni, Yalou Huang, Dong Li & Yang Wang

Editor information

Editors and Affiliations

School of Computer Science, Sichuan University, 610065, Chengdu, China
Changjie Tang
Department of Computer Science, The University of Western Ontario, Canada
Charles X. Ling
School of ITEE, The University of Queensland, Australia
Xiaofang Zhou
Faculty of Science & Engineering, York University, 355 Lumbers Building, M3J 1P3, Toronto, Ontario, Canada
Nick J. Cercone
School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, 4072, Queensland, Australia
Xue Li

About this paper

Cite this paper

Ni, W., Huang, Y., Li, D., Wang, Y. (2008). Boosting over Groups and Its Application to Acronym-Expansion Extraction. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_5

DOI: https://doi.org/10.1007/978-3-540-88192-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88191-9
Online ISBN: 978-3-540-88192-6
eBook Packages: Computer Science (R0)
