Abstract
In many real-world classification applications, instances are generated from different ‘groups’. In webpage classification, for example, the webpages used for training and testing can be naturally grouped by network domain, and domains often differ considerably from one another in size or webpage template. Because of these differences between groups, the distribution of instances also varies from group to group. It is therefore not entirely reasonable to treat all instances as equal, independent elements during training and testing, as conventional classification algorithms do. This paper addresses the classification problem in which all instances can be naturally grouped. Specifically, we give a formulation of this kind of problem and propose a simple but effective boosting approach, called AdaBoost.Group. The problem is illustrated with the task of recognizing acronyms and their expansions in text, where all instances are grouped by sentence. The experimental results show that our approach is better suited to this kind of problem than conventional classification approaches.
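To make the idea of boosting over groups concrete, the following is a minimal sketch and not the AdaBoost.Group algorithm itself, whose details the abstract does not give. It assumes that boosting weights are kept per group rather than per instance, that a group's loss is the fraction of its instances a weak learner misclassifies, and that the weak learners are single-feature threshold stumps. All names (`group_losses`, `make_stump`, `train_group_boost`, `predict`, `toy`) are hypothetical.

```python
import math
from collections import defaultdict

def group_losses(classifier, data):
    """Fraction of misclassified instances in each group (assumed loss)."""
    wrong, total = defaultdict(int), defaultdict(int)
    for x, y, g in data:
        total[g] += 1
        wrong[g] += classifier(x) != y
    return {g: wrong[g] / total[g] for g in total}

def make_stump(feature, threshold, sign):
    """Weak learner: predicts `sign` above the threshold, `-sign` otherwise."""
    return lambda x: sign if x[feature] > threshold else -sign

def train_group_boost(data, rounds=5):
    """data: list of (features, label in {-1, +1}, group_id)."""
    groups = sorted({g for _, _, g in data})
    # One weight per group, so a large group does not dominate a small one.
    weights = {g: 1.0 / len(groups) for g in groups}
    candidates = [
        make_stump(f, t, s)
        for f in range(len(data[0][0]))
        for t in sorted({x[f] for x, _, _ in data})
        for s in (1, -1)
    ]
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the lowest group-weighted loss.
        best, best_err, best_losses = None, float("inf"), None
        for stump in candidates:
            losses = group_losses(stump, data)
            err = sum(weights[g] * losses[g] for g in groups)
            if err < best_err:
                best, best_err, best_losses = stump, err, losses
        err = min(max(best_err, 1e-9), 1 - 1e-9)  # avoid log(0)
        if err >= 0.5:
            break  # no weak learner beats chance on the weighted groups
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best))
        # Raise the weight of badly handled groups, lower the others.
        for g in groups:
            weights[g] *= math.exp(alpha * (2 * best_losses[g] - 1))
        z = sum(weights.values())
        weights = {g: w / z for g, w in weights.items()}
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all selected stumps."""
    score = sum(alpha * stump(x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data: group "a" has 2 instances, group "b" has 4.
toy = [
    ((0.1,), -1, "a"), ((0.9,), 1, "a"),
    ((0.2,), -1, "b"), ((0.8,), 1, "b"), ((0.7,), 1, "b"), ((0.3,), -1, "b"),
]
model = train_group_boost(toy)
```

Under these assumptions, each group contributes to the training objective in proportion to its weight, independent of how many instances it contains. This is one plausible reading of not treating instances as equal, independent elements.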
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ni, W., Huang, Y., Li, D., Wang, Y. (2008). Boosting over Groups and Its Application to Acronym-Expansion Extraction. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_5
DOI: https://doi.org/10.1007/978-3-540-88192-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88191-9
Online ISBN: 978-3-540-88192-6
eBook Packages: Computer Science, Computer Science (R0)