Journal of Computer Science and Technology, Volume 23, Issue 4, pp 602–611

Predicting Chinese Abbreviations from Definitions: An Empirical Learning Approach Using Support Vector Regression

  • Xu Sun
  • Hou-Feng Wang
  • Bo Wang
Regular Paper


In Chinese, phrases and named entities play a central role in information retrieval. Abbreviations, however, make keyword-based approaches less effective. This paper presents an empirical learning approach to Chinese abbreviation prediction. Each abbreviation is treated as a reduced form of its definition (expanded form), and abbreviation prediction is formalized as a scoring and ranking problem over abbreviation candidates, which are generated automatically from the definition. Using Support Vector Regression (SVR) for scoring yields multiple abbreviation candidates together with their SVR values, which are used to rank the candidates. Experimental results show that the SVR method performs better than the popular heuristic rule for abbreviation prediction and also outperforms the hidden Markov model (HMM) on the same task.
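The generate-then-rank formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that candidates are ordered character subsequences of the definition, and it replaces the trained SVR model with a hypothetical placeholder, `toy_score`, whose weights are arbitrary. The names `generate_candidates` and `rank_candidates` are also illustrative and do not come from the paper.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def generate_candidates(definition: str, min_len: int = 2) -> List[str]:
    """Return every ordered character subsequence of `definition` that has
    at least `min_len` characters and is shorter than the definition.

    Assumption: an abbreviation keeps a subset of the definition's
    characters in their original order.
    """
    n = len(definition)
    seen = set()
    candidates = []
    for k in range(min_len, n):
        for idx in combinations(range(n), k):
            cand = "".join(definition[i] for i in idx)
            if cand not in seen:  # skip duplicates from repeated characters
                seen.add(cand)
                candidates.append(cand)
    return candidates


def rank_candidates(
    definition: str,
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every candidate and return the `top_n` highest-scoring ones."""
    scored = [(c, score(definition, c)) for c in generate_candidates(definition)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def toy_score(definition: str, candidate: str) -> float:
    """Hypothetical stand-in for a trained regression model.

    It rewards keeping the first character and penalizes length; the
    weights are arbitrary and only serve to make the sketch runnable.
    """
    bonus = 1.0 if candidate[0] == definition[0] else 0.0
    return bonus - 0.1 * len(candidate)


if __name__ == "__main__":
    # 北京大学 (Peking University) is commonly abbreviated as 北大.
    for cand, value in rank_candidates("北京大学", toy_score):
        print(cand, round(value, 2))
```

In a real system the placeholder scorer would be replaced by a regression model trained on feature vectors of (definition, candidate) pairs, and the ranking step would stay the same. Enumerating all subsequences grows exponentially with the length of the definition, so longer definitions would need pruning.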


statistical natural language processing, abbreviation prediction, support vector regression, word clustering



Supplementary material

11390_2008_9156_MOESM1_ESM.pdf (108 kb)



Copyright information

© Springer 2008

Authors and Affiliations

  1. Institute of Computational Linguistics, School of Electronics Engineering and Computer Science, Peking University, Beijing, China
  2. Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
