Collaborative Recognition and Recovery of the Chinese Intercept Abbreviation

  • Jinshuo Liu
  • Yusen Chen
  • Juan Deng
  • Donghong Ji
  • Jeff Pan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10565)

Abstract

An important task in information content security is evaluating the theme words of a text. The variety of Chinese expression, and of abbreviations in particular, makes theme words harder to monitor. The goal of this paper is to quickly and accurately discover intercept abbreviations in text crawled over a short time period. The paper first segments the target texts and then uses a Support Vector Machine (SVM) to recognize abbreviation candidates among the wrongly segmented fragments. Second, it presents a collaborative recovery method: an improved Conditional Random Field (CRF) predicts the word corresponding to each character of the abbreviation, and, to handle the 1:n relationship between an abbreviation and its possible full forms, the ranked list from the prediction step is merged with the matches from a thesaurus of abbreviations. Experiments show that the recognition stage achieves 76.5% accuracy and 77.8% recall. At the recovery stage, the accuracy is 62.1%, which is 20.8% higher than that of a method based on the Hidden Markov Model (HMM).
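The merging step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the combination rule, so the function name `merge_candidates`, the fixed `thesaurus_bonus`, the scores, and the toy thesaurus entry below are all assumptions chosen only to show how a ranked prediction list and thesaurus matches might be combined when one abbreviation has several possible full forms.

```python
from typing import Dict, List, Tuple


def merge_candidates(
    abbreviation: str,
    predicted: List[Tuple[str, float]],
    thesaurus: Dict[str, List[str]],
    thesaurus_bonus: float = 0.5,
) -> List[Tuple[str, float]]:
    """Merge a predictor's ranked expansions with thesaurus matches.

    Hypothetical rule (an assumption, not taken from the paper): every
    full form listed in the thesaurus for this abbreviation receives a
    fixed bonus on top of its predicted score, or on top of 0.0 if the
    predictor did not propose it.
    """
    scores: Dict[str, float] = {}
    # Keep the best score seen for each predicted expansion.
    for expansion, score in predicted:
        scores[expansion] = max(score, scores.get(expansion, float("-inf")))
    # Add the bonus to thesaurus matches, inserting unseen ones.
    for expansion in thesaurus.get(abbreviation, []):
        scores[expansion] = scores.get(expansion, 0.0) + thesaurus_bonus
    # Highest score first; ties are broken by string order for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy 1:n case: one abbreviation with two thesaurus full forms.
# The scores stand in for the output of a sequence model such as a CRF.
predicted = [("人民大学", 0.55), ("人民代表大会", 0.40)]
thesaurus = {"人大": ["人民代表大会", "中国人民大学"]}
merged = merge_candidates("人大", predicted, thesaurus)
for expansion, score in merged:
    print(expansion, round(score, 2))
```

In this sketch a candidate confirmed by both sources rises to the top, while candidates supported by only one source remain in the list, so the 1:n ambiguity is kept as a ranking instead of being resolved to a single answer too early.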

Keywords

Collaborative recovery; Improved CRF; Chinese abbreviation

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Jinshuo Liu (1)
  • Yusen Chen (1)
  • Juan Deng (2), email author
  • Donghong Ji (1)
  • Jeff Pan (3)
  1. Computer School, Wuhan University, Wuhan, China
  2. International School of Software, Wuhan University, Wuhan, China
  3. University of Aberdeen, Aberdeen, UK
