Based on Support Vector and Word Features New Word Discovery Research

  • Li Chengcheng
  • Xu Yuanfang
Part of the Communications in Computer and Information Science book series (CCIS, volume 320)


Chinese word segmentation has difficulty handling ambiguity and recognizing unknown words. This paper proposes new-word pattern features, together with various word-internal patterns, which are extracted and quantified from the positive and negative samples of a training corpus; a support vector machine is then trained on these features to obtain a new-word classifier. On the test corpus, new-word candidates are extracted and selected with the absolute discounting method, the word patterns extracted from the training corpus are used to quantify the candidates for testing with the support vector machine, and a set of rule-based filters is applied to obtain the final new-word recognition results.
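The candidate-extraction step can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulation, so everything here is an assumption made for illustration: the function name `extract_candidates`, the use of character n-grams as candidates, the discount value 0.75, and the score threshold are all hypothetical. The sketch counts character n-grams, subtracts a fixed discount from each count (the core idea of absolute discounting), and keeps the n-grams that are not in a known lexicon and still score above the threshold.

```python
from collections import Counter


def extract_candidates(sentences, known_words, max_len=3,
                       discount=0.75, min_score=0.05):
    """Return {candidate: score} for character n-grams not in known_words.

    Hypothetical sketch: score = max(count - discount, 0) / total count of
    n-grams of the same length. Not the paper's exact method.
    """
    counts = Counter()
    for sentence in sentences:
        for n in range(2, max_len + 1):
            for i in range(len(sentence) - n + 1):
                counts[sentence[i:i + n]] += 1

    # Total number of n-gram occurrences per length, used as the denominator.
    totals = Counter()
    for gram, count in counts.items():
        totals[len(gram)] += count

    candidates = {}
    for gram, count in counts.items():
        if gram in known_words:
            continue  # already in the lexicon, so not a new word
        score = max(count - discount, 0.0) / totals[len(gram)]
        if score >= min_score:
            candidates[gram] = score
    return candidates
```

In a pipeline like the one described, the surviving candidates would then be turned into feature vectors and passed to the trained support vector machine; that step is not shown here because the abstract does not specify the features.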


natural language processing; support vector machine; word recognition; word feature





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Li Chengcheng
    • 1
  • Xu Yuanfang
    • 1
  1. School of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot, China
