Journal of Computer Science and Technology, Volume 18, Issue 1, pp. 131–136

Incorporating linguistic structure into maximum entropy language models

  • Fang GaoLin 
  • Gao Wen 
  • Wang ZhaoQi 


In statistical language modeling, integrating diverse linguistic knowledge into a general framework that captures long-distance dependencies remains a challenging issue. This paper presents an improved language model that incorporates linguistic structure into a maximum entropy framework. The proposed model combines a trigram model with the structure knowledge of base phrases: the trigram captures local relations between words, while the base-phrase structure represents long-distance relations between syntactic structures. Knowledge of syntax, semantics, and vocabulary is thus integrated into a single maximum entropy framework. Experimental results show that, compared with the trigram model, the proposed model reduces language model perplexity by 24% and improves the sign language recognition rate by about 3%.
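The combination described above can be illustrated with a minimal sketch of a conditional maximum entropy language model, p(w | h) ∝ exp(Σᵢ λᵢ fᵢ(h, w)). The two feature functions below stand in for a local trigram feature and a long-distance base-phrase feature; the specific patterns and weights are hypothetical (in practice, weights are estimated by an iterative algorithm such as GIS), not the paper's actual feature set.

```python
import math

def trigram_feature(history, word):
    # Local feature: fires when the trigram context matches a
    # (hypothetical) pattern observed in training.
    return 1.0 if history[-2:] == ("the", "cat") and word == "sat" else 0.0

def phrase_feature(history, word):
    # Stand-in for a long-distance base-phrase feature: fires when the
    # candidate word is compatible with an earlier noun-phrase head,
    # regardless of the words in between.
    return 1.0 if "cat" in history and word in {"sat", "slept"} else 0.0

FEATURES = [trigram_feature, phrase_feature]
LAMBDAS = [1.5, 0.8]  # hypothetical weights for illustration only

VOCAB = ["sat", "slept", "ran", "the"]

def maxent_prob(history, word):
    # p(w | h) = exp(sum_i lambda_i * f_i(h, w)) / Z(h)
    def score(w):
        return math.exp(sum(l * f(history, w) for l, f in zip(LAMBDAS, FEATURES)))
    z = sum(score(w) for w in VOCAB)  # partition function Z(h)
    return score(word) / z

history = ("the", "cat")
probs = {w: maxent_prob(history, w) for w in VOCAB}
```

Because both features fire for "sat" but only the phrase feature fires for "slept", the model ranks "sat" above "slept" above the words for which no feature fires, while the probabilities still sum to one over the vocabulary.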


Keywords: maximum entropy, language model, base phrase identification, sign language recognition


References


  1. Rosenfeld R. Two decades of statistical language modeling: Where do we go from here? Proceedings of the IEEE, 2000, 88(8): 1270–1278.
  2. Niesler T, Whittaker E, Woodland P. Comparison of part-of-speech and automatically derived category-based language models for speech recognition. In Proc. ICASSP-98, Seattle, USA, 1998, pp.177–180.
  3. Niesler T. Category-based statistical language models [Dissertation]. University of Cambridge, UK, 1997.
  4. Kuo H, Reichl W. Phrase-based language models for speech recognition. In Proc. Eurospeech-99, Budapest, Hungary, 1999, pp.1595–1598.
  5. Ney H, Essen U, Kneser R. On structuring probabilistic dependences in stochastic language modeling. Computer Speech and Language, 1994, 8: 1–38.
  6. Lau R, Rosenfeld R, Roukos S. Trigger-based language models: A maximum entropy approach. In Proc. ICASSP-93, Minneapolis, USA, 1993, pp.45–48.
  7. Ron D, Singer Y, Tishby N. The power of amnesia: Learning probabilistic automata with variable memory length. Machine Learning, 1996, 25: 117–149.
  8. Siu M, Ostendorf M. Variable n-grams and extensions for conversational speech language modeling. IEEE Trans. Speech and Audio Processing, 2000, 8(1): 63–75.
  9. Chelba C et al. Structure and performance of a dependency language model. In Proc. Eurospeech-97, Rhodes, Greece, 1997, pp.2775–2778.
  10. Benedi J, Sanchez J. Combination of n-grams and stochastic context-free grammars for language modeling. In Proc. the 18th Int. Conf. on Computational Linguistics, Saarbrücken, Germany, 2000, pp.55–61.
  11. Lafferty J, Sleator D, Temperley D. Grammatical trigrams: A probabilistic model of link grammar. In Proc. the AAAI Fall Symposium on Probabilistic Approaches to Natural Language, Cambridge, Massachusetts, 1992, pp.89–97.
  12. Rosenfeld R. A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language, 1996, 10: 187–228.
  13. Zhou Q, Sun M S, Huang C N. Chunk parsing scheme for Chinese sentences. Chinese Journal of Computers, 1999, 22(11): 1159–1165.
  14. Zhao T J, Yang M Y, Liu F et al. Statistics-based hybrid approach to Chinese base phrase identification. In Proc. the Second Chinese Language Processing Workshop, Hong Kong, China, 2001, pp.73–77.
  15. Jaynes E T. Information theory and statistical mechanics. Physical Review, 1957, 106: 620–630.
  16. Della Pietra S, Della Pietra V, Mercer R, Roukos S. Adaptive language modeling using minimum discriminant estimation. In Proc. ICASSP-92, San Francisco, USA, 1992, pp.633–636.
  17. Zhao T J et al. Increasing accuracy of Chinese segmentation with strategy of multi-step processing. Journal of Chinese Information Processing, 2000, 15(1): 13–18.
  18. Fang G L, Gao W, Chen X L et al. A signer-independent continuous sign language recognition system based on SRN/HMM. Journal of Software, 2002, 13(11): 2169–2174.

Copyright information

© Science Press, Beijing, China and Allerton Press Inc. 2003

Authors and Affiliations

  1. Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, P.R. China
  2. Institute of Computing Technology, The Chinese Academy of Sciences, Beijing, P.R. China
