An Improved Hierarchical Word Sequence Language Model Using Word Association

Wu, Xiaoyi; Matsumoto, Yuji; Duh, Kevin; Shindo, Hiroyuki

doi:10.1007/978-3-319-25789-1_26

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9449)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

Language modeling is a fundamental research problem with applications in many NLP tasks. To estimate probabilities, most research on language modeling uses the n-gram approach to factor sentence probabilities. However, the n-gram assumption is too simple to cope with the data sparseness problem, which degrades the final performance of language models. In this respect, the Hierarchical Word Sequence (HWS) language model, which uses word frequency information to convert raw sentences into special n-gram sequences, can be viewed as an effective alternative to the normal n-gram method.

In this paper, we improve upon the basic HWS approach by generalizing it to exploit not only word frequencies but also word association.

For evaluation, we compare word association based HWS models to normal HWS models and normal n-gram models. Both intrinsic and extrinsic experiments verify that word association based HWS models can achieve better performance.
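To make the frequency-based conversion mentioned in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the basic HWS idea is to pick the most frequent word of a sentence as a split point, recurse on the words to its left and right, and read parent-child pairs off the resulting tree in place of adjacent-word bigrams. The names `build_hws_tree` and `hws_bigrams`, the `<s>` root symbol, and the toy corpus are all hypothetical and chosen only for illustration. Ties are broken by taking the first occurrence.

```python
from collections import Counter


def build_hws_tree(words, freq):
    """Recursively split `words` at its most frequent word.

    Assumption for this sketch: ties (including repeated words) are
    resolved by taking the earliest position in the list.
    """
    if not words:
        return None
    # max() returns the first maximal element, so the earliest position wins ties.
    idx = max(range(len(words)), key=lambda i: freq[words[i]])
    return {
        "word": words[idx],
        "left": build_hws_tree(words[:idx], freq),
        "right": build_hws_tree(words[idx + 1:], freq),
    }


def hws_bigrams(node, parent="<s>"):
    """Yield (parent, child) pairs along the tree, a hierarchical analogue of bigrams."""
    if node is None:
        return
    yield (parent, node["word"])
    yield from hws_bigrams(node["left"], node["word"])
    yield from hws_bigrams(node["right"], node["word"])


# Toy corpus, used only to obtain word frequencies for the illustration.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat saw the dog".split(),
]
freq = Counter(w for sent in corpus for w in sent)

tree = build_hws_tree("a cat sat on the mat".split(), freq)
pairs = list(hws_bigrams(tree))
print(pairs)
```

In this sketch the frequent word "the" becomes the root, so the rare word "mat" is conditioned on "the" and "cat" is conditioned on "the" rather than on "a". A word-association variant, as proposed in the paper, would presumably replace the frequency lookup in the `key` function with an association score; that substitution is an assumption and is not shown here.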

Notes

1. If \(w_{i}\) appears multiple times in \(s\), then select the first one.
2. http://www.natcorp.ox.ac.uk.
3. https://catalog.ldc.upenn.edu/LDC2011T07.
4. http://www.nltk.org.
5. https://github.com/aisophie/HWS.
6. http://www.statmt.org/moses/.
7. http://sourceforge.net/projects/irstlm/.
8. http://www.speech.sri.com/projects/srilm/.
9. For IRSTLM and SRILM, we use the default settings except that modified Kneser-Ney is used as the smoothing method.

Author information

Authors and Affiliations

Nara Institute of Science and Technology, Ikoma, Japan
Xiaoyi Wu, Yuji Matsumoto, Kevin Duh & Hiroyuki Shindo

Corresponding author

Correspondence to Xiaoyi Wu.

Editor information

Editors and Affiliations

Research Group on Mathematical Linguistics, Rovira i Virgili University, Tarragona, Spain
Adrian-Horia Dediu
Research Group on Mathematical Linguistics, Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide
Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
Klára Vicsi

About this paper

Cite this paper

Wu, X., Matsumoto, Y., Duh, K., Shindo, H. (2015). An Improved Hierarchical Word Sequence Language Model Using Word Association. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_26

DOI: https://doi.org/10.1007/978-3-319-25789-1_26
Published: 17 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25788-4
Online ISBN: 978-3-319-25789-1
eBook Packages: Computer Science, Computer Science (R0)
