Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSR

  • Conference paper
  • In: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD 2017, CCL 2017)

Abstract

Recently, the long short-term memory language model (LSTM LM) has received tremendous interest from both the language and speech communities, owing to its superiority in modelling long-term dependencies. Moreover, integrating auxiliary information, such as context features, into the LSTM LM has been shown to improve perplexity (PPL). However, improperly feeding auxiliary information does not yield a consistent gain in word error rate (WER) on a large vocabulary continuous speech recognition (LVCSR) task. To address this problem, a multi-view LSTM LM architecture combined with a tagging model is proposed in this paper. First, an online unidirectional LSTM-RNN is built as a tagging model, which generates a word-synchronized auxiliary feature. Then the auxiliary feature from the tagging model is combined with the word sequence to train a multi-view unidirectional LSTM LM. Different training modes for the tagging model and the language model are explored and compared. The new architecture is evaluated on the PTB, Fisher English and SMS Chinese data sets, and the results show that not only is LM PPL improved, but the gains also transfer well to WER reduction in an ASR rescoring task.
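The architecture described in the abstract lends itself to a compact sketch: one unidirectional LSTM acts as the tagger and emits a word-synchronized feature for every input word, and a second unidirectional LSTM LM consumes the word embedding concatenated with that feature. The PyTorch code below is a minimal, hypothetical illustration of this idea only; the module names, dimensions, concatenation-based fusion and the detach-based "fixed tagger" training mode are assumptions for illustration, not the authors' exact implementation.

# Minimal sketch of a multi-view LSTM LM with a word-synchronized tagger.
# All names, sizes and the fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn

class TaggerLSTM(nn.Module):
    """Unidirectional LSTM tagger; its hidden state serves as a
    word-synchronized auxiliary feature for each input word."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=128, num_tags=46):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.tag_out = nn.Linear(hidden_dim, num_tags)

    def forward(self, words):
        h, _ = self.lstm(self.embed(words))   # (batch, seq, hidden_dim)
        return h, self.tag_out(h)             # features and tag logits

class MultiViewLSTMLM(nn.Module):
    """LSTM LM whose input is the word embedding concatenated with the
    word-synchronized auxiliary feature from the tagger."""
    def __init__(self, vocab_size, emb_dim=256, aux_dim=128, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim + aux_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, words, aux_feat):
        x = torch.cat([self.embed(words), aux_feat], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)                    # next-word logits

# Toy usage: the tagger runs word by word in step with the LM input,
# so no look-ahead is required and n-best/lattice rescoring stays causal.
words = torch.randint(0, 10000, (2, 20))
tagger = TaggerLSTM(vocab_size=10000)
lm = MultiViewLSTMLM(vocab_size=10000)
aux, _ = tagger(words)
logits = lm(words, aux.detach())              # detach = "fixed tagger" mode

Because both networks are unidirectional, the auxiliary feature for word t depends only on words up to t, which is what allows the PPL gain to carry over to WER in a rescoring setup.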

Notes

  1. http://nlp.stanford.edu/software/.


Author information

Correspondence to Yue Wu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wu, Y., He, T., Chen, Z., Qian, Y., Yu, K. (2017). Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSR. In: Sun, M., Wang, X., Chang, B., Xiong, D. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD/CCL 2017. Lecture Notes in Computer Science, vol 10565. Springer, Cham. https://doi.org/10.1007/978-3-319-69005-6_33

  • DOI: https://doi.org/10.1007/978-3-319-69005-6_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69004-9

  • Online ISBN: 978-3-319-69005-6

  • eBook Packages: Computer Science, Computer Science (R0)
