Abstract
The long short-term memory language model (LSTM LM) has recently received tremendous interest from both the language and speech communities, owing to its superiority in modelling long-term dependencies. Moreover, integrating auxiliary information, such as context features, into an LSTM LM has been shown to improve perplexity (PPL). However, improperly fed auxiliary information does not yield consistent gains in word error rate (WER) on large vocabulary continuous speech recognition (LVCSR) tasks. To address this problem, this paper proposes a multi-view LSTM LM architecture that incorporates a tagging model. First, an online unidirectional LSTM-RNN is built as a tagging model, which generates word-synchronized auxiliary features. The auxiliary features from the tagging model are then combined with the word sequence to train a multi-view unidirectional LSTM LM. Different training modes for the tagging model and the language model are explored and compared. The new architecture is evaluated on the PTB, Fisher English and Chinese SMS data sets; the results show not only PPL improvements for the LM, but also that these improvements transfer well to WER reductions in an ASR rescoring task.
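The pipeline described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the dimensions, weight initializations, and the choice of the tagger's tag posterior as the auxiliary feature are assumptions made for the sketch. A unidirectional tagger LSTM runs word-synchronously over the same input, and its output is concatenated with the word embedding to form the multi-view input of the LM LSTM, so no look-ahead is needed at rescoring time.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; W, U, b pack the input/forget/cell/output gates."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))     # forget gate
    g = np.tanh(z[2*H:3*H])                 # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3*H:]))      # output gate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def make_lstm(in_dim, hid):
    """Randomly initialized packed LSTM parameters (illustrative only)."""
    return (rng.standard_normal((4*hid, in_dim)) * 0.1,
            rng.standard_normal((4*hid, hid)) * 0.1,
            np.zeros(4*hid))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

V, T, E, H = 50, 8, 16, 32      # vocab size, tag set size, embed dim, hidden dim
emb = rng.standard_normal((V, E)) * 0.1
tagger = make_lstm(E, H)                      # word-synchronized tagging model
tag_out = rng.standard_normal((T, H)) * 0.1   # tagger output projection
lm = make_lstm(E + T, H)                      # multi-view input: word view + tag view
lm_out = rng.standard_normal((V, H)) * 0.1    # LM output projection

def multi_view_lm(word_ids):
    """Return one next-word distribution per input word."""
    h_t = c_t = np.zeros(H)     # tagger state
    h_l = c_l = np.zeros(H)     # LM state
    preds = []
    for w in word_ids:
        x = emb[w]
        # Tagger advances in lockstep with the word sequence.
        h_t, c_t = lstm_step(x, h_t, c_t, *tagger)
        tag_feat = softmax(tag_out @ h_t)     # auxiliary feature (tag posterior)
        # LM sees the concatenation of the two views.
        h_l, c_l = lstm_step(np.concatenate([x, tag_feat]), h_l, c_l, *lm)
        preds.append(softmax(lm_out @ h_l))
    return np.stack(preds)

p = multi_view_lm([3, 17, 42, 7])
print(p.shape)                  # (4, 50)
```

Because both networks are unidirectional, the tag feature for word `t` depends only on words up to `t`, which is what makes the feature usable in left-to-right N-best rescoring.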
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Wu, Y., He, T., Chen, Z., Qian, Y., Yu, K.: Multi-view LSTM language model with word-synchronized auxiliary feature for LVCSR. In: Sun, M., Wang, X., Chang, B., Xiong, D. (eds.) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL 2017, NLP-NABD 2017. Lecture Notes in Computer Science, vol. 10565. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69005-6_33
Print ISBN: 978-3-319-69004-9
Online ISBN: 978-3-319-69005-6