
A Public Chinese Dataset for Language Model Adaptation

  • Ye Bai
  • Jiangyan Yi (corresponding author)
  • Jianhua Tao
  • Zhengqi Wen
  • Cunhang Fan

Abstract

A language model (LM) is an important component of a speech recognition system. The performance of an LM degrades when the domains of the training data and the test data differ; language model adaptation compensates for this mismatch. However, there is no public Chinese dataset for evaluating language model adaptation. In this paper, we present CLMAD, a public Chinese dataset for language model adaptation. The dataset covers four domains: sport, stock, fashion, and finance, and we evaluate the differences among these domains. We provide baselines for two commonly used adaptation techniques: interpolation for n-gram models and fine-tuning for recurrent neural network language models (RNNLMs). For n-gram interpolation, the adapted model improves when the source and target domains are relatively similar, but interpolating LMs from very different domains yields no improvement. For RNNLMs, fine-tuning the whole network achieves a larger improvement than fine-tuning only the softmax layer or the embedding layer, and the gain of the adapted RNNLM is significant when the domain difference is large. We also report speech recognition results on AISHELL-1 with LMs trained on CLMAD. CLMAD can be freely downloaded at http://www.openslr.org/55/ .
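
To make the two baseline adaptation recipes concrete, the following sketch illustrates them in Python. It is not the authors' implementation; the model sizes, layer names, and helper functions are illustrative assumptions. The first part tunes the weight of a linear interpolation between a source-domain and a target-domain n-gram LM by minimizing perplexity on a development set; the second shows selective fine-tuning of an RNNLM, where only chosen components (embedding, recurrent layers, or softmax layer) are unfrozen before training on target-domain text.

import math
import torch
import torch.nn as nn

# --- 1. Linear interpolation of two n-gram LMs (weight tuned on a dev set) ---
# p_src[i] and p_tgt[i] are the probabilities the source- and target-domain LMs
# assign to the i-th token of the development text (toy values here).
p_src = [0.05, 0.12, 0.02, 0.30]
p_tgt = [0.10, 0.08, 0.09, 0.25]

def perplexity(probs):
    # Perplexity of a token sequence given its per-token probabilities.
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

def interpolate(lam):
    # p_adapt(w|h) = lam * p_tgt(w|h) + (1 - lam) * p_src(w|h)
    return [lam * pt + (1.0 - lam) * ps for pt, ps in zip(p_tgt, p_src)]

best_lam = min((l / 100 for l in range(101)), key=lambda l: perplexity(interpolate(l)))
print(f"best weight {best_lam:.2f}, dev perplexity {perplexity(interpolate(best_lam)):.2f}")

# --- 2. Selective fine-tuning of an RNNLM on target-domain text ---
class RNNLM(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=256, hidden=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.softmax_layer = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, state=None):
        hidden, state = self.rnn(self.embedding(tokens), state)
        return self.softmax_layer(hidden), state

def set_trainable(model, parts):
    # Unfreeze only parameters belonging to the named submodules.
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in parts)

model = RNNLM()  # in practice, pretrained on the source domain
set_trainable(model, ["softmax_layer"])  # adapt the output layer only
# set_trainable(model, ["embedding", "rnn", "softmax_layer"])  # adapt the whole network
optimizer = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=0.1)

In the paper's baselines, the interpolation weight plays the role of the mixing coefficient tuned per target domain, and the freezing configurations correspond to adapting only the embedding layer, only the softmax layer, or the whole RNNLM.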

Keywords

Chinese dataset · Language model adaptation · Speech recognition · N-gram · RNNLM

Notes

Acknowledgments

This work is supported by the National Key R&D Program of China (No. 2017YFB1002802). We thank the NLP Lab of Tsinghua University for providing the THUCNews corpus, and Dr. Zhiyuan Liu for permitting us to extend this dataset. We also thank the anonymous reviewers for their invaluable comments.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
  3. CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing, China
