Class n-Gram Models for Very Large Vocabulary Speech Recognition of Finnish and Estonian

  • Conference paper
Statistical Language and Speech Processing (SLSP 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9918)

Abstract

We study class n-gram models for very large vocabulary speech recognition of Finnish and Estonian. The models are trained with vocabularies of several million words using automatically derived classes. To evaluate the models on a Finnish and an Estonian broadcast news speech recognition task, we modify Aalto University’s LVCSR decoder to operate with class n-grams and very large vocabularies. Linear interpolation of a standard n-gram model and a class n-gram model yields relative perplexity improvements of 21.3% for Finnish and 12.8% for Estonian over the n-gram model alone. The corresponding relative improvements in word error rate are 5.5% for Finnish and 7.4% for Estonian. We also compare our word-based models to a state-of-the-art unlimited vocabulary recognizer that uses subword n-gram models, and show that the very large vocabulary word-based models can perform equally well or better.
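As a rough illustration of the quantities named in the abstract, the sketch below shows how a class bigram probability, a linear interpolation with a word n-gram probability, perplexity, and a relative improvement could be computed. It assumes the common factorization P(w | v) ≈ P(c(w) | c(v)) · P(w | c(w)); the abstract does not state the exact model form, and every name, class label, probability, and interpolation weight here is hypothetical and not taken from the paper.

```python
import math

def class_bigram_prob(prev_word, word, word2class, class_bigram, membership):
    """Approximate P(word | prev_word) as P(c(word) | c(prev_word)) * P(word | c(word))."""
    c_prev, c = word2class[prev_word], word2class[word]
    return class_bigram[(c_prev, c)] * membership[word]

def interpolate(p_word, p_class, lam):
    """Linear interpolation of a word n-gram and a class n-gram probability."""
    return lam * p_word + (1.0 - lam) * p_class

def perplexity(probs):
    """Perplexity of a sequence given the per-word model probabilities."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

def relative_improvement(baseline, improved):
    """Relative reduction in percent, usable for perplexity or word error rate."""
    return 100.0 * (baseline - improved) / baseline

# Toy data (hypothetical): hand-labelled classes stand in for automatically derived ones.
word2class = {"talo": "C1", "auto": "C1", "on": "C2"}
class_bigram = {("C1", "C2"): 0.5, ("C2", "C1"): 0.4}   # P(class | previous class)
membership = {"talo": 0.6, "auto": 0.4, "on": 1.0}      # P(word | its class)

p_class = class_bigram_prob("on", "auto", word2class, class_bigram, membership)
p_word = 0.04                                           # assumed word-bigram estimate
p_mix = interpolate(p_word, p_class, lam=0.5)           # assumed weight
```

In practice the interpolation weight would be tuned on held-out data; the fixed value of 0.5 is only a placeholder.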

Acknowledgments

This work was supported by the Academy of Finland under grant 251170. The Aalto Science-IT project provided computational resources for the work.

Author information

Correspondence to Matti Varjokallio.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Varjokallio, M., Kurimo, M., Virpioja, S. (2016). Class n-Gram Models for Very Large Vocabulary Speech Recognition of Finnish and Estonian. In: Král, P., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_11

  • DOI: https://doi.org/10.1007/978-3-319-45925-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45924-0

  • Online ISBN: 978-3-319-45925-7

  • eBook Packages: Computer Science, Computer Science (R0)
