
Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)


Abstract

This chapter presents the basic concepts of speech recognition. Acoustic processing, acoustic modeling and search algorithms are described briefly, while language modeling is explained in more detail. The chapter then describes some features of inflective languages and explains why they matter when designing speech recognition systems: inflective languages are first discussed in general, and Slovene is then examined in more detail as an example. Next, it describes typical methods for overcoming the difficulties of speech recognition in inflective languages and improving recognition accuracy, namely vocabulary enlargement, sub-word language models and other, more sophisticated language models. The last part of the chapter discusses morphosyntactic description (MSD) tagging in inflective languages, which is used in later chapters. The chapter is not a comprehensive overview of speech recognition; it gives only basic descriptions, together with the additional information needed to understand the later chapters.
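The abstract names sub-word language models as one way to cope with the rich inflection of Slovene. The toy Python sketch below illustrates the motivation: an inflected form that never occurred in the training text is out of vocabulary (OOV) for a word-based model, yet it can still be covered once words are split into stems and endings. The ending list, the `split_word` rule (longest matching ending, minimum stem length of three characters) and the tiny corpus are hypothetical illustrations chosen for this example, not the segmentation method or data used in the chapter.

```python
"""Toy comparison of OOV rates for word units vs. stem+ending units.

The ending inventory and splitting rule are hypothetical simplifications,
not a real Slovene morphological analyser.
"""

# Hypothetical, far-from-complete inventory of Slovene inflectional endings,
# ordered longest first so the longest matching ending wins.
ENDINGS = sorted(["ami", "ah", "am", "om", "ih", "a", "e", "i", "o", "u"],
                 key=len, reverse=True)


def split_word(word: str, min_stem: int = 3) -> list[str]:
    """Split a word into 'stem+' and '+ending' using the longest matching ending.

    Words with no matching ending, or whose stem would be shorter than
    min_stem characters, are kept whole.
    """
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= min_stem:
            return [word[: -len(ending)] + "+", "+" + ending]
    return [word]


def to_units(words: list[str]) -> list[str]:
    """Map a word sequence to a stem+ending unit sequence."""
    return [unit for w in words for unit in split_word(w)]


def oov_rate(train: list[str], test: list[str]) -> float:
    """Fraction of test tokens that never occur among the training tokens."""
    vocab = set(train)
    return sum(tok not in vocab for tok in test) / len(test)


# Hypothetical toy corpus: "the house is big", "the city is big", "the houses are big".
train_words = "hiša je velika mesto je veliko hiše so velike".split()
# Test sentence "the cities are big": the form 'mesta' never appears in training.
test_words = "mesta so velika".split()

word_oov = oov_rate(train_words, test_words)
unit_oov = oov_rate(to_units(train_words), to_units(test_words))

print(f"word OOV rate:        {word_oov:.3f}")  # 'mesta' is unseen as a whole word
print(f"stem+ending OOV rate: {unit_oov:.3f}")  # 'mest+' and '+a' were both seen
```

In this toy case the word-based OOV rate is one in three, while every stem+ending unit of the test sentence was seen in training. A practical system would obtain the splits from a morphological analyser or a data-driven segmentation, and the recognizer output would have to be rejoined into words by merging units marked with `+`.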




Copyright information

© 2017 The Author(s) - SpringerBriefs

About this chapter

Cite this chapter

Donaj, G., Kačič, Z. (2017). Speech Recognition in Inflective Languages. In: Language Modeling for Automatic Speech Recognition of Inflective Languages. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-41607-6_2


  • DOI: https://doi.org/10.1007/978-3-319-41607-6_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41605-2

  • Online ISBN: 978-3-319-41607-6

  • eBook Packages: Engineering, Engineering (R0)
