Turkish Speech Recognition


Abstract

Automatic speech recognition (ASR) is one of the most important applications of speech and language processing, as it forms the bridge between spoken and written language processing. This chapter presents an overview of the foundations of ASR, followed by a summary of Turkish language resources for ASR and a review of various Turkish ASR systems. Language resources include acoustic and text corpora as well as linguistic tools such as morphological parsers, morphological disambiguators, and dependency parsers, which are discussed in more detail in other chapters. Turkish ASR systems vary in the type and amount of data used to build their models. Most research on Turkish ASR has focused on the language modeling component, which is covered in Chap. 4.

Notes

  1.

    University of Pennsylvania, PA, USA. Linguistic Data Consortium: http://www.ldc.upenn.edu (Accessed Sept. 14, 2017).

  2.

    European Language Resources Association. Catalogue of Language Resources: http://catalog.elra.info (Accessed Sept. 14, 2017).

  3.

    Vu, Ngoc Thang and Schultz, Tanja. GlobalPhone Language Models. University of Bremen, Germany. Cognitive Systems Lab: http://www.csl.uni-bremen.de/GlobalPhone/ (Accessed Sept. 14, 2017).

  4.

    European Language Resources Association. Catalogue of Language Resources: http://catalog.elra.info/ (Accessed Sept. 14, 2017).

  5.

    Phonetic acoustic models together with a finite-state transducer based pronunciation lexicon similar to Oflazer and Inkelas (2006) result in similar overall performance, possibly due to a small number of Turkish words with exceptional pronunciation.

References

  • Arısoy E (2004) Turkish dictation system for radiology and broadcast news applications. Master’s thesis, Boğaziçi University, Istanbul

  • Arısoy E (2009) Statistical and discriminative language modeling for Turkish large vocabulary continuous speech recognition. PhD thesis, Boğaziçi University, Istanbul

  • Arısoy E, Saraçlar M (2009) Lattice extension and vocabulary adaptation for Turkish LVCSR. IEEE Trans Audio Speech Lang Process 17(1):163–173

  • Arısoy E, Dutağacı H, Arslan LM (2006) A unified language model for large vocabulary continuous speech recognition of Turkish. Signal Process 86(10):2844–2862

  • Arısoy E, Can D, Parlak S, Sak H, Saraçlar M (2009) Turkish broadcast news transcription and retrieval. IEEE Trans Audio Speech Lang Process 17(5):874–883

  • Arısoy E, Sainath TN, Kingsbury B, Ramabhadran B (2012) Deep neural network language models. In: Proceedings of the NAACL-HLT workshop: will we ever really replace the n-gram model? On the future of language modeling for HLT, Montreal, pp 20–28

  • Bengio Y, Ducharme R, Vincent P, Jauvin C (2003) A neural probabilistic language model. J Mach Learn Res 3:1137–1155

  • Çarkı K, Geutner P, Schultz T (2000) Turkish LVCSR: towards better speech recognition for agglutinative languages. In: Proceedings of ICASSP, Istanbul, pp 1563–1566

  • Çetinoğlu Ö (2000) Prolog-based natural language processing infrastructure for Turkish. Master’s thesis, Boğaziçi University, Istanbul

  • Çiloğlu T, Acar D, Tokatlı A (2004) Orientel Turkish: telephone speech database description and notes on the experience. In: Proceedings of INTERSPEECH, Jeju, pp 2725–2728

  • Creutz M, Lagus K (2005) Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0. Publications in Computer and Information Science Report A81, Helsinki University of Technology, Helsinki

  • Dutağacı H (2002) Statistical language models for large vocabulary continuous speech recognition of Turkish. Master’s thesis, Boğaziçi University, Istanbul

  • Erdoğan H, Büyük O, Oflazer K (2005) Incorporating language constraints in sub-word based speech recognition. In: Proceedings of ASRU, San Juan, PR, pp 98–103

  • Eryiğit G, Nivre J, Oflazer K (2008) Dependency parsing of Turkish. Comput Linguist 34(3):357–389

  • Fromkin V, Rodman R, Hyams N (2003) An introduction to language. Thomson Heinle, Boston, MA

  • Geutner P, Finke M, Scheytt P, Waibel A, Wactlar H (1998a) Transcribing multilingual broadcast news using hypothesis driven lexical adaptation. In: Proceedings of DARPA broadcast news workshop, Herndon, VA

  • Geutner P, Finke M, Waibel A (1998b) Phonetic-distance-based hypothesis driven lexical adaptation for transcribing multilingual broadcast news. In: Proceedings of ICSLP, Sydney, pp 2635–2638

  • Geutner P, Finke M, Waibel A (1999) Selection criteria for hypothesis driven lexical adaptation. In: Proceedings of ICASSP, Phoenix, AZ, pp 617–619

  • Hacıoğlu K, Pellom B, Çiloğlu T, Öztürk Ö, Kurimo M, Creutz M (2003) On lexicon creation for Turkish LVCSR. In: Proceedings of EUROSPEECH, Geneva, pp 1165–1168

  • Hakkani-Tür DZ, Oflazer K, Tür G (2002) Statistical morphological disambiguation for agglutinative languages. Comput Hum 36(4):381–410

  • Haznedaroğlu A, Arslan LM (2011) Confidence measures for Turkish call center conversations. In: Proceedings of INTERSPEECH, Florence, pp 1957–1960

  • Haznedaroğlu A, Arslan LM (2014) Language model adaptation for automatic call transcription. In: Proceedings of ICASSP, Florence, pp 4102–4106

  • Haznedaroğlu A, Arslan LM, Büyük O, Eden M (2010) Turkish LVCSR system for call center conversations. In: Proceedings of IEEE signal processing and communications applications conference, Diyarbakır, pp 372–375

  • Hinton G, Deng L, Yu D, Dahl GE, Mohamed Ar, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath TN, Kingsbury B (2012) Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 29(6):82–97

  • Huang X, Acero A, Hon HW (2001) Spoken language processing: a guide to theory, algorithm and system development. Prentice Hall, Upper Saddle River, NJ

  • Ircing P, Psutka J (2001) Two-pass recognition of Czech speech using adaptive vocabulary. In: Proceedings of conference on text, speech and dialogue, Zelezna Ruda, pp 273–277

  • Jelinek F (1997) Statistical methods for speech recognition. The MIT Press, Cambridge, MA

  • Jurafsky D, Martin JH (2000) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall, Upper Saddle River, NJ

  • Levenshtein V (1966) Binary codes capable of correcting deletions, insertions and reversals. Sov Phys Dokl 10:707

  • Mengüşoğlu E, Deroo O (2001) Turkish LVCSR: database preparation and language modeling for an agglutinative language. In: Proceedings of ICASSP, Salt Lake City, UT, pp 4018–4021

  • Mikolov T, Karafiat M, Burget L, Cernocky J, Khudanpur S (2010) Recurrent neural network based language model. In: Proceedings of INTERSPEECH, Saint-Malo, pp 1045–1048

  • Mohri M, Pereira F, Riley M (2002) Weighted finite-state transducers in speech recognition. Comput Speech Lang 16(1):69–88

  • Oflazer K (1994) Two-level description of Turkish morphology. Lit Linguist Comput 9(2):137–148

  • Oflazer K, Inkelas S (2006) The architecture and the implementation of a finite state pronunciation lexicon for Turkish. Comput Speech Lang 20:80–106

  • Pellom BL (2001) Sonic: The University of Colorado continuous speech recognizer. Tech. Rep. TR-CSLR-01, University of Colorado, Boulder, CO

  • Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286

  • Sak H (2011) Integrating morphology into automatic speech recognition: morpholexical and discriminative language models for Turkish. PhD thesis, Boğaziçi University, Istanbul

  • Sak H, Güngör T, Saraçlar M (2007) Morphological disambiguation of Turkish text with perceptron algorithm. In: Proceedings of CICLING, Mexico City, pp 107–118

  • Sak H, Güngör T, Saraçlar M (2011) Resources for Turkish morphological processing. Lang Resour Eval 45(2):249–261

  • Sak H, Saraçlar M, Güngör T (2012) Morpholexical and discriminative language models for Turkish automatic speech recognition. IEEE Trans Audio Speech Lang Process 20(8):2341–2351

  • Salor Ö (2005) Voice transformation and development of related speech analysis tools for Turkish. PhD thesis, Middle East Technical University, Ankara

  • Salor Ö, Pellom BL, Demirekler M (2003) Implementation and evaluation of a text-to-speech synthesis system for Turkish. In: Proceedings of EUROSPEECH, Geneva

  • Salor Ö, Pellom BL, Çiloğlu T, Demirekler M (2007) Turkish speech corpora and recognition tools developed by porting sonic: towards multilingual speech recognition. Comput Speech Lang 21(4):580–593

  • Saon G, Ramabhadran B, Zweig G (2006) On the effect of word error rate on automated quality monitoring. In: Proceedings of IEEE spoken language technology workshop, Palm Beach, pp 106–109

  • Schalkwyk J, Beeferman D, Beaufays F, Byrne W, Chelba C, Cohen M, Kamvar M, Strope B (2010) “Your Word is my Command”: Google search by voice: a case study. In: Neustein A (ed) Advances in speech recognition: mobile environments, call centers and clinics, Springer, Boston, MA, pp 61–90

  • Schultz T (2002) Globalphone: a multilingual speech and text database developed at Karlsruhe University. In: Proceedings of ICSLP, Denver, CO

  • Schultz T, Waibel A (2001) Language-independent and language-adaptive acoustic modeling for speech recognition. Speech Commun 35:31–51

  • Schultz T, Vu NT, Schlippe T (2013) Globalphone: a multilingual text and speech database in 20 languages. In: Proceedings of ICASSP, Vancouver

  • Schwenk H (2007) Continuous space language models. Comput Speech Lang 21(3):492–518

  • Stolcke A (1998) Entropy-based pruning of backoff language models. In: Proceedings of DARPA broadcast news workshop, Herndon, VA, pp 270–274

  • Stolcke A (2002) SRILM – An extensible language modeling toolkit. In: Proceedings of ICSLP, Denver, CO, vol 2, pp 901–904

  • Tüske Z, Golik P, Schlüter R, Ney H (2014) Acoustic modeling with deep neural networks using raw time signal for LVCSR. In: Proceedings of INTERSPEECH, Singapore, pp 890–894

  • Yuret D, Türe F (2006) Learning morphological disambiguation rules for Turkish. In: Proceedings of NAACL-HLT, New York, NY, pp 328–334

Author information

Correspondence to Murat Saraçlar.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Arısoy, E., Saraçlar, M. (2018). Turkish Speech Recognition. In: Oflazer, K., Saraçlar, M. (eds) Turkish Natural Language Processing. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-90165-7_5

  • DOI: https://doi.org/10.1007/978-3-319-90165-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90163-3

  • Online ISBN: 978-3-319-90165-7

  • eBook Packages: Computer Science, Computer Science (R0)