Controlling the Uncertainty Area in the Real Time LVCSR Application

Merkin, Nickolay; Medennikov, Ivan; Romanenko, Alexei; Zatvornitskiy, Alexander

doi:10.1007/978-3-319-11581-8_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

We propose an approach to improving the usability of an automatic speech recognition system in real time. We introduce the concept of an “uncertainty area” (UA): a time span within which the current recognition result may vary. By fixing the length of the UA we make it possible to start editing the recognized text without waiting for the phrase to end. We control the length of the UA by regularly pruning hypotheses using additional criteria. The approach was implemented in the software-hardware system for closed captioning of Russian live TV broadcasts.
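
The abstract does not spell out the additional pruning criteria, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a hypothetical `Hypothesis` type holding words with end times and a score, and it assumes one plausible criterion: at each pruning step, every hypothesis that disagrees with the current best hypothesis on the words lying before the uncertainty area is discarded. The text before the UA can then no longer change and is safe to hand to an editor.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: a word paired with its end time in seconds.
Word = Tuple[str, float]


@dataclass
class Hypothesis:
    words: List[Word]
    score: float  # higher is better, e.g. a log-probability


def frozen_prefix(hyp: Hypothesis, now: float, ua_length: float) -> List[str]:
    """Return the words of `hyp` that ended before the uncertainty area.

    The uncertainty area is assumed to be the span [now - ua_length, now].
    """
    boundary = now - ua_length
    return [word for word, end in hyp.words if end <= boundary]


def prune_to_fixed_ua(
    hyps: List[Hypothesis], now: float, ua_length: float
) -> Tuple[List[str], List[Hypothesis]]:
    """Keep only hypotheses that agree with the best one outside the UA.

    Returns the committed (no longer changeable) words and the survivors.
    This criterion is an illustrative assumption, not taken from the paper.
    """
    best = max(hyps, key=lambda h: h.score)
    committed = frozen_prefix(best, now, ua_length)
    survivors = [
        h for h in hyps if frozen_prefix(h, now, ua_length) == committed
    ]
    return committed, survivors
```

Calling `prune_to_fixed_ua` at regular intervals bounds how far back the displayed text can still change: anything older than `ua_length` seconds is identical in all surviving hypotheses. The trade-off is that a hypothesis which would have won later may be pruned early, so a shorter UA can cost some accuracy.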

Author information

Authors and Affiliations

Speech Technology Center, Saint-Petersburg, Russia
Nickolay Merkin & Alexander Zatvornitskiy
ITMO University, Saint-Petersburg, Russia
Ivan Medennikov & Alexei Romanenko
SPb State University, Saint-Petersburg, Russia
Ivan Medennikov

Editor information

Editors and Affiliations

Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation of the Russian Academy of Sciences, 39, 14th line, 199178, St. Petersburg, Russia
Andrey Ronzhin
Institute of Applied and Mathematical Linguistics, Moscow State Linguistic University, 38, Ostozhenka, 119034, Moscow, Russia
Rodmonga Potapova
Faculty of Technical Sciences, University of Novi Sad, 6, Trg Dositeja Obradovića, 21000, Novi Sad, Serbia
Vlado Delic

About this paper

Cite this paper

Merkin, N., Medennikov, I., Romanenko, A., Zatvornitskiy, A. (2014). Controlling the Uncertainty Area in the Real Time LVCSR Application. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_19

DOI: https://doi.org/10.1007/978-3-319-11581-8_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science; Computer Science (R0)
