Controlling the Uncertainty Area in the Real Time LVCSR Application

  • Conference paper
Speech and Computer (SPECOM 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Abstract

We propose an approach to improving the usability of a real-time automatic speech recognition system. We introduce the concept of an “uncertainty area” (UA): a time span within which the current recognition result may still change. By fixing the length of the UA, we make it possible to start editing the recognized text without waiting for the phrase to end. We control the length of the UA by regularly pruning hypotheses using additional criteria. The approach was implemented in a software-hardware system for closed captioning of live Russian TV broadcasts.
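
The idea of bounding the UA can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the pruning criteria, so the sketch assumes a simple one (keep only hypotheses that agree with the best-scoring hypothesis on every word that ended before a cutoff time). The names `Hypothesis`, `stable_prefix_len`, `uncertainty_area`, and `prune_to_limit`, and the `max_ua` parameter, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A recognized word with its end time in seconds (assumed representation).
Word = Tuple[str, float]


@dataclass
class Hypothesis:
    words: List[Word]
    score: float  # higher is better, e.g. a log-probability


def stable_prefix_len(hyps: List[Hypothesis]) -> int:
    """Number of leading words on which all active hypotheses agree."""
    if not hyps:
        return 0
    shortest = min(len(h.words) for h in hyps)
    for i in range(shortest):
        first = hyps[0].words[i]
        if any(h.words[i] != first for h in hyps[1:]):
            return i
    return shortest


def uncertainty_area(hyps: List[Hypothesis], now: float) -> float:
    """Time span (seconds) from the end of the agreed prefix up to `now`."""
    k = stable_prefix_len(hyps)
    stable_end = hyps[0].words[k - 1][1] if k > 0 else 0.0
    return now - stable_end


def prune_to_limit(hyps: List[Hypothesis], now: float, max_ua: float) -> List[Hypothesis]:
    """Assumed criterion: drop hypotheses that disagree with the best one
    on any word that ended at or before `now - max_ua`."""
    best = max(hyps, key=lambda h: h.score)
    cutoff = now - max_ua
    committed = [w for w in best.words if w[1] <= cutoff]
    k = len(committed)
    return [h for h in hyps if h.words[:k] == committed]


hyps = [
    Hypothesis([("the", 0.5), ("cat", 1.0), ("sat", 1.6)], score=-1.0),
    Hypothesis([("the", 0.5), ("cat", 1.0), ("sad", 1.6)], score=-1.5),
    Hypothesis([("the", 0.5), ("cap", 1.0), ("sat", 1.6)], score=-2.0),
]
before = uncertainty_area(hyps, now=2.0)
survivors = prune_to_limit(hyps, now=2.0, max_ua=0.8)
after = uncertainty_area(survivors, now=2.0)
print(before, len(survivors), after)
```

In this sketch, every word that ended at or before `now - max_ua` is identical across the surviving hypotheses, so an editor could begin correcting that part of the text while recognition of the remainder continues. The UA measured after pruning begins at the end of the last agreed word, so it can slightly exceed `max_ua`.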

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Merkin, N., Medennikov, I., Romanenko, A., Zatvornitskiy, A. (2014). Controlling the Uncertainty Area in the Real Time LVCSR Application. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_19

  • DOI: https://doi.org/10.1007/978-3-319-11581-8_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11580-1

  • Online ISBN: 978-3-319-11581-8

  • eBook Packages: Computer Science, Computer Science (R0)
