Keyword Spotting Out of Continuous Speech

In: Phonetic Search Methods for Large Speech Databases

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

Successful Automatic Speech Recognition (ASR) technology has been a research aspiration for the past five decades. Ideally, computers would be able to transform any type of human speech into an accurate textual transcription. Today’s ASR technology generates fairly good results on structured speech recorded at relatively high signal-to-noise ratios (SNR), but performance degrades on spontaneous speech in noisy real-life environments (Murveit et al. 1992; Young 1996; Furui 2003; Deng and Huang 2004). Performance acceptable for commercial applications can be achieved by training on large corpora of speech and text. However, several problems remain unresolved.

References

  • Alon G (2005) Key-word spotting—the base technology for speech analytics. Rishon lezion, NSC—natural speech communications

  • Amir A, Efrat A et al (2001) Advances in phonetic word spotting. In: Tenth international conference on information and knowledge management, Atlanta

  • Baker J, Deng L et al (2009) Developments and directions in speech recognition and understanding, Part 1 [DSP Education]. Signal Process Mag IEEE 26(3):75–80

  • Barras C, Allauzen A et al (2002) Transcribing audio-video archives. In: 2002 IEEE international conference on acoustics, speech, and signal processing (ICASSP), IEEE

  • Bar-Yosef Y, Aloni-Lavi R et al (2012) Cross-language phonetic search for keyword spotting. In: Proceedings of 2012 speech processing conference, Tel-Aviv

  • Burget L, Černocký J et al (2006) Indexing and search methods for spoken document. In: Text, speech and dialogue 4188/2006 of Lecture notes in computer science. pp 351–358

  • Butzberger J, Murveit H et al (1992) Spontaneous speech effects in large vocabulary speech recognition applications. In: Workshop on speech and natural language, Association for Computational Linguistics

  • Cardillo PS, Clements M et al (2002) Phonetic searching vs. LVCSR: how to find what you really want in audio archives. Int J Speech Technol 5(1):9–22

  • Deng L, Huang X (2004) Challenges in adopting speech recognition. Commun ACM 47(1):69–75

  • Dubois C, Charlet D (2008) Using textual information from LVCSR transcripts for phonetic-based spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’08), Las Vegas

  • Evermann G, Chan H et al (2005) Training LVCSR systems on thousands of hours of data. In: Submitted to ICASSP’05

  • Furui S (2003) Recent advances in spontaneous speech recognition and understanding. ISCA & IEEE workshop on spontaneous speech processing and recognition

  • Furui S, Deng L et al (2012) Fundamental technologies in modern speech recognition. IEEE Signal Process Mag (IEEE Signal Processing Society) 26:16–17

  • Gosztolya G, Tóth L (2011) Spoken term detection based on the most probable phoneme sequence. In: 2011 IEEE 9th international symposium on applied machine intelligence and informatics (SAMI), IEEE, Smolenice

  • Heigold G, Nguyen P et al (2012) Investigations on exemplar-based features for speech recognition towards thousands of hours of unsupervised, noisy data. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE

  • Hirsch HG, Pearce D (2000) The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: ASR2000-automatic speech recognition: challenges for the new millennium ISCA tutorial and research workshop (ITRW)

  • Huo Q, Jiang H et al (1997) A Bayesian predictive classification approach to robust speech recognition. In: 1997 IEEE international conference on acoustics, speech, and signal processing (ICASSP’97), vol. 2, IEEE Computer Society

  • Kai T, Suzuki M et al (2012) Combination of SPLICE and feature normalization for noise robust speech recognition. In: International workshop on nonlinear circuits, communications and signal processing (NCSP’12), Honolulu

  • Kamm TM, Meyer GGL (2002) Selective sampling of training data for speech recognition. In: Proceedings of the second international conference on human language technology research, Morgan Kaufmann Publishers Inc, San Francisco

  • Mammone RJ, Zhang X et al (1996) Robust speaker recognition: a feature-based approach. IEEE Signal Process Mag 13:58

  • Mamou J, Ramabhadran B (2008) Phonetic query expansion for spoken document retrieval. In: Interspeech’08, Brisbane

  • Matrouf D, Gauvain J-L (1997) Model compensation for noises in training and test data. In: 1997 IEEE international conference on acoustics, speech, and signal processing (ICASSP’97), IEEE Computer Society

  • Mishne G, Carmel D et al (2005) Automatic analysis of call-center conversations. In: The 14th ACM international conference on information and knowledge management

  • Motlicek P, Valente F et al (2012) Improving acoustic based keyword spotting using LVCSR lattices. In: International conference on acoustic speech and signal processing, Japan

  • Parada C, Sethy A et al (2010) Balancing false alarms and hits in spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’10), IEEE, Dallas

  • Park Y, Patwardhan S et al (2008) An empirical analysis of word error rate and keyword error rate. In: The international conference on spoken language processing (ICSLP), Brisbane

  • Sankar A, Lee CH (1996) A maximum-likelihood approach to stochastic matching for robust speech recognition. IEEE Trans Speech Audio Process 4(3):190–202

  • Saon G, Chien J-T (2012) Large vocabulary continuous speech recognition systems. IEEE Signal Process Mag (IEEE Signal Processing Society) 29:18–33

  • Schneider D (2011) Holistic vocabulary independent spoken term detection. Ph.D. dissertation. Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn

  • Šmídl L, Psutka J (2006) Comparison of keyword spotting methods for searching in speech. In: Interspeech 2006, ISCA, Bonn

  • Szöke I, Schwarz P et al (2005) Comparison of keyword spotting approaches for informal continuous speech. In: Eurospeech’05, Lisbon

  • Szöke I, Fapšo M et al (2008) Spoken term detection system based on combination of LVCSR and phonetic search. In: The 4th international conference on machine learning for multimodal interaction, Springer, Berlin

  • Thambiratnam K (2005) Acoustic keyword spotting in speech with applications to data mining. PhD, Speech and Audio Research Laboratory of the SAIVT Program—Center for Built Environment and Engineering Research. Queensland University of Technology, Brisbane, p 248

  • Thambiratnam K, Sridharan S (2005) Dynamic match phone-lattice searches for very fast and accurate unrestricted vocabulary keyword spotting. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP’05), Philadelphia

  • Thambiratnam K, Sridharan S (2007) Rapid yet accurate speech indexing using dynamic match lattice spotting. IEEE Trans Audio Speech Lang Process 15(1):346–357

  • Tsao Y, Li J et al (2009) Ensemble speaker and speaking environment modeling approach with advanced online estimation process. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’09), IEEE Computer Society, Taipei

  • Viikki O, Laurila K (1998) Cepstral domain segmental feature vector normalization for noise robust speech recognition. Speech Commun 25(1):133–147

  • Wallace R, Vogt R et al (2007) A phonetic search approach to the 2006 NIST spoken term detection evaluation. In: 8th annual conference of the international speech communication association (INTERSPEECH 2007), ISCA, Antwerp

  • Wang D, Tejedor J et al (2008) A comparison of phone and grapheme-based spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’08), Las Vegas

  • Wilpon JG, Rabiner LR et al (1990) Automatic recognition of keywords in unconstrained speech using hidden Markov models. IEEE Trans Acoust Speech Signal Process 38(11):1870–1878

  • Witbrock MJ, Hauptmann AG (1997) Using words and phonetic strings for efficient information retrieval from imperfectly transcribed spoken documents. In: The second ACM international conference on digital libraries, ACM

  • Young SJ (1993) The HTK hidden Markov model toolkit: design and philosophy. Technical Report TR 153, Department of Engineering, Cambridge University, Cambridge

Copyright information

© 2013 The Author(s)

About this chapter

Cite this chapter

Moyal, A., Aharonson, V., Tetariy, E., Gishri, M. (2013). Keyword Spotting Out of Continuous Speech. In: Phonetic Search Methods for Large Speech Databases. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6489-1_1

  • DOI: https://doi.org/10.1007/978-1-4614-6489-1_1

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-6488-4

  • Online ISBN: 978-1-4614-6489-1

  • eBook Packages: Engineering, Engineering (R0)
