Keyword Spotting Out of Continuous Speech

Moyal, Ami; Aharonson, Vered; Tetariy, Ella; Gishri, Michal

doi:10.1007/978-1-4614-6489-1_1

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

Successful Automatic Speech Recognition (ASR) technology has been a research aspiration for the past five decades. Ideally, computers would be able to transform any type of human speech into an accurate textual transcription. Today’s ASR technology generates fairly good results on structured speech recorded at relatively high signal-to-noise ratios (SNR), but performance degrades on spontaneous speech in noisy real-life environments (Murveit et al. 1992; Young 1996; Furui 2003; Deng and Huang 2004). Performance acceptable for commercial applications can be achieved with large training corpora of speech and text. However, there are still problems that need to be resolved.

References

Alon G (2005) Key-word spotting—the base technology for speech analytics. Rishon lezion, NSC—natural speech communications
Amir A, Efrat A et al (2001) Advances in phonetic word spotting. In: Tenth international conference on information and knowledge management, Atlanta
Baker J, Deng L et al (2009) Developments and directions in speech recognition and understanding, Part 1 [DSP Education]. Signal Process Mag IEEE 26(3):75–80
Barras C, Allauzen A et al (2002) Transcribing audio-video archives. In: 2002 IEEE international conference on acoustics, speech, and signal processing (ICASSP), IEEE
Bar-Yosef Y, Aloni-Lavi R et al (2012) Cross-language phonetic search for keyword spotting. In: Proceedings of 2012 speech processing conference, Tel-Aviv
Burget L, Černocký J et al (2006) Indexing and search methods for spoken documents. In: Text, speech and dialogue. Lecture notes in computer science, vol 4188, pp 351–358
Butzberger J, Murveit H et al (1992) Spontaneous speech effects in large vocabulary speech recognition applications. In: Workshop on speech and natural language, Association for Computational Linguistics
Cardillo PS, Clements M et al (2002) Phonetic searching vs. LVCSR: how to find what you really want in audio archives. Int J Speech Technol 5(1):9–22
Deng L, Huang X (2004) Challenges in adopting speech recognition. Commun ACM 47(1):69–75
Dubois C, Charlet D (2008) Using textual information from LVCSR transcripts for phonetic-based spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’08), Las Vegas
Evermann G, Chan H et al (2005) Training LVCSR systems on thousands of hours of data. In: Submitted to ICASSP’05
Furui S (2003) Recent advances in spontaneous speech recognition and understanding. ISCA & IEEE workshop on spontaneous speech processing and recognition
Furui S, Deng L et al (2012) Fundamental technologies in modern speech recognition. IEEE Signal Process Mag (IEEE Signal Processing Society) 26:16–17
Gosztolya G, Tóth L (2011) Spoken term detection based on the most probable phoneme sequence. In: 2011 IEEE 9th international symposium on applied machine intelligence and informatics (SAMI), IEEE, Smolenice
Heigold G, Nguyen P et al (2012) Investigations on exemplar-based features for speech recognition towards thousands of hours of unsupervised, noisy data. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE
Hirsch HG, Pearce D (2000) The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: ASR2000-automatic speech recognition: challenges for the new millennium ISCA tutorial and research workshop (ITRW)
Huo Q, Jiang H et al (1997) A Bayesian predictive classification approach to robust speech recognition. In: 1997 IEEE international conference on acoustics, speech, and signal processing (ICASSP’97), vol. 2, IEEE Computer Society
Kai T, Suzuki M et al (2012) Combination of SPLICE and feature normalization for noise robust speech recognition. In: International workshop on nonlinear circuits, communications and signal processing (NCSP’12), Honolulu
Kamm TM, Meyer GGL (2002) Selective sampling of training data for speech recognition. In: Proceedings of the second international conference on human language technology research, Morgan Kaufmann Publishers Inc, San Francisco
Mammone RJ, Zhang X et al (1996) Robust speaker recognition: a feature-based approach. IEEE Signal Process Mag 13:58
Mamou J, Ramabhadran B (2008) Phonetic query expansion for spoken document retrieval. In: Interspeech’08, Brisbane
Matrouf D, Gauvain J-L (1997) Model compensation for noises in training and test data. In: 1997 IEEE international conference on acoustics, speech, and signal processing (ICASSP’97), IEEE Computer Society
Mishne G, Carmel D et al (2005) Automatic analysis of call-center conversations. In: The 14th ACM international conference on information and knowledge management
Motlicek P, Valente F et al (2012) Improving acoustic based keyword spotting using LVCSR lattices. In: International conference on acoustic speech and signal processing, Japan
Parada C, Sethy A et al (2010) Balancing false alarms and hits in spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’10), IEEE, Dallas
Park Y, Patwardhan S et al (2008) An empirical analysis of word error rate and keyword error rate. In: The international conference on spoken language processing (ICSLP), Brisbane
Sankar A, Lee CH (1996) A maximum-likelihood approach to stochastic matching for robust speech recognition. IEEE Trans Speech Audio Process 4(3):190–202
Saon G, Chien J-T (2012) Large vocabulary continuous speech recognition systems. IEEE Signal Process Mag (IEEE Signal Processing Society) 29:18–33
Schneider D (2011) Holistic vocabulary independent spoken term detection. Ph.D. dissertation. Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn
Šmídl L, Psutka J (2006) Comparison of keyword spotting methods for searching in speech. In: Interspeech 2006, ISCA, Bonn
Szöke I, Schwarz P et al (2005) Comparison of keyword spotting approaches for informal continuous speech. In: Eurospeech’05, Lisbon
Szöke I, Fapšo M et al (2008) Spoken term detection system based on combination of LVCSR and phonetic search. In: The 4th international conference on machine learning for multimodal interaction, Springer, Berlin
Thambiratnam K (2005) Acoustic keyword spotting in speech with applications to data mining. PhD, Speech and Audio Research Laboratory of the SAIVT Program—Center for Built Environment and Engineering Research. Queensland University of Technology, Brisbane, p 248
Thambiratnam K, Sridharan S (2005) Dynamic match phone-lattice searches for very fast and accurate unrestricted vocabulary keyword spotting. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP’05), Philadelphia
Thambiratnam K, Sridharan S (2007) Rapid yet accurate speech indexing using dynamic match lattice spotting. IEEE Trans Audio Speech Lang Process 15(1):346–357
Tsao Y, Li J et al (2009) Ensemble speaker and speaking environment modeling approach with advanced online estimation process. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’09), IEEE Computer Society, Taipei
Viikki O, Laurila K (1998) Cepstral domain segmental feature vector normalization for noise robust speech recognition. Speech Commun 25(1):133–147
Wallace R, Vogt R et al (2007) A phonetic search approach to the 2006 NIST spoken term detection evaluation. In: 8th annual conference of the international speech communication association (INTERSPEECH 2007), ISCA, Antwerp
Wang D, Tejedor J et al (2008) A comparison of phone and grapheme-based spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’08), Las Vegas
Wilpon JG, Rabiner LR et al (1990) Automatic recognition of keywords in unconstrained speech using hidden Markov models. IEEE Trans Acoust Speech Signal Process 38(11):1870–1878
Witbrock MJ, Hauptmann AG (1997) Using words and phonetic strings for efficient information retrieval from imperfectly transcribed spoken documents. In: The second ACM international conference on digital libraries, ACM
Young SJ (1993) The HTK hidden Markov model toolkit: design and philosophy. Technical Report TR 153, Department of Engineering, Cambridge University, Cambridge

Author information

Authors and Affiliations

Afeka Academic College of Engineering, Tel-Aviv, Israel
Ami Moyal, Vered Aharonson, Ella Tetariy & Michal Gishri

About this chapter

Cite this chapter

Moyal, A., Aharonson, V., Tetariy, E., Gishri, M. (2013). Keyword Spotting Out of Continuous Speech. In: Phonetic Search Methods for Large Speech Databases. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6489-1_1

DOI: https://doi.org/10.1007/978-1-4614-6489-1_1
Published: 17 January 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6488-4
Online ISBN: 978-1-4614-6489-1
eBook Packages: Engineering, Engineering (R0)