Abstract
Successful Automatic Speech Recognition (ASR) technology has been a research aspiration for the past five decades. Ideally, computers would be able to transform any type of human speech into an accurate textual transcription. Today’s ASR technology generates fairly good results on structured speech with relatively high Signal to Noise Ratios (SNR), but performance degrades on spontaneous speech in real-life noisy environments (Murveit et al. 1992; Young 1996; Furui 2003; Deng and Huang 2004). Performance that is acceptable for commercial applications can be achieved using large training corpora of speech and text. However, problems remain to be resolved, particularly for spontaneous speech in noisy conditions.
References
Alon G (2005) Key-word spotting—the base technology for speech analytics. Rishon lezion, NSC—natural speech communications
Amir A, Efrat A et al (2001) Advances in phonetic word spotting. In: Tenth international conference on information and knowledge management, Atlanta
Baker J, Deng L et al (2009) Developments and directions in speech recognition and understanding, Part 1 [DSP Education]. Signal Process Mag IEEE 26(3):75–80
Barras C, Allauzen A et al (2002) Transcribing audio-video archives. In: 2002 IEEE international conference on acoustics, speech, and signal processing (ICASSP), IEEE
Bar-Yosef Y, Aloni-Lavi R et al (2012) Cross-language phonetic search for keyword spotting. In: Proceedings of 2012 speech processing conference, Tel-Aviv
Burget L, Černocký J et al (2006) Indexing and search methods for spoken document. In: Text, speech and dialogue 4188/2006 of Lecture notes in computer science. pp 351–358
Butzberger J, Murveit H et al (1992) Spontaneous speech effects in large vocabulary speech recognition applications. In: Workshop on speech and natural language, Association for Computational Linguistics
Cardillo PS, Clements M et al (2002) Phonetic searching vs. LVCSR: how to find what you really want in audio archives. Int J Speech Technol 5(1):9–22
Deng L, Huang X (2004) Challenges in adopting speech recognition. Commun ACM 47(1):69–75
Dubois C, Charlet D (2008) Using textual information from LVCSR transcripts for phonetic-based spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’08), Las Vegas
Evermann G, Chan H et al (2005) Training LVCSR systems on thousands of hours of data. In: Submitted to ICASSP’05
Furui S (2003) Recent advances in spontaneous speech recognition and understanding. ISCA & IEEE workshop on spontaneous speech processing and recognition
Furui S, Deng L et al (2012) Fundamental technologies in modern speech recognition. IEEE Signal Process Mag (IEEE Signal Processing Society) 26:16–17
Gosztolya G, Tóth L (2011) Spoken term detection based on the most probable phoneme sequence. In: 2011 IEEE 9th international symposium on applied machine intelligence and informatics (SAMI), IEEE, Smolenice
Heigold G, Nguyen P et al (2012) Investigations on exemplar-based features for speech recognition towards thousands of hours of unsupervised, noisy data. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE
Hirsch HG, Pearce D (2000) The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: ASR2000-automatic speech recognition: challenges for the new millennium ISCA tutorial and research workshop (ITRW)
Huo Q, Jiang H et al (1997) A Bayesian predictive classification approach to robust speech recognition. In: 1997 IEEE international conference on acoustics, speech, and signal processing (ICASSP’97), vol. 2, IEEE Computer Society
Kai T, Suzuki M et al (2012) Combination of SPLICE and feature normalization for noise robust speech recognition. In: International workshop on nonlinear circuits, communications and signal processing (NCSP’12), Honolulu
Kamm TM, Meyer GGL (2002) Selective sampling of training data for speech recognition. In: Proceedings of the second international conference on human language technology research, Morgan Kaufmann Publishers Inc, San Francisco
Mammone RJ, Zhang X et al (1996) Robust speaker recognition: a feature-based approach. IEEE Signal Process Mag 13:58
Mamou J, Ramabhadran B (2008) Phonetic query expansion for spoken document retrieval. In: Interspeech’08, Brisbane
Matrouf D, Gauvain J-L (1997) Model compensation for noises in training and test data. In: 1997 IEEE international conference on acoustics, speech, and signal processing (ICASSP’97), IEEE Computer Society
Mishne G, Carmel D et al (2005) Automatic analysis of call-center conversations. In: The 14th ACM international conference on information and knowledge management
Motlicek P, Valente F et al (2012) Improving acoustic based keyword spotting using LVCSR lattices. In: International conference on acoustic speech and signal processing, Japan
Parada C, Sethy A et al (2010) Balancing false alarms and hits in spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’10), IEEE, Dallas
Park Y, Patwardhan S et al (2008) An empirical analysis of word error rate and keyword error rate. In: The international conference on spoken language processing (ICSLP), Brisbane
Sankar A, Lee CH (1996) A maximum-likelihood approach to stochastic matching for robust speech recognition. IEEE Trans Speech Audio Process 4(3):190–202
Saon G, Chien J-T (2012) Large vocabulary continuous speech recognition systems. IEEE Signal Process Mag (IEEE Signal Processing Society) 29:18–33
Schneider D (2011) Holistic vocabulary independent spoken term detection. Ph.D. dissertation. Rheinischen Friedrich-Wilhelms-Universität Bonn, Bonn
Šmídl L, Psutka J (2006) Comparison of keyword spotting methods for searching in speech. In: Interspeech 2006, ISCA, Bonn
Szöke I, Schwarz P et al (2005) Comparison of keyword spotting approaches for informal continuous speech. In: Eurospeech’05, Lisbon
Szöke I, Fapšo M et al (2008) Spoken term detection system based on combination of LVCSR and phonetic search. In: The 4th international conference on machine learning for multimodal interaction, Springer, Berlin
Thambiratnam K (2005) Acoustic keyword spotting in speech with applications to data mining. PhD, Speech and Audio Research Laboratory of the SAIVT Program—Center for Built Environment and Engineering Research. Queensland University of Technology, Brisbane, p 248
Thambiratnam K, Sridharan S (2005) Dynamic match phone-lattice searches for very fast and accurate unrestricted vocabulary keyword spotting. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP’05), Philadelphia
Thambiratnam K, Sridharan S (2007) Rapid yet accurate speech indexing using dynamic match lattice spotting. IEEE Trans Audio Speech Lang Process 15(1):346–357
Tsao Y, Li J et al (2009) Ensemble speaker and speaking environment modeling approach with advanced online estimation process. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’09), IEEE Computer Society, Taipei
Viikki O, Laurila K (1998) Cepstral domain segmental feature vector normalization for noise robust speech recognition. Speech Commun 25(1):133–147
Wallace R, Vogt R et al (2007) A phonetic search approach to the 2006 NIST spoken term detection evaluation. In: 8th annual conference of the international speech communication association (INTERSPEECH 2007), ISCA, Antwerp
Wang D, Tejedor J et al (2008) A comparison of phone and grapheme-based spoken term detection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP’08), Las Vegas
Wilpon JG, Rabiner LR et al (1990) Automatic recognition of keywords in unconstrained speech using hidden Markov models. IEEE Trans Acoust Speech Signal Process 38(11):1870–1878
Witbrock MJ, Hauptmann AG (1997) Using words and phonetic strings for efficient information retrieval from imperfectly transcribed spoken documents. In: The second ACM international conference on digital libraries, ACM
Young SJ (1993) The HTK hidden Markov model toolkit: design and philosophy. Technical Report TR 153, Department of Engineering, Cambridge University, Cambridge
Copyright information
© 2013 The Author(s)
About this chapter
Cite this chapter
Moyal, A., Aharonson, V., Tetariy, E., Gishri, M. (2013). Keyword Spotting Out of Continuous Speech. In: Phonetic Search Methods for Large Speech Databases. SpringerBriefs in Electrical and Computer Engineering(). Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6489-1_1
DOI: https://doi.org/10.1007/978-1-4614-6489-1_1
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6488-4
Online ISBN: 978-1-4614-6489-1
eBook Packages: Engineering, Engineering (R0)