Summary

This chapter presents the state of the art in word-spotting and rejection methods. After an introduction to word-spotting, available algorithms are classified into several categories. This is followed by a description of template-matching word-spotting systems, garbage modeling, and the use of a large-vocabulary recognizer in a word-spotting task. Then, we address the issues of vocabulary-independent word-spotting, performance measures, and rejection. The rejection problem is associated with the notion of a confidence measure, which indicates how well a hypothesis matches the recognized result. Consequently, confidence measures and the related problem of detecting out-of-vocabulary words are considered.

Copyright information

© 1996 Kluwer Academic Publishers

About this chapter

Cite this chapter

Junqua, JC., Haton, JP. (1996). Word-Spotting and Rejection. In: Robustness in Automatic Speech Recognition. The Kluwer International Series in Engineering and Computer Science, vol 341. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1297-0_10

  • DOI: https://doi.org/10.1007/978-1-4613-1297-0_10

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-8555-7

  • Online ISBN: 978-1-4613-1297-0

  • eBook Packages: Springer Book Archive