Speech-Driven Text Retrieval: Using Target IR Collections for Statistical Language Model Adaptation in Speech Recognition

Fujii, Atsushi; Itou, Katunobu; Ishikawa, Tetsuya

doi:10.1007/3-540-45637-6_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2273)

Included in the following conference series:

Workshop on Information Retrieval Techniques for Speech Applications

Abstract

Speech recognition has recently become a practical technology for real-world applications. Aiming at speech-driven text retrieval, in which users retrieve information with spoken queries, we propose a method to integrate speech recognition and text retrieval. Since spoken queries concern content related to the target collection, we adapt the statistical language models used for speech recognition to that collection, so as to improve both recognition and retrieval accuracy. Experiments using existing test collections combined with dictated queries showed the effectiveness of our method.
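
The adaptation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes whitespace-tokenized text (the original work targets Japanese, which needs morphological analysis), a bigram model with add-one smoothing, and rescoring of a fixed list of candidate transcriptions in place of a full recognizer. The function names and the toy collection are hypothetical.

```python
import math
from collections import Counter


def train_bigram_lm(documents):
    """Count unigrams and bigrams over whitespace-tokenized documents."""
    unigrams, bigrams = Counter(), Counter()
    for doc in documents:
        tokens = ["<s>"] + doc.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams) + 1  # +1 reserves mass for unseen words
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, word)] + 1
        denominator = unigrams[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def pick_hypothesis(hypotheses, unigrams, bigrams):
    """Return the candidate transcription the collection-based model prefers."""
    return max(hypotheses, key=lambda h: log_prob(h, unigrams, bigrams))


# Toy stand-in for the target IR collection (hypothetical data).
collection = [
    "speech recognition for text retrieval",
    "language model adaptation for speech recognition",
]
unigrams, bigrams = train_bigram_lm(collection)

# Two acoustically similar candidates for one spoken query (hypothetical).
hypotheses = ["speech recognition", "beach wreck ignition"]
best = pick_hypothesis(hypotheses, unigrams, bigrams)
print(best)
```

Because the model is estimated from the collection being searched, word sequences that occur in that collection score higher than acoustically similar sequences that do not, which is the intuition behind adapting the language model to the target collection.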

Author information

Authors and Affiliations

University of Library and Information Science, 1-2 Kasuga, 305-8550, Tsukuba, Japan
Atsushi Fujii & Tetsuya Ishikawa
National Institute of Advanced Industrial Science and Technology, 1-1-1 Chuuou Daini Umezono, 305-8568, Tsukuba, Japan
Katunobu Itou

Editor information

Editors and Affiliations

IBM T.J. Watson Research Center, P.O.Box 704, 10598, Yorktown Heights, NY, USA
Anni R. Coden & Eric W. Brown
IBM Almaden Research Center, 650 Harry Road, 95120, San Jose, CA, USA
Savitha Srinivasan

About this paper

Cite this paper

Fujii, A., Itou, K., Ishikawa, T. (2002). Speech-Driven Text Retrieval: Using Target IR Collections for Statistical Language Model Adaptation in Speech Recognition. In: Coden, A.R., Brown, E.W., Srinivasan, S. (eds) Information Retrieval Techniques for Speech Applications. IRTSA 2001. Lecture Notes in Computer Science, vol 2273. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45637-6_9

DOI: https://doi.org/10.1007/3-540-45637-6_9
Published: 22 January 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43156-5
Online ISBN: 978-3-540-45637-7
eBook Packages: Springer Book Archive
