Word Error Correction of Continuous Speech Recognition Using WEB Documents for Spoken Document Indexing

Nishizaki, Hiromitsu; Sekiguchi, Yoshihiro

doi:10.1007/11940098_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Included in the following conference series:

International Conference on Computer Processing of Oriental Languages

Abstract

This paper describes a method for correcting errors in continuous speech recognition output using Web documents, with the aim of indexing spoken documents. We ran an error correction experiment on automatically transcribed news speech, focusing in particular on proper nouns. Two LVCSR (large-vocabulary continuous speech recognition) systems were used to separate correctly recognized words from incorrectly recognized ones. Keywords for an Internet search engine were selected from the correctly transcribed words, and candidate corrections for the misrecognized words were then extracted from the retrieved documents. A dynamic programming (DP) technique with a confusion matrix was used to compare the candidates with the misrecognized words. In the experiment, using Web documents improved the recognition rate of proper nouns by about 10%.
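The two core steps of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the two recognizer outputs are compared or what unit the DP operates on, so the sketch assumes (a) that a word counts as correctly recognized when both hypotheses agree on it in an alignment, and (b) that the DP is a weighted edit distance over symbol sequences (for example phonemes) whose substitution cost is lowered by a confusion score. The function names, the `confusion` dictionary layout, and the unit insertion and deletion costs are all hypothetical.

```python
import difflib
from typing import Dict, List, Sequence, Tuple


def agreement_labels(hyp_a: Sequence[str], hyp_b: Sequence[str]) -> List[Tuple[str, bool]]:
    """Mark each word of hyp_a as trusted if it aligns with the same word in hyp_b.

    Assumption: agreement between two recognizers stands in for correctness.
    """
    trusted = [False] * len(hyp_a)
    matcher = difflib.SequenceMatcher(None, hyp_a, hyp_b, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.a, block.a + block.size):
            trusted[k] = True
    return list(zip(hyp_a, trusted))


def confusion_distance(
    misrecognized: Sequence[str],
    candidate: Sequence[str],
    confusion: Dict[Tuple[str, str], float],
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Weighted edit distance computed by dynamic programming.

    confusion[(x, y)] is a hypothetical score in [0, 1] for how easily the
    recognizer outputs x when y was spoken; a higher score makes the
    substitution cheaper. Unlisted pairs cost 1.0.
    """
    n, m = len(misrecognized), len(candidate)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = misrecognized[i - 1], candidate[j - 1]
            sub_cost = 0.0 if x == y else 1.0 - confusion.get((x, y), 0.0)
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,      # drop a symbol of the misrecognized word
                dist[i][j - 1] + ins_cost,      # add a symbol of the candidate
                dist[i - 1][j - 1] + sub_cost,  # match or confusable substitution
            )
    return dist[n][m]


def best_candidate(
    misrecognized: Sequence[str],
    candidates: Sequence[Sequence[str]],
    confusion: Dict[Tuple[str, str], float],
) -> Sequence[str]:
    """Return the candidate with the smallest confusion-weighted distance."""
    return min(candidates, key=lambda c: confusion_distance(misrecognized, c, confusion))
```

In a pipeline following the abstract, the trusted words from `agreement_labels` would supply the search keywords, and `best_candidate` would rank terms taken from the retrieved documents against each untrusted word. How keywords are chosen and how candidates are extracted are not specified in the abstract and are left out of this sketch.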

Author information

Authors and Affiliations

Interdisciplinary Graduate School of Medicine and Engineering, University of Yamanashi, Kofu, Yamanashi, 400-8511, Japan
Hiromitsu Nishizaki & Yoshihiro Sekiguchi

Editor information

Editors and Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 630-0192, Takayama, Ikoma, Nara, Japan
Yuji Matsumoto
Dept of ECE, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
Richard W. Sproat
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
State Key Lab of Intelligent Tech. & Sys., Tsinghua University,
Min Zhang

Cite this paper

Nishizaki, H., Sekiguchi, Y. (2006). Word Error Correction of Continuous Speech Recognition Using WEB Documents for Spoken Document Indexing. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_23

DOI: https://doi.org/10.1007/11940098_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science, Computer Science (R0)
