Word Error Correction of Continuous Speech Recognition Using WEB Documents for Spoken Document Indexing

  • Conference paper
Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead (ICCPOL 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Abstract

This paper describes a method for correcting errors in continuous speech recognition output using Web documents, with the aim of improving spoken document indexing. We conducted an error correction experiment on automatically transcribed news speech, focusing especially on proper nouns. Two LVCSR systems were used to distinguish correctly recognized words from incorrectly recognized ones. Keywords for an Internet search engine were selected from the correctly transcribed words, and correction candidates for the misrecognized words were then drawn from the retrieved documents. A Dynamic Programming (DP) technique with a confusion matrix was used to compare the candidates with the misrecognized words. In the experiment, using Web documents improved the recognition rate of proper nouns by about 10%.
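
The DP comparison mentioned in the abstract can be illustrated with a weighted edit distance in which the substitution cost comes from a confusion table. The following Python sketch is not the authors' implementation: the abstract does not specify the matching unit (phonemes, syllables, or characters), the cost values, or how the confusion matrix was estimated, so the function names `dp_distance` and `best_candidate`, the `confusion` dictionary, and the unit insertion/deletion cost are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical confusion table: (recognized symbol, candidate symbol) -> cost.
# A low cost means the recognizer often confuses the two symbols.
Confusion = Dict[Tuple[str, str], float]


def sub_cost(a: str, b: str, confusion: Confusion, default: float = 1.0) -> float:
    """Substitution cost: 0 for a match, table value if listed, else default."""
    if a == b:
        return 0.0
    return confusion.get((a, b), default)


def dp_distance(hyp: Sequence[str], cand: Sequence[str],
                confusion: Confusion, indel: float = 1.0) -> float:
    """Weighted edit distance between a misrecognized word and a candidate."""
    n, m = len(hyp), len(cand)
    # d[i][j] = cheapest alignment of hyp[:i] with cand[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,  # symbol of hyp left unmatched
                d[i][j - 1] + indel,  # symbol of cand left unmatched
                d[i - 1][j - 1] + sub_cost(hyp[i - 1], cand[j - 1], confusion),
            )
    return d[n][m]


def best_candidate(hyp: Sequence[str], candidates: Sequence[Sequence[str]],
                   confusion: Confusion) -> Sequence[str]:
    """Pick the candidate closest to the misrecognized word."""
    return min(candidates, key=lambda c: dp_distance(hyp, c, confusion))


# Toy example with made-up data: "i" is assumed to be easily confused with "y".
confusion = {("i", "y"): 0.2}
print(best_candidate("tokio", ["kyoto", "tokyo"], confusion))
```

In this sketch, a candidate found in the retrieved documents wins when its symbols differ from the misrecognized word only in ways the confusion table marks as likely recognizer errors. A real system would presumably also apply a distance threshold so that a poor best match is rejected rather than substituted, but the abstract does not say whether the paper does this.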


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nishizaki, H., Sekiguchi, Y. (2006). Word Error Correction of Continuous Speech Recognition Using WEB Documents for Spoken Document Indexing. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_23

  • DOI: https://doi.org/10.1007/11940098_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49667-0

  • Online ISBN: 978-3-540-49668-7

  • eBook Packages: Computer Science (R0)
