
Further Progress in Meeting Recognition: The ICSI-SRI Spring 2005 Speech-to-Text Evaluation System

  • Andreas Stolcke
  • Xavier Anguera
  • Kofi Boakye
  • Özgür Çetin
  • František Grézl
  • Adam Janin
  • Arindam Mandal
  • Barbara Peskin
  • Chuck Wooters
  • Jing Zheng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3869)

Abstract

We describe the development of our speech recognition system for the National Institute of Standards and Technology (NIST) Spring 2005 Meeting Rich Transcription (RT-05S) evaluation, highlighting improvements made since last year [1]. The system is based on the SRI-ICSI-UW RT-04F conversational telephone speech (CTS) recognition system, with meeting-adapted models and various audio preprocessing steps. This year’s system features better delay-sum processing of distant microphone channels and energy-based crosstalk suppression for close-talking microphones. Acoustic modeling is improved by virtue of various enhancements to the background (CTS) models, including added training data, decision-tree-based state tying, and the inclusion of discriminatively trained phone posterior features estimated by multilayer perceptrons (MLPs). In particular, we make use of adaptation of both acoustic models and MLP features to the meeting domain. For distant microphone recognition we obtained considerable gains by combining and cross-adapting narrow-band (telephone) acoustic models with broadband (broadcast news) models. Language models (LMs) were improved with the inclusion of new meeting and web data. In spite of a lack of training data, we created effective LMs for the CHIL lecture domain. Results are reported on RT-04S and RT-05S meeting data. Measured on RT-04S conference data, we achieved an overall improvement of 17% relative in both the multiple distant microphone (MDM) and individual headset microphone (IHM) conditions compared to last year’s evaluation system. Results on lecture data are comparable to the best reported results for that task.
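
The delay-sum processing of distant microphone channels mentioned in the abstract can be illustrated with a generic sketch. The abstract does not describe the system's actual algorithm, so everything below is an assumption made for illustration: integer sample delays, a brute-force cross-correlation search against the first channel, and uniform channel weights. The names `estimate_delay`, `delay_and_sum`, and `max_lag` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def estimate_delay(ref: Sequence[float], sig: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which `sig` best matches `ref`.

    Illustrative brute-force cross-correlation search over [-max_lag, max_lag];
    not the method used by the system described in the abstract.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(len(ref)):
            u = t + lag
            if 0 <= u < len(sig):
                score += ref[t] * sig[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels: Sequence[Sequence[float]], max_lag: int = 8) -> List[float]:
    """Align every channel to the first one, then average the aligned samples."""
    ref = channels[0]
    lags = [estimate_delay(ref, ch, max_lag) for ch in channels]
    out = []
    for t in range(len(ref)):
        total, count = 0.0, 0
        for ch, lag in zip(channels, lags):
            u = t + lag
            if 0 <= u < len(ch):  # skip samples shifted outside the channel
                total += ch[u]
                count += 1
        out.append(total / count if count else 0.0)
    return out


if __name__ == "__main__":
    # Two hypothetical channels carrying the same impulse, two samples apart.
    near = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    far = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    print(estimate_delay(near, far, 3))
    print(delay_and_sum([near, far], max_lag=3))
```

Averaging after alignment reinforces the signal that is coherent across channels while uncorrelated noise partially cancels. A practical system would more likely estimate delays per short window and weight channels by quality, which this sketch leaves out.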

Keywords

Acoustic Model, Word Error Rate, Broadcast News, Maximum Likelihood Linear Regression, Conference Meeting
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Stolcke, A., Wooters, C., Mirghafori, N., Pirinen, T., Bulyko, I., Gelbart, D., Graciarena, M., Otterson, S., Peskin, B., Ostendorf, M.: Progress in meeting recognition: The ICSI-SRI-UW Spring 2004 evaluation system. In: Proceedings NIST ICASSP 2004 Meeting Recognition Workshop, Montreal, National Institute of Standards and Technology (2004)
  2. Adami, A., Burget, L., Dupont, S., Garudadri, H., Grezl, F., Hermansky, H., Jain, P., Kajarekar, S., Morgan, N., Sivadas, S.: Qualcomm-ICSI-OGI features for ASR. In: Hansen, J.H.L., Pellom, B. (eds.) Proc. ICSLP, Denver, vol. 1, pp. 4–7 (2002)
  3. Anguera, X., Wooters, C., Peskin, B., Aguiló, M.: Robust speaker segmentation for meetings: The ICSI-SRI Spring 2005 diarization system. In: Proceedings of the Rich Transcription 2005 Spring Meeting Recognition Evaluation, Edinburgh, National Institute of Standards and Technology, pp. 26–38 (2005)
  4. Flanagan, J.L., Johnston, J.D., Zahn, R., Elko, G.W.: Computer-steered microphone arrays for sound transduction in large rooms. J. Acoust. Soc. Am. 78, 1508–1518 (1985)
  5. Vergyri, D., Stolcke, A., Gadde, V.R.R., Ferrer, L., Shriberg, E.: Prosodic knowledge sources for automatic speech recognition. In: Proc. ICASSP, Hong Kong, vol. 1, pp. 208–211 (2003)
  6. Povey, D., Woodland, P.C.: Minimum phone error and I-smoothing for improved discriminative training. In: Proc. ICASSP, Orlando, FL, vol. 1, pp. 105–108 (2002)
  7. Graciarena, M., Franco, H., Zheng, J., Vergyri, D., Stolcke, A.: Voicing feature integration in SRI’s Decipher LVCSR system. In: Proc. ICASSP, Montreal, vol. 1, pp. 921–924 (2004)
  8. Kumar, N.: Investigation of Silicon-Auditory Models and Generalization of Linear Discriminant Analysis for Improved Speech Recognition. PhD thesis, Johns Hopkins University, Baltimore (1997)
  9. Morgan, N., Chen, B.Y., Zhu, Q., Stolcke, A.: TRAPping conversational speech: Extending TRAP/Tandem approaches to conversational telephone speech recognition. In: Proc. ICASSP, Montreal, vol. 1, pp. 536–539 (2004)
  10. Zhu, Q., Stolcke, A., Chen, B.Y., Morgan, N.: Using MLP features in SRI’s conversational speech recognition system. In: Proc. Interspeech, Lisbon, pp. 2141–2144 (2005)
  11. Jin, H., Matsoukas, S., Schwartz, R., Kubala, F.: Fast robust inverse transform SAT and multistage adaptation. In: Proceedings DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA, pp. 105–109. Morgan Kaufmann, San Francisco (1998)
  12. Metze, F., Fügen, C., Pan, Y., Waibel, A.: Automatically transcribing meetings using distant microphones. In: Proc. ICASSP, Philadelphia, vol. 1, pp. 989–902 (2005)
  13. Povey, D., Gales, M.J.F., Kim, D.Y., Woodland, P.C.: MMI-MAP and MPE-MAP for acoustic model adaptation. In: Proc. EUROSPEECH, Geneva, pp. 1981–1984 (2003)
  14. Bulyko, I., Ostendorf, M., Stolcke, A.: Getting more mileage from web text sources for conversational speech language modeling using class-dependent mixtures. In: Hearst, M., Ostendorf, M. (eds.) Proc. HLT-NAACL, Edmonton, Alberta, Canada. Association for Computational Linguistics, vol. 2, pp. 7–9 (2003)
  15. Lamel, L., Adda, G., Bilinski, E., Gauvain, J.L.: Transcribing lectures and seminars. In: Proc. Interspeech, Lisbon (2005)
  16. Çetin, Ö., Stolcke, A.: Language modeling in the ICSI-SRI Spring 2005 meeting speech recognition evaluation system. Technical Report TR-05-06, International Computer Science Institute (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Andreas Stolcke (1, 2)
  • Xavier Anguera (1, 3)
  • Kofi Boakye (1)
  • Özgür Çetin (1)
  • František Grézl (1, 4)
  • Adam Janin (1)
  • Arindam Mandal (5)
  • Barbara Peskin (1)
  • Chuck Wooters (1)
  • Jing Zheng (2)

  1. International Computer Science Institute, Berkeley, USA
  2. SRI International, Menlo Park, USA
  3. Technical University of Catalonia, Barcelona, Spain
  4. Brno University of Technology, Czech Republic
  5. University of Washington, Seattle, USA
