An Investigation of Single-Pass ASR System Combination for Spoken Language Understanding

Bougares, Fethi; Rouvier, Mickael; Camelin, Nathalie; Deléglise, Paul; Estève, Yannick

doi:10.1007/978-3-642-39593-2_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7978)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

This paper studies the benefits of a single-pass, exchange-based Automatic Speech Recognition (ASR) combination approach for spoken dialog systems. Three well-known open-source ASR systems are used to evaluate this approach within a Spoken Language Understanding (SLU) framework. On the ASR side, single-pass systems are used with online acoustic model adaptation based on the previous utterances of a speaker. On the SLU side, a competitive CRF-based SLU system is applied to the ASR output to obtain the semantic concepts. The evaluation is carried out on the French PORT-MEDIA test data in terms of both Word Error Rate (WER) and Concept Error Rate (CER). While the best single-pass system used alone shows a CER of 29.8% for a WER of 22.8%, the single-pass exchange-based ASR combination reaches a CER of 27.3% for a WER of 26%. This CER is only slightly higher than that of a 5-pass ASR system, which obtained a CER of 26.8% for a WER of 22.8% under more favorable conditions: acoustic model adaptation on all the speech utterances of a speaker, advanced feature extraction techniques, and search-graph rescoring with a higher-order language model.
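
Both metrics reported in the abstract are edit-distance rates: the number of substitutions, deletions and insertions needed to turn the system output into the reference, divided by the reference length. The sketch below uses this standard definition and is not the authors' scoring tool. The function name `error_rate`, the example utterance and the concept labels are hypothetical and chosen only for illustration.

```python
def error_rate(reference, hypothesis):
    """Levenshtein distance between two token lists, divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical utterance: the recognizer drops one word out of seven.
ref_words = "je voudrais réserver une chambre à paris".split()
hyp_words = "je voudrais réserver chambre à paris".split()
wer = error_rate(ref_words, hyp_words)

# Hypothetical concept sequences: the tagger misses one concept out of three.
ref_concepts = ["task", "object", "city"]
hyp_concepts = ["task", "city"]
cer = error_rate(ref_concepts, hyp_concepts)

print(f"WER = {wer:.1%}, CER = {cer:.1%}")
```

The same function serves both metrics because only the token type changes: words for WER, semantic concept labels for CER. Since insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.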

Author information

Authors and Affiliations

LIUM - University of Le Mans, France
Fethi Bougares, Mickael Rouvier, Nathalie Camelin, Paul Deléglise & Yannick Estève

Editor information

Editors and Affiliations

Research Group on Mathematical Linguistics, Universitat Rovira i Virgili, Avinguda Catalunya, 35, 43002, Tarragona, Spain
Adrian-Horia Dediu & Carlos Martín-Vide
Research Institute for Information and Language Processing, Research Group in Computational Linguistics, University of Wolverhampton, WV1 1SB, Wolverhampton, UK
Ruslan Mitkov
Fakultät für Informatik, Institut für Wissens- und Sprachverarbeitung, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Bianca Truthe

About this paper

Cite this paper

Bougares, F., Rouvier, M., Camelin, N., Deléglise, P., Estève, Y. (2013). An Investigation of Single-Pass ASR System Combination for Spoken Language Understanding. In: Dediu, AH., Martín-Vide, C., Mitkov, R., Truthe, B. (eds) Statistical Language and Speech Processing. SLSP 2013. Lecture Notes in Computer Science, vol 7978. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39593-2_5

DOI: https://doi.org/10.1007/978-3-642-39593-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39592-5
Online ISBN: 978-3-642-39593-2
eBook Packages: Computer Science, Computer Science (R0)
