Lightweight Spoken Utterance Classification with CFG, tf-idf and Dynamic Programming

Rayner, Manny; Tsourakis, Nikos; Gerlach, Johanna

doi:10.1007/978-3-319-68456-7_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10583)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

We describe a simple spoken utterance classification method suitable for data-sparse domains which can be approximately described by context-free grammars (CFGs). The central idea is to perform robust matching of CFG rules against output from a large-vocabulary recogniser, using a dynamic programming method which optimises the tf-idf score of the matched grammar string. We present results of experiments carried out on a substantial CFG-based medical speech translator and on the publicly available Spoken CALL Shared Task. Robust utterance classification using the tf-idf method strongly outperforms plain CFG-based recognition in both domains. Compared with Naive Bayes classifiers trained on data sampled from the CFGs, the tf-idf/dynamic programming method is much better on the complex speech translation domain, but worse on the simple Spoken CALL Shared Task domain.
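
The central idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the grammar has already been flattened into a small list of canonical sentences (the hypothetical SENTENCES below), treats each sentence as a "document" for idf purposes, and uses a weighted longest-common-subsequence alignment as the dynamic programming step. The function names, the smoothed idf formula and the example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in for strings generated by a CFG (assumption, not from the paper).
SENTENCES = [
    "do you have a fever",
    "do you have a headache",
    "where is the pain",
    "how long have you had the pain",
]

def idf_weights(sentences):
    """Smoothed idf per word, treating each grammar sentence as one document."""
    n = len(sentences)
    df = Counter(word for s in sentences for word in set(s.split()))
    # +1.0 keeps very common words from dropping to zero weight (a design choice).
    return {word: math.log(n / count) + 1.0 for word, count in df.items()}

def match_score(hypothesis, sentence, idf):
    """Best total idf of sentence words matched, in order, against the hypothesis."""
    h, r = hypothesis.split(), sentence.split()
    # best[i][j] = best score aligning the first i hypothesis words
    # with the first j sentence words (weighted LCS recurrence).
    best = [[0.0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            skip = max(best[i - 1][j], best[i][j - 1])
            hit = best[i - 1][j - 1] + idf[r[j - 1]] if h[i - 1] == r[j - 1] else 0.0
            best[i][j] = max(skip, hit)
    return best[-1][-1]

def classify(hypothesis, sentences, idf):
    """Return the grammar sentence with the highest robust-match score."""
    return max(sentences, key=lambda s: match_score(hypothesis, s, idf))

WEIGHTS = idf_weights(SENTENCES)
# Recogniser output with a filler word and a trailing extra word still matches.
print(classify("um do you have a fever today", SENTENCES, WEIGHTS))
```

In this sketch, extra or misrecognised words in the recogniser output simply fail to match and contribute nothing, while rare, informative words such as "fever" dominate the score. No length normalisation is applied, and ties are broken by list order; a real system would need to handle both, and would match against grammar rules rather than an enumerated sentence list.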

The work described here was funded by the Fondation privée des Hôpitaux universitaires de Genève and Unitec. We would like to thank Pierrette Bouillon for semantic annotation of the test data and many helpful suggestions, and Nuance Inc for generously making their software available to us for research purposes.

Notes

1. https://regulus.unige.ch/spokencallsharedtask, “Results” tab.

Author information

Authors and Affiliations

TIM/FTI, University of Geneva, Geneva, Switzerland
Manny Rayner, Nikos Tsourakis & Johanna Gerlach

Corresponding author

Correspondence to Manny Rayner.

Editor information

Editors and Affiliations

University of Le Mans, Le Mans, France
Nathalie Camelin
University of Le Mans, Le Mans, France
Yannick Estève
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide

About this paper

Cite this paper

Rayner, M., Tsourakis, N., Gerlach, J. (2017). Lightweight Spoken Utterance Classification with CFG, tf-idf and Dynamic Programming. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol 10583. Springer, Cham. https://doi.org/10.1007/978-3-319-68456-7_12

DOI: https://doi.org/10.1007/978-3-319-68456-7_12
Published: 27 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68455-0
Online ISBN: 978-3-319-68456-7
eBook Packages: Computer Science, Computer Science (R0)
