Combining Prosodic and Lexical Classifiers for Two-Pass Punctuation Detection in a Russian ASR System

Khomitsevich, Olga; Chistikov, Pavel; Krivosheeva, Tatiana; Epimakhova, Natalia; Chernykh, Irina

doi:10.1007/978-3-319-23132-7_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9319)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

We propose a system for automatic punctuation prediction in recognized speech using prosodic, word and grammatical features. An SVM classifier is trained on prosodic features, and a CRF classifier is trained on a large text dataset using word-based features. The probabilities from the two classifiers are then fused to produce a joint decision on comma and period placement, with a second classification pass for question mark detection. Training the two classifiers separately enables us to avoid data sparseness for the lexical classifier and to increase the overall robustness of the system. This approach works well for Russian and could be applied to other inflected languages. The system was tested on different speech styles. On manual transcripts, we achieved an F-score of 50–71% for periods, 46–66% for commas, 19–47% for question marks, and 77–87% for “mark/no mark” classification. The results for recognizer output are 46–66% for periods, 43–60% for commas, 10–38% for question marks, and 64–80% for “mark/no mark”.
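
The two-pass scheme described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the probabilities are fused or how the second pass selects its candidates, so everything below is an assumption made for illustration: the fusion is a weighted log-linear interpolation, the second pass relabels predicted periods using a separate question score, and the names `fuse`, `first_pass`, `second_pass`, `weight` and `threshold` are hypothetical. The per-word probabilities are invented toy values standing in for the outputs of the SVM and CRF classifiers.

```python
import math

LABELS = ("none", "comma", "period")


def fuse(prosodic, lexical, weight=0.5):
    """Fuse two posterior distributions over LABELS.

    Assumption: weighted log-linear interpolation, renormalised to sum to 1.
    The paper's actual fusion rule is not given in the abstract.
    """
    eps = 1e-12  # guards against log(0)
    scores = {
        label: weight * math.log(prosodic[label] + eps)
        + (1.0 - weight) * math.log(lexical[label] + eps)
        for label in LABELS
    }
    top = max(scores.values())
    unnormalised = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(unnormalised.values())
    return {label: v / total for label, v in unnormalised.items()}


def first_pass(prosodic_seq, lexical_seq, weight=0.5):
    """Pick the most probable label after each word from the fused scores."""
    decisions = []
    for prosodic, lexical in zip(prosodic_seq, lexical_seq):
        fused = fuse(prosodic, lexical, weight)
        decisions.append(max(fused, key=fused.get))
    return decisions


def second_pass(decisions, question_probs, threshold=0.5):
    """Relabel predicted periods as question marks.

    Assumption: a separate (hypothetical) classifier supplies one question
    probability per word position; only positions labelled "period" change.
    """
    return [
        "question" if label == "period" and q >= threshold else label
        for label, q in zip(decisions, question_probs)
    ]


# Toy posteriors for the slots after three consecutive words.
prosodic_seq = [
    {"none": 0.7, "comma": 0.2, "period": 0.1},
    {"none": 0.1, "comma": 0.3, "period": 0.6},
    {"none": 0.2, "comma": 0.2, "period": 0.6},
]
lexical_seq = [
    {"none": 0.8, "comma": 0.15, "period": 0.05},
    {"none": 0.2, "comma": 0.6, "period": 0.2},
    {"none": 0.1, "comma": 0.1, "period": 0.8},
]
question_probs = [0.0, 0.1, 0.9]

marks = first_pass(prosodic_seq, lexical_seq)
final = second_pass(marks, question_probs)
print(marks)
print(final)
```

In this toy input the prosodic scores alone favour a period after the second word, but the fused decision is a comma because the lexical scores disagree strongly. That is the kind of correction a combined decision is meant to allow, although the real system's behaviour depends on details not given in the abstract.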

Acknowledgements

The work was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.579.21.0008, ID RFMEFI57914X0008, and by the Government of the Russian Federation, Grant 074-U01.

Author information

Authors and Affiliations

Speech Technology Center, Saint-Petersburg, Russia
Olga Khomitsevich & Pavel Chistikov
ITMO University, Saint-Petersburg, Russia
Olga Khomitsevich & Irina Chernykh
STC-Innovations Ltd, Saint-Petersburg, Russia
Tatiana Krivosheeva, Natalia Epimakhova & Irina Chernykh

Corresponding author

Correspondence to Irina Chernykh.

Editor information

Editors and Affiliations

SPIIRAS, Saint-Petersburg, Russia
Andrey Ronzhin
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Patras, Patras, Greece
Nikos Fakotakis

About this paper

Cite this paper

Khomitsevich, O., Chistikov, P., Krivosheeva, T., Epimakhova, N., Chernykh, I. (2015). Combining Prosodic and Lexical Classifiers for Two-Pass Punctuation Detection in a Russian ASR System. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_20

DOI: https://doi.org/10.1007/978-3-319-23132-7_20
Published: 04 September 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science, Computer Science (R0)
