Abstract
We propose a system for automatic punctuation prediction in recognized speech that uses prosodic, word-level and grammatical features. An SVM classifier is trained on prosodic features, and a CRF classifier is trained on a large text corpus using word-based features. The probabilities from the two classifiers are then fused to make a joint decision on comma and period placement, and a second classification pass detects question marks. Training the two classifiers separately lets us avoid data sparseness for the lexical classifier and increases the overall robustness of the system. The approach works well for Russian and could be applied to other inflected languages. The system was tested on several speech styles. On manual transcripts, we achieved F-scores of 50–71% for periods, 46–66% for commas, 19–47% for question marks, and 77–87% for “mark/no mark” classification. On recognizer output, the F-scores are 46–66% for periods, 43–60% for commas, 10–38% for question marks, and 64–80% for “mark/no mark” classification.
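The abstract does not say how the SVM and CRF probabilities are combined or how the question-mark pass is wired in. The following is a minimal Python sketch under stated assumptions, not the authors' method: it assumes linear interpolation of per-boundary class probabilities with a hypothetical weight `weight`, an argmax over no mark/comma/period, and a second pass that relabels a predicted period as a question mark when a hypothetical question classifier's probability reaches `threshold`. The classifiers themselves are not shown; their outputs are passed in as plain dictionaries. The `f_score` and `mark_no_mark_f_score` helpers (also hypothetical names) show how per-class and “mark/no mark” F-scores can be computed over aligned word boundaries.

```python
from typing import Dict, List

# Boundary labels decided in the first pass (assumed label set).
LABELS = ("none", "comma", "period")


def fuse(p_prosody: Dict[str, float], p_lexical: Dict[str, float],
         weight: float = 0.5) -> Dict[str, float]:
    """Linearly interpolate two class distributions for one word boundary.

    Assumption: `weight` is the share given to the prosodic (SVM)
    probabilities; the remainder goes to the lexical (CRF) probabilities.
    """
    return {lab: weight * p_prosody[lab] + (1.0 - weight) * p_lexical[lab]
            for lab in LABELS}


def first_pass(prosody: List[Dict[str, float]], lexical: List[Dict[str, float]],
               weight: float = 0.5) -> List[str]:
    """Pick the most probable fused label at every word boundary."""
    decisions = []
    for p_pros, p_lex in zip(prosody, lexical):
        fused = fuse(p_pros, p_lex, weight)
        decisions.append(max(LABELS, key=lambda lab: fused[lab]))
    return decisions


def second_pass(decisions: List[str], p_question: List[float],
                threshold: float = 0.5) -> List[str]:
    """Turn a predicted period into a question mark when the (hypothetical)
    question classifier's probability for that boundary is high enough."""
    return ["question" if d == "period" and q >= threshold else d
            for d, q in zip(decisions, p_question)]


def f_score(reference: List[str], hypothesis: List[str], label: str) -> float:
    """F1 (harmonic mean of precision and recall) for one label."""
    pairs = list(zip(reference, hypothesis))
    tp = sum(r == label and h == label for r, h in pairs)
    fp = sum(r != label and h == label for r, h in pairs)
    fn = sum(r == label and h != label for r, h in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mark_no_mark_f_score(reference: List[str], hypothesis: List[str]) -> float:
    """Collapse every punctuation label into 'mark' and score mark vs. none."""
    to_binary = lambda labels: ["none" if x == "none" else "mark" for x in labels]
    return f_score(to_binary(reference), to_binary(hypothesis), "mark")
```

For example, with three boundaries whose prosodic probabilities are `{"none": 0.7, "comma": 0.2, "period": 0.1}`, `{"none": 0.1, "comma": 0.3, "period": 0.6}`, `{"none": 0.2, "comma": 0.2, "period": 0.6}` and lexical probabilities `{"none": 0.8, "comma": 0.1, "period": 0.1}`, `{"none": 0.2, "comma": 0.6, "period": 0.2}`, `{"none": 0.1, "comma": 0.1, "period": 0.8}`, equal weighting yields `["none", "comma", "period"]`, and a question probability of 0.9 on the last boundary turns it into `"question"`. Linear interpolation is only one plausible fusion rule; a log-linear combination or a trained meta-classifier would fit the same interface.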
Acknowledgements
The work was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.579.21.0008, ID RFMEFI57914X0008, and by the Government of the Russian Federation, Grant 074-U01.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Khomitsevich, O., Chistikov, P., Krivosheeva, T., Epimakhova, N., Chernykh, I. (2015). Combining Prosodic and Lexical Classifiers for Two-Pass Punctuation Detection in a Russian ASR System. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol. 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_20
DOI: https://doi.org/10.1007/978-3-319-23132-7_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science, Computer Science (R0)