Abstract
This paper addresses the summarization of highly spontaneous speech. Speech is first converted into text by an automatic speech recognizer (ASR) and then segmented into tokens. Two tokenizations are compared: a human-made one and an automatic, prosody-based one. The resulting sentence-like units are analysed by a syntactic parser to support automatic sentence selection for the summary. The preprocessed sentences are ranked by thematic terms and sentence position. The thematic term score is computed in two ways: with TF-IDF and with Latent Semantic Indexing. The sentence score is calculated as a linear combination of the thematic term score and a sentence position score. To generate the summary, the 10 top-ranked sentences, i.e. the candidates for the most informative sentences, are selected. With the prosody-based tokenization approach, the system showed comparable results (recall: 0.62, precision: 0.79, F-measure: 0.68). A subjective test on a Likert scale was also carried out.
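The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each sentence is treated as its own document for TF-IDF, that the position score decays linearly from the first sentence to the last, and that the two scores are mixed with a weight `alpha`. The function names, the normalization, and the default `alpha=0.7` are all hypothetical, and the Latent Semantic Indexing variant is not shown.

```python
import math
from collections import Counter


def tfidf_sentence_scores(sentences):
    """Sum of TF-IDF weights per tokenized sentence.

    Assumption: every sentence counts as one 'document' for the IDF term.
    """
    n = len(sentences)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for sent in sentences for term in set(sent))
    scores = []
    for sent in sentences:
        if not sent:
            scores.append(0.0)
            continue
        tf = Counter(sent)
        scores.append(sum(
            (count / len(sent)) * math.log(n / df[term])
            for term, count in tf.items()
        ))
    return scores


def position_scores(n):
    """Hypothetical position score: 1.0 for the first sentence, 1/n for the last."""
    return [(n - i) / n for i in range(n)]


def normalize(values):
    """Scale scores to [0, 1] so both components are comparable."""
    top = max(values, default=0.0)
    return [v / top if top > 0 else 0.0 for v in values]


def summarize(sentences, k=10, alpha=0.7):
    """Return indices of the k best sentences, in original order.

    alpha is an assumed mixing weight for the linear combination.
    """
    thematic = normalize(tfidf_sentence_scores(sentences))
    position = position_scores(len(sentences))
    combined = [alpha * t + (1 - alpha) * p for t, p in zip(thematic, position)]
    ranked = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:k])
```

Calling `summarize(tokenized_sentences, k=10)` returns the indices of the selected sentences in their original order, so the summary keeps the order in which the sentences were spoken. Setting `alpha=0.0` ranks by position alone, which makes the contribution of each component easy to inspect.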
Acknowledgment
The authors gratefully acknowledge the support of the Hungarian National Innovation Office (NKFI) under contract IDs PD-112598 and PD-108762.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Beke, A., Szaszák, G. (2016). Automatic Summarization of Highly Spontaneous Speech. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_16
DOI: https://doi.org/10.1007/978-3-319-43958-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43957-0
Online ISBN: 978-3-319-43958-7
eBook Packages: Computer Science, Computer Science (R0)