Automatic Summarization of Highly Spontaneous Speech

Beke, András; Szaszák, György

doi:10.1007/978-3-319-43958-7_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9811)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

This paper addresses the summarization of highly spontaneous speech. Speech is converted into text by an automatic speech recognizer (ASR) and then segmented into tokens. Human-made tokenization and automatic, prosody-based tokenization are compared. The resulting sentence-like units are analysed by a syntactic parser to support automatic sentence selection for the summary. The preprocessed sentences are ranked based on thematic terms and sentence position. The thematic term score is computed in two ways: with TF-IDF and with Latent Semantic Indexing. The sentence score is calculated as a linear combination of the thematic term score and a sentence position score. To generate the summary, the top 10 candidates for the most informative, best summarizing sentences are selected. The system performance showed comparable results (recall: 0.62, precision: 0.79 and F-measure: 0.68) with the prosody-based tokenization approach. A subjective test using a Likert scale is also carried out.
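
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the choice to treat each sentence as a document for the IDF statistic, the linearly decaying position score, the max-normalization, and the weight `alpha` are all assumptions made for illustration, and only the TF-IDF variant of the thematic score is shown.

```python
import math
from collections import Counter


def tfidf_scores(sentences):
    """Return one thematic score per sentence: the mean TF-IDF weight of its terms.

    Assumption: each sentence counts as one "document" for the IDF statistic.
    """
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: the number of sentences in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        total = sum((count / len(tokens)) * math.log(n / df[term])
                    for term, count in tf.items())
        scores.append(total / len(tf))
    return scores


def position_scores(n):
    """Hypothetical position score: decays linearly from 1.0 to 1/n."""
    return [(n - i) / n for i in range(n)]


def normalize(values):
    """Scale scores so the largest is 1.0 (all zeros if nothing is positive)."""
    hi = max(values, default=0.0)
    return [v / hi if hi > 0 else 0.0 for v in values]


def summarize(sentences, top_k=10, alpha=0.7):
    """Pick the top_k sentences by a linear combination of both scores.

    alpha weights the thematic score and (1 - alpha) the position score;
    the default value is an assumption, not a figure from the paper.
    """
    thematic = normalize(tfidf_scores(sentences))
    position = position_scores(len(sentences))
    combined = [alpha * t + (1 - alpha) * p for t, p in zip(thematic, position)]
    ranked = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(ranked[:top_k])]
```

Calling `summarize(units, top_k=10)` on a list of sentence-like units returns at most ten of them in their original order. Setting `alpha=0.0` reduces the ranking to sentence position alone, which is a convenient sanity check.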

Acknowledgment

The authors gratefully acknowledge the support of the Hungarian National Innovation Office (NKFI) under contract IDs PD-112598 and PD-108762.

Author information

Authors and Affiliations

Research Institute for Linguistics, Hungarian Academy of Sciences, Budapest, Hungary
András Beke
Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
György Szaszák

Corresponding author

Correspondence to György Szaszák.

Editor information

Editors and Affiliations

SPIIRAS, Saint-Petersburg, Russia
Andrey Ronzhin
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
Budapest University of Technology and Economics, Budapest, Hungary
Géza Németh

About this paper

Cite this paper

Beke, A., Szaszák, G. (2016). Automatic Summarization of Highly Spontaneous Speech. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_16

DOI: https://doi.org/10.1007/978-3-319-43958-7_16
Published: 13 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43957-0
Online ISBN: 978-3-319-43958-7
eBook Packages: Computer Science, Computer Science (R0)