Calculating the Upper Bounds for Portuguese Automatic Text Summarization Using Genetic Algorithm

Rojas-Simón, Jonathan; Ledeneva, Yulia; García-Hernández, René Arnulfo

doi:10.1007/978-3-030-03928-8_36

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11238)

Included in the following conference series:

Ibero-American Conference on Artificial Intelligence

Abstract

In recent years, Automatic Text Summarization (ATS) has been considered one of the main tasks in Natural Language Processing (NLP), generating summaries in several languages (e.g., English, Portuguese, and Spanish). Some of the most significant advances in ATS have been made for Portuguese, as reflected in the variety of proposed state-of-the-art methods. It is essential to know how these methods perform with respect to the upper bounds (Topline), the lower bounds (Baseline-random), and other heuristics (Baseline-first). Recent works calculated the significance and upper bounds for Single-Document Summarization (SDS) and Multi-Document Summarization (MDS) using corpora from the Document Understanding Conferences (DUC). In this paper, we calculate the upper bounds for SDS in Portuguese using Genetic Algorithms (GA). Moreover, we compare several state-of-the-art methods against the upper bounds, lower bounds, and heuristics to determine their level of significance.
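To make the Topline, Baseline-first, and Baseline-random notions concrete, the following is a minimal sketch of how a genetic algorithm could search for the best extractive summary of one document. It is not the authors' implementation: the abstract does not specify the GA operators, parameters, or evaluation settings, so the binary sentence mask, tournament selection, one-point crossover, bit-flip mutation, the ROUGE-1-style unigram recall used as fitness, the function names, and the toy sentences are all assumptions made for illustration.

```python
import random
from collections import Counter


def unigram_recall(candidate_words, reference_words):
    """ROUGE-1-style recall: clipped unigram overlap / reference length."""
    ref = Counter(reference_words)
    cand = Counter(candidate_words)
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / max(1, sum(ref.values()))


def fitness(mask, sentences, reference, max_words):
    """Recall of the selected sentences; 0 if empty or over the word budget."""
    words = [w for bit, sent in zip(mask, sentences) if bit for w in sent]
    if not words or len(words) > max_words:
        return 0.0
    return unigram_recall(words, reference)


def baseline_first(sentences, max_words):
    """Baseline-first heuristic: take leading sentences while they fit."""
    mask, used = [0] * len(sentences), 0
    for i, sent in enumerate(sentences):
        if used + len(sent) > max_words:
            break
        mask[i], used = 1, used + len(sent)
    return mask


def baseline_random(sentences, max_words, rng):
    """Baseline-random: visit sentences in random order, keep those that fit."""
    mask, used = [0] * len(sentences), 0
    for i in rng.sample(range(len(sentences)), len(sentences)):
        if used + len(sentences[i]) <= max_words:
            mask[i], used = 1, used + len(sentences[i])
    return mask


def ga_topline(sentences, reference, max_words, pop_size=30, generations=60,
               mutation_rate=0.1, seed=0):
    """Search for the sentence subset with the highest recall (assumed setup)."""
    rng = random.Random(seed)
    n = len(sentences)

    def score(mask):
        return fitness(mask, sentences, reference, max_words)

    # Include Baseline-first so the result is never worse than that heuristic.
    population = [baseline_first(sentences, max_words)]
    population += [baseline_random(sentences, max_words, rng)
                   for _ in range(pop_size - 1)]
    best = max(population, key=score)
    for _ in range(generations):
        offspring = [best[:]]  # elitism: carry the best individual forward
        while len(offspring) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(population, 2), key=score)
            p2 = max(rng.sample(population, 2), key=score)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            offspring.append(child)
        population = offspring
        best = max(population, key=score)
    return best, score(best)


# Toy document and gold summary (hypothetical data, not from TeMario).
sentences = [
    "o governo anunciou novas medidas".split(),
    "a reunião durou três horas".split(),
    "as medidas reduzem impostos".split(),
    "o tempo estava bom".split(),
]
reference = "governo anunciou medidas que reduzem impostos".split()
max_words = 9

first_mask = baseline_first(sentences, max_words)
first_score = fitness(first_mask, sentences, reference, max_words)
top_mask, top_score = ga_topline(sentences, reference, max_words)
print("Baseline-first:", first_mask, first_score)
print("GA topline:    ", top_mask, top_score)
```

Because the search is heuristic, the score it returns is only an estimate of the upper bound; in this sketch it is guaranteed to be at least the Baseline-first score because that mask is placed in the initial population and the best individual is always carried forward.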

Notes

1. DUC website: https://www-nlpir.nist.gov/projects/duc/, TAC website: https://tac.nist.gov/.
2. http://www.nilc.icmc.usp.br/nilc/index.php.
3. https://www.linguateca.pt/Repositorio/TeMario/.
4. Each segmentation can be downloaded from https://gitlab.com/JohnRojas/Corpus-TeMario.
5. http://conteudo.icmc.usp.br/pessoas/taspardo/SENTER_Por.zip.
6. http://www.shvoong.com/summarizer/ (URL viewed May 7th, 2017).
7. https://github.com/neopunisher/Open-Text-Summarizer/ (URL viewed February 10th, 2018).

Acknowledgements

This work was partially supported by the Mexican Government through the CONACyT Thematic Network program (Language Technologies Thematic Network project 295022). We also thank UAEMex for its support.

Author information

Authors and Affiliations

Autonomous University of the State of Mexico, Instituto Literario no. 100, 50000, Toluca, State of Mexico, Mexico
Jonathan Rojas-Simón, Yulia Ledeneva & René Arnulfo García-Hernández

Corresponding authors

Correspondence to Jonathan Rojas-Simón, Yulia Ledeneva or René Arnulfo García-Hernández.

Editor information

Editors and Affiliations

Universidad Nacional del Sur, Bahía Blanca, Buenos Aires, Argentina
Guillermo R. Simari
University of Madeira, Funchal, Portugal
Eduardo Fermé
Universidad Nacional de Piura, Castilla-Piura, Peru
Flabio Gutiérrez Segura
Universidad Nacional de Trujillo, Trujillo, Peru
José Antonio Rodríguez Melquiades

About this paper

Cite this paper

Rojas-Simón, J., Ledeneva, Y., García-Hernández, R.A. (2018). Calculating the Upper Bounds for Portuguese Automatic Text Summarization Using Genetic Algorithm. In: Simari, G., Fermé, E., Gutiérrez Segura, F., Rodríguez Melquiades, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2018. IBERAMIA 2018. Lecture Notes in Computer Science, vol 11238. Springer, Cham. https://doi.org/10.1007/978-3-030-03928-8_36

DOI: https://doi.org/10.1007/978-3-030-03928-8_36
Published: 09 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03927-1
Online ISBN: 978-3-030-03928-8
eBook Packages: Computer Science, Computer Science (R0)