Abstract
Access to information grows every day, and we can quickly acquire knowledge from many sources, such as news websites, blogs, and social networks, so processing all of this information becomes increasingly difficult. Tools are therefore needed that automatically extract the most relevant sentences and reduce a text to a shorter version. One way to achieve this while preserving the core information content is a process called Automatic Text Summarization. A relevant issue in this context is the presence of typos, synonyms, and other orthographic variations, which some extractive techniques are not prepared to handle. This work presents an evaluation of different similarity approaches that aim to minimize these problems when selecting the most appropriate sentences to represent a document in an automatically generated summary.
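The abstract does not name the similarity approaches that were evaluated, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It contrasts exact word matching with character trigram matching (a common typo-tolerant representation, chosen here for illustration) and ranks sentences by their average cosine similarity to the rest of the document, a simplified centrality score for extractive summarization. The function names `word_features`, `char_ngram_features`, and `summarize` are hypothetical. Character n-grams help with typos and spelling variants but do not capture synonyms, which would need a lexical resource or embeddings.

```python
import math
import re
from collections import Counter

# Hypothetical illustration only; not the similarity approaches evaluated in the paper.


def word_features(sentence: str) -> Counter:
    """Bag of lowercase word tokens: exact matching, so a typo breaks the overlap."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def char_ngram_features(sentence: str, n: int = 3) -> Counter:
    """Bag of character n-grams taken inside each space-padded word, so a
    misspelled word still shares most of its n-grams with the correct form."""
    grams = Counter()
    for word in re.findall(r"\w+", sentence.lower()):
        padded = f" {word} "
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, k, features=char_ngram_features):
    """Select the k sentences with the highest average similarity to the other
    sentences (a simple centrality score) and return them in original order."""
    vectors = [features(s) for s in sentences]
    scores = []
    for i, vi in enumerate(vectors):
        total = sum(cosine(vi, vj) for j, vj in enumerate(vectors) if j != i)
        scores.append(total / max(len(vectors) - 1, 1))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    docs = [
        "Text summarization selects relevant sentences.",
        "Automatic text sumarization reduces text volume.",  # deliberate typo
        "The weather was sunny yesterday.",
    ]
    print(summarize(docs, 2))
```

With these definitions, `cosine(word_features("summarization"), word_features("sumarization"))` is 0.0, whereas the trigram version is about 0.88, because the two spellings share 11 of their 13 and 12 trigrams. Swapping the `features` argument is how one would compare representations within the same selection procedure.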
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ramos, A.M.S., Woloszyn, V., Wives, L.K. (2017). An Experimental Analysis of Feature Selection and Similarity Assessment for Textual Summarization. In: Solano, A., Ordoñez, H. (eds) Advances in Computing. CCC 2017. Communications in Computer and Information Science, vol 735. Springer, Cham. https://doi.org/10.1007/978-3-319-66562-7_11
DOI: https://doi.org/10.1007/978-3-319-66562-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66561-0
Online ISBN: 978-3-319-66562-7
eBook Packages: Computer Science, Computer Science (R0)