Abstract
Extracting knowledge from an ever-growing information flow is one of the main challenges of the modern information society. This paper considers ways and means of intellectualizing this process for an important information source: academic texts. Here the user must find fragments relevant to a subject of interest within large documents that are often written in a foreign language. We experimentally investigated the comparative effectiveness of topic segmentation (TS) algorithms on extended coherent academic texts and substantiated a procedure for instrumental evaluation of their effectiveness. We considered the influence of the most significant text characteristics: original language, structural organization (heading levels), and research subject (engineering, information technology, and medicine). We have shown that intellectualizing knowledge acquisition from academic texts requires presenting the reader with the results of TS performed by several different algorithms together. We propose a system for integrated visualization of TS results and have developed a corresponding software solution. For extended coherent texts, the visualization system explicitly shows the semantic structure of the text, so the user can locate and analyze only the fragments that match their current information needs, rather than the whole text, and still gain a complete picture of the subject of interest.
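The abstract does not state which measure its instrumental evaluation procedure relies on. As a hedged illustration only, and not the authors' procedure, the sketch below computes Pk, the segmentation error metric introduced by Beeferman, Berger and Lafferty (1999) and widely used to compare an algorithm's TS (topic segmentation) output against a reference segmentation. The input format (lists of segment lengths in sentences), the helper name `to_labels`, and the default window size are assumptions made for this example.

```python
from __future__ import annotations

from typing import Sequence


def to_labels(masses: Sequence[int]) -> list[int]:
    """Expand segment lengths, e.g. [2, 3], into per-unit segment ids [0, 0, 1, 1, 1]."""
    labels: list[int] = []
    for seg_id, length in enumerate(masses):
        labels.extend([seg_id] * length)
    return labels


def pk(reference: Sequence[int], hypothesis: Sequence[int], k: int | None = None) -> float:
    """Pk error: share of windows where reference and hypothesis disagree.

    Lower is better; 0.0 means the two segmentations agree at every window.
    """
    ref = to_labels(reference)
    hyp = to_labels(hypothesis)
    if len(ref) != len(hyp):
        raise ValueError("segmentations must cover the same number of units")
    n = len(ref)
    if k is None:
        # Assumed convention: half the mean reference segment length, at least 1.
        k = max(1, round(n / len(reference) / 2))
    if k >= n:
        raise ValueError("window size k must be smaller than the text length")
    errors = 0
    for i in range(n - k):
        # Are units i and i + k in the same segment in each segmentation?
        same_ref = ref[i] == ref[i + k]
        same_hyp = hyp[i] == hyp[i + k]
        errors += same_ref != same_hyp
    return errors / (n - k)


if __name__ == "__main__":
    # Six sentences; the reference puts a boundary after sentence 3,
    # the hypothetical algorithm puts it after sentence 2.
    print(pk([3, 3], [2, 4]))
```

With the default window (k = 2 here), the example prints 0.5: two of the four windows straddle the misplaced boundary differently in the two segmentations. Scoring the output of several TS algorithms against the same reference in this way is one possible basis for the kind of comparison the abstract describes.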
Acknowledgement
This work was financially supported by the Government of the Russian Federation, Grant 08-08.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Vatian, A. et al. (2019). Intellectualization of Knowledge Acquisition of Academic Texts as an Answer to Challenges of Modern Information Society. In: Chugunov, A., Misnikov, Y., Roshchin, E., Trutnev, D. (eds) Electronic Governance and Open Society: Challenges in Eurasia. EGOSE 2018. Communications in Computer and Information Science, vol 947. Springer, Cham. https://doi.org/10.1007/978-3-030-13283-5_11
DOI: https://doi.org/10.1007/978-3-030-13283-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13282-8
Online ISBN: 978-3-030-13283-5
eBook Packages: Computer Science, Computer Science (R0)