Abstract
Classifying texts by their content complexity is important for applications such as adaptive foreign-language reading recommender systems and information retrieval. This paper proposes a computational model of the content complexity of technical texts based on three criteria: knowledge depth, required knowledge, and content focus. To implement this model, 28 features of content and lexical complexity were extracted from 1702 texts of three types: general blogs, science journalism texts, and research papers. Machine learning experiments showed that content features alone can provide high classification accuracy.
Notes
- 1.
- 2. CORE (COnnecting REpositories) is an aggregation of papers from open access journals https://www.jisc.ac.uk/core.
- 3. Based on the shortest path that connects the senses and the maximum depth of the hierarchy in which the senses occur.
- 4.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Kurdi, M.Z. (2019). Measuring Content Complexity of Technical Texts: Machine Learning Experiments. In: Isotani, S., Millán, E., Ogan, A., Hastings, P., McLaren, B., Luckin, R. (eds) Artificial Intelligence in Education. AIED 2019. Lecture Notes in Computer Science(), vol 11626. Springer, Cham. https://doi.org/10.1007/978-3-030-23207-8_28
DOI: https://doi.org/10.1007/978-3-030-23207-8_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23206-1
Online ISBN: 978-3-030-23207-8
eBook Packages: Computer Science (R0)