Measuring Content Complexity of Technical Texts: Machine Learning Experiments

Kurdi, M. Zakaria

doi:10.1007/978-3-030-23207-8_28

Conference paper
First Online: 21 June 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11626)

Abstract

Classifying texts by their content complexity is important for applications like adaptive foreign language reading recommender systems and information retrieval. The goal of this paper is to propose a computational model of technical texts’ content complexity based on three criteria: knowledge depth, required knowledge, and content focus. To implement this model, 28 features of content and lexical complexity were extracted from 1702 texts of three types: general blogs, science journalistic texts and research papers. The machine learning experiments showed that content features alone can provide high classification accuracy.
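As a rough illustration of what extracting lexical complexity features from a text can look like, the sketch below computes three simple measures with the Python standard library. The abstract does not list the paper's 28 features, so the choice here is an assumption: Guiraud's index (word types divided by the square root of the token count), mean word length, and the share of long words. The function names, the tokenizer, and the 8-letter threshold are hypothetical and are not taken from the paper.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_features(text, long_word_min_len=8):
    """Return a small dictionary of illustrative lexical features.

    These are stand-ins chosen for this sketch, not the paper's feature set.
    """
    tokens = tokenize(text)
    n_tokens = len(tokens)
    if n_tokens == 0:
        # Avoid division by zero on empty input.
        return {"guiraud": 0.0, "mean_word_len": 0.0, "long_word_ratio": 0.0}

    n_types = len(set(tokens))
    n_long = sum(1 for t in tokens if len(t) >= long_word_min_len)
    return {
        # Guiraud's index: types / sqrt(tokens), a length-corrected
        # measure of lexical diversity.
        "guiraud": n_types / math.sqrt(n_tokens),
        "mean_word_len": sum(len(t) for t in tokens) / n_tokens,
        "long_word_ratio": n_long / n_tokens,
    }


if __name__ == "__main__":
    print(lexical_features("the cat sat on the mat"))
```

Feature dictionaries of this kind, one per text, could then be passed to any standard classifier; which learners and features actually perform best is what the paper's experiments address.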

Notes

1. https://www.kaggle.com/rtatman/blog-authorship-corpus
2. CORE (COnnecting REpositories) is an aggregation of papers from open access journals: https://www.jisc.ac.uk/core
3. Based on the shortest path that connects the senses and the maximum depth of the hierarchy in which the senses occur.
4. http://websites.psychology.uwa.edu.au/school/MRCDatabase/uwa_mrc.htm
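
Note 3 does not name the similarity measure it describes. Its wording matches the Leacock-Chodorow measure as commonly defined, so the sketch below assumes that measure: the negative logarithm of the shortest path length between two senses divided by twice the maximum depth of the hierarchy. The function name is hypothetical, and counting the path in nodes (so that identical senses have length 1) is a convention assumed here, not something the note states.

```python
import math


def lch_similarity(path_length, max_depth):
    """Leacock-Chodorow style similarity: -log(p / (2 * d)).

    path_length: shortest path between the two senses, counted in nodes
                 (assumed convention; identical senses give 1).
    max_depth:   maximum depth of the hierarchy containing the senses.
    """
    if path_length < 1 or max_depth < 1:
        raise ValueError("path_length and max_depth must be >= 1")
    return -math.log(path_length / (2.0 * max_depth))


if __name__ == "__main__":
    # Shorter paths in the same hierarchy yield higher similarity.
    print(lch_similarity(1, 20), lch_similarity(5, 20))
```

The score grows as the path shortens and is bounded above by log(2d) for a hierarchy of depth d, which makes values comparable only within the same hierarchy.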

Author information

Authors and Affiliations

University of Lynchburg, Lynchburg, USA
M. Zakaria Kurdi

Corresponding author

Correspondence to M. Zakaria Kurdi.

Editor information

Editors and Affiliations

University of Sao Paulo, Sao Paulo, Brazil
Seiji Isotani
University of Malaga, Málaga, Spain
Eva Millán
Carnegie Mellon University, Pittsburgh, PA, USA
Amy Ogan
DePaul University, Chicago, IL, USA
Peter Hastings
Carnegie Mellon University, Pittsburgh, PA, USA
Bruce McLaren
University College London, London, UK
Rose Luckin

About this paper

Cite this paper

Kurdi, M.Z. (2019). Measuring Content Complexity of Technical Texts: Machine Learning Experiments. In: Isotani, S., Millán, E., Ogan, A., Hastings, P., McLaren, B., Luckin, R. (eds) Artificial Intelligence in Education. AIED 2019. Lecture Notes in Computer Science, vol 11626. Springer, Cham. https://doi.org/10.1007/978-3-030-23207-8_28

DOI: https://doi.org/10.1007/978-3-030-23207-8_28
Published: 21 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23206-1
Online ISBN: 978-3-030-23207-8
eBook Packages: Computer Science, Computer Science (R0)
