Simple or Not Simple? A Readability Question

Language Production, Cognition, and the Lexicon

Part of the book series: Text, Speech and Language Technology (TLTB, volume 48)

Abstract

Text Simplification (TS) has taken off as an important Natural Language Processing (NLP) application that promises significant societal impact: it can benefit users with limited language comprehension skills, such as children, foreigners who do not have a good command of a language, and readers struggling with a language disability. With the recent emergence of various TS systems, the question is how to evaluate their performance automatically, given that access to target users might be difficult. This chapter addresses one aspect of this issue by exploring whether existing readability formulae could be applied to assess the level of simplification offered by a TS system. It focuses on three readability indices for Spanish. The indices are first adapted so that they can be computed automatically and then applied to two corpora of original and manually simplified texts. The first corpus was compiled as part of the Simplext project, which targets people with Down syndrome, and the second as part of the FIRST project, whose users are people with autism spectrum disorder. The experiments show a significant correlation between each of the readability indices and eighteen linguistically motivated features that might be seen as reading obstacles for various target populations, indicating that those indices could be used as a measure of the degree of simplification achieved by TS systems. Various ways in which they can be used in TS are further illustrated by comparing their values when applied to four different corpora.

Notes

  1. Aphasia is a language disorder usually caused by a stroke or a head injury. The impairments in language processing experienced by people with aphasia are quite diverse, but many aphasic people are very likely to encounter problems in understanding written text at some point (Carroll et al. 1998).

  2. Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterised by qualitative impairment in communication and stereotyped repetitive behaviour (American Psychiatric Association 2013). People with ASD have deficits in the comprehension of speech and writing (Štajner et al. 2012).

  3. http://www.plainlanguage.gov/.

  4. http://inclusion-europe.org/.

  5. http://www.weeklyreader.com/.

  6. http://literacynet.org/cnnsf/.

  7. Available at: http://www.first-asd.eu/?q=system/files/FIRST_D7.2_20130228_annex.pdf.

  8. http://www.first-asd.eu/.

  9. www.servimedia.es.

  10. www.simplext.es.

  11. www.noticiasfacil.es.

  12. http://corpus.rae.es/lfrecuencias.html.

  13. In this study, both lists (from the Reference Corpus of Contemporary Spanish (CREA) and Spaulding's list of the 1500 most common Spanish words) were lemmatised using Connexor's parser in order to retrieve the frequency of the lemma rather than of a word form (a step carried out manually in the two cited works), and to enable a fully automatic computation of both indices.

  14. www.connexor.eu.

  15. http://openthes-es.berlios.de.

References

  • Aluísio, S. M., Specia, L., Pardo, T. A. S., Maziero, E. G., Caseli, H. M., & Fortes, R. P. M. (2008). A corpus analysis of simple account texts and the proposal of simplification strategies: first steps towards text simplification systems. In Proceedings of the 26th annual ACM international conference on Design of communication, SIGDOC ’08, (pp. 15–22). New York, NY, USA: ACM.

  • Anula, A. (2007). Tipos de textos, complejidad lingüística y facilitación lectora. In Actas del Sexto Congreso de Hispanistas de Asia, (pp. 45–61).

  • Aranzabe, M. J., Díaz De Ilarraza, A., & González, I. (2012). First approach to automatic text simplification in Basque. In Proceedings of the first Natural Language Processing for Improving Textual Accessibility Workshop (NLP4ITA).

  • American Psychiatric Association. (2013). Diagnostic and Statistical Manual of Mental Disorders (5th ed.). Arlington, VA: American Psychiatric Publishing.

  • Balota, D., Cortese, M. J., Sergent-Marshall, S. D., Spieler, D. H., & Yap, M. J. (2004). Visual word recognition of single-syllable words. Journal of Experimental Psychology: General, 133, 283–316.

  • Barlacchi, G., & Tonelli, S. (2013). ERNESTA: A sentence simplification tool for children's stories in Italian. In Computational Linguistics and Intelligent Text Processing.

  • Barzilay, R., & Elhadad, N. (2003). Sentence alignment for monolingual comparable corpora. In Proceedings of the 2003 conference on Empirical methods in natural language processing, EMNLP ’03 (pp. 25–32). Stroudsburg, PA, USA: Association for Computational Linguistics.

  • Brouwer, R. H. M. (1963). Onderzoek naar de leesmoeilijkheden van nederlands proza. Pedagogische studiën, 40, 454–464.

  • Carroll, J., Minnen, G., Canning, Y., Devlin, S., & Tait, J. (1998). Practical simplification of English newspaper text to assist aphasic readers. In Proceedings of AAAI-98 Workshop on Integrating Artificial Intelligence and Assistive Technology (pp. 7–10).

  • Chomsky, N. (1986). Knowledge of language: Its nature, origin, and use. Santa Barbara, CA: Greenwood Publishing Group.

  • Coleman, M., & Liau, T. L. (1975). A computer readability formula designed for machine scoring. Journal of Applied Psychology, 60(2), 283–284.

  • Coster, W., & Kauchak, D. (2011). Learning to simplify sentences using Wikipedia. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (pp. 1–9).

  • Cuetos, F., Domínguez, A., & de Vega, M. (1997). El efecto de la polisemia: ahora lo ves otra vez. Cognitiva, 9(2), 175–194.

  • Dale, E., & Chall, J. S. (1948). A formula for predicting readability. Educational research bulletin, 27, 11–20.

  • Devlin, S. (1999). Simplifying natural language text for aphasic readers. Ph.D. thesis, University of Sunderland, UK.

  • Douma, W. H. (1960). De leesbaarheid van landbouwbladen: een onderzoek naar en een toepassing van leesbaarheidsformules. Landbouwhogeschool Wageningen, Afdeling Sociologie en Sociografie, Bulletin nr. 17.

  • Drndarević, B., Štajner, S., Bott, S., Bautista, S., & Saggion, H. (2013). Automatic text simplification in Spanish: A comparative evaluation of complementing components. In Proceedings of the 12th International Conference on Intelligent Text Processing and Computational Linguistics. Lecture Notes in Computer Science. Samos, Greece, 24–30 March, (pp. 488–500).

  • DuBay, W. H. (2004). The principles of readability. California: Impact Information.

  • Feng, L. (2009). Automatic readability assessment for people with intellectual disabilities. In SIGACCESS Accessibility and Computers. number 93, (pp. 84–91). New York, NY, USA: ACM.

  • Feng, L., Elhadad, N., & Huenerfauth, M. (2009). Cognitively motivated features for readability assessment. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’09, (pp. 229–237), Stroudsburg, PA, USA: Association for Computational Linguistics.

  • Flesch, R. (1948). A new readability yardstick. The journal of applied psychology, 32(3), 221–233.

  • Freyhoff, G., Hess, G., Kerr, L., Tronbacke, B., & Van Der Veken, K. (1998). Make it simple, European guidelines for the production of easy-to-read information for people with learning disability. Brussels: ILSMH European Association.

  • Glanzer, M., & Bowles, N. (1976). Analysis of the word frequency effect in recognition memory. Journal of Experimental Psychology: Human Learning and Memory, 2, 21–31.

  • Glavaš, G., & Štajner, S. (2013). Event-centered simplification of news stories. In Proceedings of the Student Workshop held in conjunction with RANLP 2013, Hissar, Bulgaria (pp. 71–78).

  • Gunning, R. (1952). The technique of clear writing. New York: McGraw-Hill.

  • Inui, K., Fujita, A., Takahashi, T., Iida, R., & Iwakura, T. (2003). Text simplification for reading assistance: a project note. In Proceedings of the second international workshop on Paraphrasing—Volume 16, PARAPHRASE ’03, (pp. 9–16), Stroudsburg, PA, USA: Association for Computational Linguistics.

  • Jastrzembski, J. (1981). Multiple meaning, number of related meanings, frequency of occurrence and the lexicon. Cognitive Psychology, 13, 278–305.

  • Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas for navy enlisted personnel. Research Branch Report 8–75.

  • Martos, J., Freire, S., González, A., Gil, D., & Sebastian, M. (2012). D2.1: Functional requirements specifications and user preference survey. Technical report, FIRST technical report.

  • McLaughlin, G. H. (1969). SMOG grading—a new readability formula. Journal of Reading, 22, 639–646.

  • Norbury, C. F. (2005). Barking up the wrong tree? lexical ambiguity resolution in children with language impairments and autistic spectrum disorders. Journal of Experimental Child Psychology, 90, 142–171.

  • Orasan, C., Evans, R., & Dornescu, I. (2013). Towards multilingual Europe 2020: A Romanian perspective, chapter text simplification for people with autistic spectrum disorders (pp. 287–312). Bucharest: Romanian Academy Publishing House.

  • Petersen, S., & Ostendorf, M. (2009). A machine learning approach to reading level assessment. Computer Speech and Language, 23(1), 89–106.

  • Petersen, S. E., & Ostendorf, M. (2007). Text simplification for language learners: A corpus analysis. In Proceedings of Workshop on Speech and Language Technology for Education (SLaTE), 69–72.

  • PlainLanguage. (2011). Federal plain language guidelines.

  • Rello, L. (2012). Dyswebxia: a model to improve accessibility of the textual web for dyslexic users. In SIGACCESS Accessibility and Computers, number 102, (pp. 41–44) New York, NY, USA: ACM.

  • Rello, L., Baeza-Yates, R., Bott, S., & Saggion, H. (2013b). Simplify or help? Text simplification strategies for people with dyslexia. In Proceedings of W4A conference, Article no. 15

  • Rello, L., Baeza-Yates, R., Dempere, L., & Saggion, H. (2013a). Frequent words improve readability and short words improve understandability for people with dyslexia. In Proceedings of the INTERACT 2013: 14th IFIP TC13 Conference on Human-Computer Interaction. Cape Town, South Africa, pp. 203–219.

  • Ruiter, M. B., Rietveld, T. C. M., Cucchiarini, C., Krahmer, E. J., & Strik, H. (2010). Human language technology and communicative disabilities: requirements and possibilities for the future. In Proceedings of the seventh international conference on Language Resources and Evaluation (LREC).

  • Rybing, J., Smith, C., & Silvervarg, A. (2010). Towards a rule based system for automatic simplification of texts. In The Third Swedish Language Technology Conference.

  • Saggion, H., Gómez Martínez, E., Etayo, E., Anula, A., & Bourg, L. (2011). Text simplification in Simplext: Making text more accessible. Revista de la Sociedad Española para el Procesamiento del Lenguaje Natural.

  • Schwarm, S. E., & Ostendorf, M. (2005). Reading level assessment using support vector machines and statistical language models. In Proceedings of the 43rd annual meeting of the Association of Computational Linguistics (ACL), pp. 523–530.

  • Siddharthan, A. (2006). Syntactic simplification and text cohesion. Research on Language and Computation, 4(1), 77–109.

  • Smith, E. A., & Senter, R. J. (1967). Automated readability index. Technical report, Aerospace Medical Research Laboratories, Wright-Patterson Air Force Base, Ohio.

  • Spaulding, S. (1956). A Spanish readability formula. Modern Language Journal, 40, 433–441.

  • UN. (2006). Convention on the rights of persons with disabilities.

  • van Oosten, P., Tanghe, D., & Hoste, V. (2010). Towards an improved methodology for automated readability prediction. In Proceedings of the seventh international conference on language resources and evaluation (LREC10). Valletta, Malta: European Language Resources Association (ELRA), pp. 775–782.

  • Vossen, P. (Ed.). (1998). EuroWordNet: A multilingual database with lexical semantic networks. Dordrecht: Kluwer Academic Publishers.

  • Štajner, S., Drndarević, B., & Saggion, H. (2013). Corpus-based sentence deletion and split decisions for Spanish text simplification. Computación y Sistemas, 17(2), 251–262.

  • Štajner, S., Evans, R., Orasan, C., & Mitkov, R. (2012). What can readability measures really tell us about text complexity? In Proceedings of the LREC’12 Workshop: Natural Language Processing for Improving Textual Accessibility (NLP4ITA), Istanbul, Turkey.

  • Štajner, S., & Saggion, H. (2013b). Adapting text simplification decisions to different text genres and target users. Procesamiento del Lenguaje Natural, 51, 135–142.

  • Štajner, S., & Saggion, H. (2013a). Readability indices for automatic evaluation of text simplification systems: A feasibility study for Spanish. In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP 2013), Nagoya, Japan, October 14–18, 2013. pp. 374–382.

  • Woodsend, K., & Lapata, M. (2011). Learning to simplify sentences with Quasi-synchronous grammar and integer programming. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).

  • Woodsend, K., & Lapata, M. (2011). WikiSimple: automatic simplification of Wikipedia articles. In Proceedings of the 25th AAAI Conference on Artificial Intelligence, pp. 374–382.

  • Wubben, S., van den Bosch, A., & Krahmer, E. (2012). Sentence simplification by monolingual machine translation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers—Volume 1, ACL ’12, (pp. 1015–1024) Stroudsburg, PA, USA: Association for Computational Linguistics.

  • Zhu, Z., Bernhard, D., & Gurevych, I. (2010). A monolingual tree-based translation model for sentence simplification. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), (pp. 1353–1361).

Acknowledgements

This work has been partially supported by TRADICOR (Ref: PIE 13-054), EXPERT (Ref: 317471-FP7-PEOPLE-2012-ITN), LATEST (Ref: 327197-FP7-PEOPLE-2012-IEF) and FIRST (Ref: 287607-FP7-ICT-2011-7). The authors would also like to express their gratitude to Horacio Saggion for his very helpful comments and input.

Corresponding author

Correspondence to Sanja Štajner.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Štajner, S., Mitkov, R., Corpas Pastor, G. (2015). Simple or Not Simple? A Readability Question. In: Gala, N., Rapp, R., Bel-Enguix, G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-08043-7_22

  • DOI: https://doi.org/10.1007/978-3-319-08043-7_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08042-0

  • Online ISBN: 978-3-319-08043-7

  • eBook Packages: Computer Science (R0)
