
An Innovative BERT-Based Readability Model

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11937)

Abstract

Readability refers to the degree of difficulty with which a given text (article) can be understood by readers. When readers read a highly readable text, they achieve better comprehension and learning retention. However, developing effective prediction models that can automatically and accurately assess the readability of a given text has been a long-standing challenge. When building readability prediction models for the Chinese language, word segmentation ambiguity is a knotty problem that inevitably arises during text pre-processing. In view of this, we present in this paper a novel readability prediction approach for the Chinese language, building on the recently proposed Bidirectional Encoder Representations from Transformers (BERT) model, which can capture both syntactic and semantic information of a text directly from its character-level representation. With a BERT-based readability prediction model that takes consecutive character-level representations as its input, we can effectively assess the readability of a given text without the need to perform error-prone word segmentation. We empirically evaluate the performance of our BERT-based readability prediction model on a benchmark task, comparing it with a strong baseline that utilizes a celebrated classification model (fastText) in conjunction with word-level representations. The results demonstrate that the BERT-based model with character-level representations performs on par with the fastText-based model with word-level representations, yielding an average accuracy of 78.45%. This finding also offers the promise of assessing the readability of Chinese texts directly from character-level representations.
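The key pre-processing idea in the abstract is that each Chinese character is mapped directly to an input ID, sidestepping word segmentation entirely. The following is a minimal sketch of that character-level tokenization step, assuming a hypothetical toy vocabulary and ID layout; the paper's actual model feeds such character sequences into a fine-tuned BERT encoder rather than the stub shown here.

```python
def char_tokenize(text, vocab, unk_id=1, cls_id=2, sep_id=3):
    """Map each character directly to an ID -- no word segmentation needed.

    Unknown characters fall back to unk_id; the sequence is wrapped in
    [CLS] ... [SEP] markers, mirroring BERT's input convention.
    """
    ids = [vocab.get(ch, unk_id) for ch in text if not ch.isspace()]
    return [cls_id] + ids + [sep_id]

# Toy vocabulary built from a sample sentence (IDs 0-3 reserved for
# special tokens); a real vocabulary would cover the full character set.
sample = "今天天氣很好"
vocab = {ch: i + 4 for i, ch in enumerate(dict.fromkeys(sample))}

token_ids = char_tokenize(sample, vocab)
print(token_ids)  # [2, 4, 5, 5, 6, 7, 8, 3]
```

Because ambiguity in Chinese word segmentation cannot produce errors here (there is no segmentation step), the character-to-ID mapping is deterministic, which is precisely the robustness the abstract claims for the character-level BERT input.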




Author information

Correspondence to Yao-Ting Sung.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Tseng, H.C., Chen, H.C., Chang, K.E., Sung, Y.T., Chen, B. (2019). An Innovative BERT-Based Readability Model. In: Rønningsbakk, L., Wu, T.T., Sandnes, F., Huang, Y.M. (eds) Innovative Technologies and Learning. ICITL 2019. Lecture Notes in Computer Science, vol. 11937. Springer, Cham. https://doi.org/10.1007/978-3-030-35343-8_32

  • DOI: https://doi.org/10.1007/978-3-030-35343-8_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-35342-1

  • Online ISBN: 978-3-030-35343-8

  • eBook Packages: Computer Science (R0)
