Abstract
Readability refers to the degree to which a given text can be understood by its readers. Texts with high readability lead to better comprehension and learning retention. However, developing prediction models that can automatically and accurately assess the readability of a given text has been a long-standing challenge. When building readability prediction models for the Chinese language, word-segmentation ambiguity is a knotty problem that inevitably arises during text pre-processing. In view of this, we present in this paper a novel readability prediction approach for the Chinese language, building on the recently proposed Bidirectional Encoder Representations from Transformers (BERT) model, which can capture both syntactic and semantic information of a text directly from its character-level representation. Because the BERT-based readability prediction model takes consecutive character-level representations as its input, it can effectively assess the readability of a given text without the need for error-prone word segmentation. We empirically evaluate the performance of our BERT-based readability prediction model on a benchmark task by comparing it with a strong baseline that pairs a well-known classification model (fastText) with word-level representations. The results demonstrate that the BERT-based model with character-level representations performs on par with the fastText-based model with word-level representations, yielding an average accuracy of 78.45%. This finding also offers the promise of conducting readability assessment of Chinese texts directly from character-level representations.
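The key practical point of the abstract is that a character-level pipeline needs no word segmenter. The following is a minimal sketch of such an input pipeline, not the authors' implementation: the `to_char_ids` function, the toy vocabulary, and all ids are hypothetical (a real system would use BERT's own Chinese tokenizer and vocabulary).

```python
# Sketch of a character-level input pipeline (hypothetical names/ids;
# a real model would use BERT's own tokenizer for Chinese).

def to_char_ids(text, vocab, max_len=8, pad_id=0, unk_id=1):
    """Split a Chinese sentence into characters and map each to an id.

    Because Chinese characters are the unit of input, no word segmenter
    is needed, so the segmentation-ambiguity problem described in the
    abstract never arises.
    """
    ids = [vocab.get(ch, unk_id) for ch in text]
    ids = ids[:max_len]                      # truncate long inputs
    ids += [pad_id] * (max_len - len(ids))   # pad short inputs to max_len
    return ids

# Toy vocabulary (ids are illustrative only).
vocab = {"我": 2, "愛": 3, "讀": 4, "書": 5}
print(to_char_ids("我愛讀書", vocab))  # [2, 3, 4, 5, 0, 0, 0, 0]
```

A word-level pipeline would first have to decide, for example, whether a four-character string is one word or two, and any wrong decision propagates into the features; iterating over characters sidesteps that choice entirely.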
© 2019 Springer Nature Switzerland AG
Cite this paper
Tseng, HC., Chen, HC., Chang, KE., Sung, YT., Chen, B. (2019). An Innovative BERT-Based Readability Model. In: Rønningsbakk, L., Wu, TT., Sandnes, F., Huang, YM. (eds) Innovative Technologies and Learning. ICITL 2019. Lecture Notes in Computer Science(), vol 11937. Springer, Cham. https://doi.org/10.1007/978-3-030-35343-8_32
Print ISBN: 978-3-030-35342-1
Online ISBN: 978-3-030-35343-8