Readability Classification of Bangla Texts

Islam, Zahurul; Rahman, Md. Rashedur; Mehler, Alexander

doi:10.1007/978-3-642-54903-8_42

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Abstract

Readability classification is an important application of Natural Language Processing. It aims to judge the quality of documents and to help writers identify possible problems. This paper presents a readability classifier for Bangla textbooks using information-theoretic and lexical features. Altogether, 18 features are explored, yielding an F-score of 86.46%. The paper extends our previous work [1].
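The abstract does not list the 18 features, so the following is only an illustration of what an information-theoretic feature and a lexical feature can look like, not the authors' feature set. It assumes naive whitespace tokenization, and the helper names `word_entropy` and `type_token_ratio` are hypothetical. The first computes the Shannon entropy H = -Σ p(w) log2 p(w) of a text's word-frequency distribution; the second computes the ratio of distinct words to total words.

```python
import math
from collections import Counter


def word_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the word-frequency distribution.

    Hypothetical helper: tokenization is plain whitespace splitting,
    an assumption rather than the paper's preprocessing.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    n = len(tokens)
    # H = -sum_w p(w) * log2 p(w), where p(w) = count(w) / n
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def type_token_ratio(text: str) -> float:
    """Hypothetical lexical feature: distinct words divided by total words."""
    tokens = text.split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    sample = "a b a c"
    print(word_entropy(sample))      # 1.5
    print(type_token_ratio(sample))  # 0.75
```

In a classifier, values like these would be computed per document and combined into a feature vector; which features, tokenizer and learner the paper actually uses is not stated in this abstract.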

References

[1] Islam, Z., Mehler, A., Rahman, R.: Text readability classification of textbooks of a low-resource language. In: Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation (2012)
[2] Mikk, J.: Text comprehensibility. In: Quantitative Linguistics: An International Handbook, pp. 909–921. Walter de Gruyter (2005)
[3] Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34(1), 1–47 (2002)
[4] Dale, E., Chall, J.S.: A formula for predicting readability. Educational Research Bulletin 27(1), 11–20, 28 (1948)
[5] Dale, E., Chall, J.S.: Readability Revisited: The New Dale-Chall Readability Formula. Brookline Books (1995)
[6] Gunning, R.: The Technique of Clear Writing, Fourth Printing Edition. McGraw-Hill (1952)
[7] Kincaid, J., Fishburne, R., Rogers, R., Chissom, B.: Derivation of new readability formulas for Navy enlisted personnel. Technical report, US Navy, Branch Report 8-75, Chief of Naval Training, Millington (1975)
[8] Senter, R., Smith, E.A.: Automated readability index. Technical report, Wright-Patterson Air Force Base (1967)
[9] McLaughlin, G.H.: SMOG grading – a new readability formula. Journal of Reading 12(8), 639–646 (1969)
[10] Hancke, J., Vajjala, S., Meurers, D.: Readability classification for German using lexical, syntactic, and morphological features. In: 24th International Conference on Computational Linguistics (COLING), Mumbai, India (2012)
[11] François, T., Fairon, C.: An AI readability formula for French as a foreign language. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 466–477. Association for Computational Linguistics (2012)
[12] Sato, S., Matsuyoshi, S., Kondoh, Y.: Automatic assessment of Japanese text readability based on a textbook corpus. In: LREC (2008)
[13] Chen, Y.T., Chen, Y.H., Cheng, Y.C.: Assessing Chinese readability using term frequency and lexical chain. Computational Linguistics and Chinese Language Processing 18(2), 1–17 (2013)
[14] Islam, M.Z., Tiedemann, J., Eisele, A.: English to Bangla phrase-based machine translation. In: The 14th Annual Conference of the European Association for Machine Translation, Saint-Raphaël, France, May 27–28 (2010)
[15] Karim, M., Kaykobad, M., Murshed, M.: Technical Challenges and Design Issues in Bangla Language Processing. IGI Global (2013)
[16] Das, S., Roychoudhury, R.: Testing level of readability in Bangla novels of Bankim Chandra Chattopodhay w.r.t. the density of polysyllabic words. Indian Journal of Linguistics 22, 41–51 (2004)
[17] Das, S., Roychoudhury, R.: Readability modeling and comparison of one and two parametric fit: a case study in Bangla. Journal of Quantitative Linguistics 13(1) (2006)
[18] Sinha, M., Sakshi, S., Dasgupta, T., Basu, A.: New readability measures for Bangla and Hindi texts. In: Proceedings of COLING, pp. 1141–1150 (2012)
[19] Fitzsimmons, P., Michael, B., Hulley, J., Scott, G.: A readability assessment of online Parkinson disease information. The Journal of the Royal College of Physicians of Edinburgh 40, 292–296 (2010)
[20] Petersen, S.E., Ostendorf, M.: A machine learning approach to reading level assessment. Computer Speech and Language 23(1), 89–106 (2009)
[21] Feng, L., Elhadad, N., Huenerfauth, M.: Cognitively motivated features for readability assessment. In: Proceedings of the 12th Conference of the European Chapter of the ACL (2009)
[22] Collins-Thompson, K., Callan, J.P.: A language modeling approach to predicting reading difficulty. In: HLT-NAACL (2004)
[23] Schwarm, S.E., Ostendorf, M.: Reading level assessment using support vector machines and statistical language models. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005) (2005)
[24] Aluisio, R., Specia, L., Gasperin, C., Scarton, C.: Readability assessment for text simplification. In: NAACL-HLT 2010: The 5th Workshop on Innovative Use of NLP for Building Educational Applications (2010)
[25] Kate, R.J., Luo, X., Patwardhan, S., Franz, M., Florian, R., Mooney, R.J., Roukos, S., Welty, C.: Learning to predict readability using diverse linguistic features. In: 23rd International Conference on Computational Linguistics, COLING 2010 (2010)
[26] Eickhoff, C., Serdyukov, P., de Vries, A.P.: A combined topical/non-topical approach to identifying web sites for children. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (2011)
[27] Pitler, E., Nenkova, A.: Revisiting readability: A unified framework for predicting text quality. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP (2008)
[28] Feng, L., Janche, M., Huenerfauth, M., Elhadad, N.: A comparison of features for automatic readability assessment. In: The 23rd International Conference on Computational Linguistics, COLING (2010)
[29] Barzilay, R., Lapata, M.: Modeling local coherence: An entity-based approach. Computational Linguistics 34(1), 1–34 (2008)
[30] Heilman, M., Collins-Thompson, K., Eskenazi, M.: Combining lexical and grammatical features to improve readability measures for first and second language text. In: Proceedings of the Human Language Technology Conference (2007)
[31] Heilman, M., Collins-Thompson, K., Eskenazi, M.: An analysis of statistical models and features for reading difficulty prediction. In: Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications, EANL (2008)
[32] Islam, Z., Mehler, A.: Automatic readability classification of crowd-sourced data based on linguistic and information-theoretic features. Computación y Sistemas 17(2), 113–123 (2013)
[33] Vajjala, S., Meurers, D.: On improving the accuracy of readability classification using insights from second language acquisition. In: Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, pp. 163–173. Association for Computational Linguistics (2012)
[34] Temnikova, I.: Text Complexity and Text Simplification in the Crisis Management Domain. PhD thesis, University of Wolverhampton (2012)
[35] Carroll, J.B.: Language and Thought. Prentice-Hall, Englewood Cliffs (1964)
[36] Herdan, G.: Quantitative Linguistics. Butterworths (1964)
[37] Köhler, R., Galle, M.: Dynamic aspects of text characteristics. Quantitative Text Analysis, 46–53 (1993)
[38] Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley Interscience, Hoboken (2006)
[39] Klir, G.J.: Uncertainty and Information. Wiley Interscience (2005)
[40] Borst, A., Theunissen, F.E.: Information theory and neural coding. Nature Neuroscience 2, 947–957 (1999)
[41] Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. MIT Press (1998)
[42] Keerthi, S., Shevade, S.K., Bhattacharyya, C., Murthy, K.R.K.: Improvements to Platt's SMO algorithm for SVM classifier design. Neural Computation 13(3), 637–649 (2001)
[43] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. ACM SIGKDD Explorations 11(1), 10–18 (2009)
[44] Üstün, B., Melssen, W., Buydens, L.: Facilitating the application of support vector regression by using a universal Pearson VII function based kernel. Chemometrics and Intelligent Laboratory Systems 81(1), 29–40 (2006)
[45] Genzel, D., Charniak, E.: Entropy rate constancy in text. In: Proceedings of the 40th Meeting of the Association for Computational Linguistics, ACL 2002 (2002)
[46] Genzel, D., Charniak, E.: Variation of entropy and parse trees of sentences as a function of the sentence number. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP (2003)

Author information

Authors and Affiliations

WG Text-Technology Computer Science, Goethe-University Frankfurt, Germany
Zahurul Islam, Md. Rashedur Rahman & Alexander Mehler

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Av. Juan de Dios Bátiz, Col. Nueva Industrial Vallejo, 07738, Mexico, D.F., Mexico
Alexander Gelbukh

About this paper

Cite this paper

Islam, Z., Rahman, M.R., Mehler, A. (2014). Readability Classification of Bangla Texts. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_42

DOI: https://doi.org/10.1007/978-3-642-54903-8_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science, Computer Science (R0)