Abstract
Readability classification is an important application of Natural Language Processing. It aims to judge the quality of documents and to help writers identify possible problems. This paper presents a readability classifier for Bangla textbooks based on information-theoretic and lexical features. Altogether, 18 features are explored, achieving an F-score of 86.46%. The paper extends our previous work [1].
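The abstract does not enumerate the 18 features. Purely as an illustration, the sketch below computes a few information-theoretic and lexical measures of the general kind it names: Shannon entropy over words and characters, type-token ratio, average word length, and average sentence length. The function names, the regex tokenizer, the sentence splitter (which treats the Bangla danda `।`, U+0964, as a sentence terminator), and the choice of features are all hypothetical assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: split on whitespace, the Bangla danda and
    # double danda (U+0964, U+0965), and common ASCII punctuation.
    return [t for t in re.split(r"[\s\u0964\u0965.,;:!?()]+", text) if t]


def split_sentences(text):
    # Assumption: a sentence ends with a danda, '.', '!' or '?'.
    return [s for s in re.split(r"[\u0964.!?]+", text) if s.strip()]


def entropy(items):
    # Shannon entropy (in bits) of the empirical distribution of items.
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(text):
    words = tokenize(text)
    sentences = split_sentences(text)
    if not words or not sentences:
        raise ValueError("text contains no words or sentences")
    # Note: this counts Unicode code points, not grapheme clusters, so
    # Bangla vowel signs are counted as separate characters.
    chars = [ch for w in words for ch in w]
    return {
        "word_entropy": entropy(words),
        "char_entropy": entropy(chars),
        "type_token_ratio": len(set(words)) / len(words),
        "avg_word_length": len(chars) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
    }


if __name__ == "__main__":
    print(extract_features("আমি ভাত খাই। তুমি জল খাও।"))
```

Each resulting dictionary could be converted into a fixed-order vector and passed to a standard classifier; which classifier and which exact features the paper uses are not stated in this abstract.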
References
Islam, Z., Mehler, A., Rahman, R.: Text readability classification of textbooks of a low-resource language. In: Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation (2012)
Mikk, J.: Text comprehensibility. In: Quantitative Linguistics: An International Handbook, pp. 909–921. Walter de Gruyter (2005)
Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34(1), 1–47 (2002)
Dale, E., Chall, J.S.: A formula for predicting readability. Educational Research Bulletin 27(1), 11–20+28 (1948)
Dale, E., Chall, J.S.: Readability Revisited: The New Dale-Chall Readability formula. Brookline Books (1995)
Gunning, R.: The Technique of Clear Writing, Fourth Printing Edition. McGraw-Hill (1952)
Kincaid, J., Fishburne, R., Rogers, R., Chissom, B.: Derivation of new readability formulas for Navy enlisted personnel. Technical report, US Navy, Branch Report 8-75, Chief of Naval Training, Millington (1975)
Senter, R., Smith, E.A.: Automated readability index. Technical report, Wright-Patterson Air Force Base (1967)
McLaughlin, G.H.: SMOG grading – a new readability formula. Journal of Reading 12(8), 639–646 (1969)
Hancke, J., Vajjala, S., Meurers, D.: Readability classification for German using lexical, syntactic, and morphological features. In: 24th International Conference on Computational Linguistics (COLING), Mumbai, India (2012)
François, T., Fairon, C.: An AI readability formula for French as a foreign language. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 466–477. Association for Computational Linguistics (2012)
Sato, S., Matsuyoshi, S., Kondoh, Y.: Automatic assessment of Japanese text readability based on a textbook corpus. In: LREC (2008)
Chen, Y.T., Chen, Y.H., Cheng, Y.C.: Assessing Chinese readability using term frequency and lexical chain. Computational Linguistics and Chinese Language Processing 18(2), 1–17 (2013)
Islam, M.Z., Tiedemann, J., Eisele, A.: English to Bangla phrase-based machine translation. In: The 14th Annual Conference of the European Association for Machine Translation, Saint-Raphaël, France, May 27–28 (2010)
Karim, M., Kaykobad, M., Murshed, M.: Technical Challenges and Design Issues in Bangla Language Processing. IGI Global (2013)
Das, S., Roychoudhury, R.: Testing level of readability in Bangla novels of Bankim Chandra Chattopodhay w.r.t the density of polysyllabic words. Indian Journal of Linguistics 22, 41–51 (2004)
Das, S., Roychoudhury, R.: Readability modeling and comparison of one and two parametric fit: a case study in Bangla. Journal of Quantitative Linguistics 13(1) (2006)
Sinha, M., Sakshi, S., Dasgupta, T., Basu, A.: New readability measures for Bangla and Hindi texts. In: Proceedings of COLING, pp. 1141–1150 (2012)
Fitzsimmons, P., Michael, B., Hulley, J., Scott, G.: A readability assessment of online Parkinson disease information. The Journal of the Royal College of Physicians of Edinburgh 40, 292–296 (2010)
Petersen, S.E., Ostendorf, M.: A machine learning approach to reading level assessment. Computer Speech and Language 23(1), 89–106 (2009)
Feng, L., Elhadad, N., Huenerfauth, M.: Cognitively motivated features for readability assessment. In: Proceedings of the 12th Conference of the European Chapter of the ACL (2009)
Collins-Thompson, K., Callan, J.P.: A language modeling approach to predicting reading difficulty. In: HLT-NAACL (2004)
Schwarm, S.E., Ostendorf, M.: Reading level assessment using support vector machines and statistical language models. In: The Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (ACL 2005) (2005)
Aluisio, R., Specia, L., Gasperin, C., Scarton, C.: Readability assessment for text simplification. In: NAACL-HLT 2010: The 5th Workshop on Innovative Use of NLP for Building Educational Applications (2010)
Kate, R.J., Luo, X., Patwardhan, S., Franz, M., Florian, R., Mooney, R.J., Roukos, S., Welty, C.: Learning to predict readability using diverse linguistic features. In: 23rd International Conference on Computational Linguistics, COLING 2010 (2010)
Eickhoff, C., Serdyukov, P., de Vries, A.P.: A combined topical/non-topical approach to identifying web sites for children. In: Proceedings of the fourth ACM International Conference on Web Search and Data Mining (2011)
Pitler, E., Nenkova, A.: Revisiting readability: A unified framework for predicting text quality. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP (2008)
Feng, L., Jansche, M., Huenerfauth, M., Elhadad, N.: A comparison of features for automatic readability assessment. In: The 23rd International Conference on Computational Linguistics, COLING (2010)
Barzilay, R., Lapata, M.: Modeling local coherence: An entity-based approach. Computational Linguistics 34(1), 1–34 (2008)
Heilman, M., Collins-Thompson, K., Eskenazi, M.: Combining lexical and grammatical features to improve readability measures for first and second language text. In: Proceedings of the Human Language Technology Conference (2007)
Heilman, M., Collins-Thompson, K., Eskenazi, M.: An analysis of statistical models and features for reading difficulty prediction. In: Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications, EANL (2008)
Islam, Z., Mehler, A.: Automatic readability classification of crowd-sourced data based on linguistic and information-theoretic features. Computación y Sistemas 17(2), 113–123 (2013)
Vajjala, S., Meurers, D.: On improving the accuracy of readability classification using insights from second language acquisition. In: Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, pp. 163–173. Association for Computational Linguistics (2012)
Temnikova, I.: Text Complexity and Text Simplification in the Crisis Management Domain. PhD thesis, University of Wolverhampton (2012)
Carroll, J.B.: Language and thought. Prentice-Hall, Englewood Cliffs (1964)
Herdan, G.: Quantitative linguistics. Butterworths (1964)
Köhler, R., Galle, M.: Dynamic aspects of text characteristics. Quantitative Text Analysis, 46–53 (1993)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley Interscience, Hoboken (2006)
Klir, G.J.: Uncertainty and Information. Wiley Interscience (2005)
Borst, A., Theunissen, F.E.: Information theory and neural coding. Nature Neuroscience 2, 947–957 (1999)
Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. MIT Press (1998)
Keerthi, S., Shevade, S.K., Bhattacharyya, C., Murthy, K.R.K.: Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Computation 13(3), 637–649 (2001)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. ACM SIGKDD Explorations 11(1), 10–18 (2009)
Üstün, B., Melssen, W., Buydens, L.: Facilitating the application of support vector regression by using a universal Pearson VII function based kernel. Chemometrics and Intelligent Laboratory Systems 81(1), 29–40 (2006)
Genzel, D., Charniak, E.: Entropy rate constancy in text. In: Proceedings of the 40th Meeting of the Association for Computational Linguistics, ACL 2002 (2002)
Genzel, D., Charniak, E.: Variation of entropy and parse trees of sentences as a function of the sentence number. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP (2003)
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Islam, Z., Rahman, M.R., Mehler, A. (2014). Readability Classification of Bangla Texts. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_42
DOI: https://doi.org/10.1007/978-3-642-54903-8_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science, Computer Science (R0)