Readability Classification of Bangla Texts

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Abstract

Readability classification is an important application of Natural Language Processing. It aims to judge the quality of documents and to help writers identify possible problems. This paper presents a readability classifier for Bangla textbooks that uses information-theoretic and lexical features. Altogether, 18 features are explored, achieving an F-score of 86.46%. The paper is an extension of our previous work [1].

References

  1. Islam, Z., Mehler, A., Rahman, R.: Text readability classification of textbooks of a low-resource language. In: Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation (2012)

  2. Mikk, J.: Text comprehensibility. In: Quantitative Linguistics: An International Handbook, pp. 909–921. Walter de Gruyter (2005)

  3. Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34(1), 1–47 (2002)

  4. Dale, E., Chall, J.S.: A formula for predicting readability. Educational Research Bulletin 27(1), 11–20+28 (1948)

  5. Dale, E., Chall, J.S.: Readability Revisited: The New Dale-Chall Readability formula. Brookline Books (1995)

  6. Gunning, R.: The Technique of Clear Writing, Fourth Printing Edition. McGraw-Hill (1952)

  7. Kincaid, J., Fishburne, R., Rogers, R., Chissom, B.: Derivation of new readability formulas for Navy enlisted personnel. Technical report, US Navy, Branch Report 8-75, Chief of Naval Training, Millington (1975)

  8. Senter, R., Smith, E.A.: Automated readability index. Technical report, Wright-Patterson Air Force Base (1967)

  9. McLaughlin, G.H.: SMOG grading – a new readability formula. Journal of Reading 12(8), 639–646 (1969)

  10. Hancke, J., Vajjala, S., Meurers, D.: Readability classification for German using lexical, syntactic, and morphological features. In: 24th International Conference on Computational Linguistics (COLING), Mumbai, India (2012)

  11. François, T., Fairon, C.: An AI readability formula for French as a foreign language. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 466–477. Association for Computational Linguistics (2012)

  12. Sato, S., Matsuyoshi, S., Kondoh, Y.: Automatic assessment of Japanese text readability based on a textbook corpus. In: LREC (2008)

  13. Chen, Y.T., Chen, Y.H., Cheng, Y.C.: Assessing Chinese readability using term frequency and lexical chain. Computational Linguistics and Chinese Language Processing 18(2), 1–17 (2013)

  14. Islam, M.Z., Tiedemann, J., Eisele, A.: English to Bangla phrase-based machine translation. In: The 14th Annual Conference of The European Association for Machine Translation, Saint-Raphaël, France, May 27-28 (2010)

  15. Karim, M., Kaykobad, M., Murshed, M.: Technical Challenges and Design Issues in Bangla Language Processing. IGI Global (2013)

  16. Das, S., Roychoudhury, R.: Testing level of readability in Bangla novels of Bankim Chandra Chattopodhay w.r.t the density of polysyllabic words. Indian Journal of Linguistics 22, 41–51 (2004)

  17. Das, S., Roychoudhury, R.: Readability modeling and comparison of one and two parametric fit: a case study in Bangla. Journal of Quantitative Linguistics 13(1) (2006)

  18. Sinha, M., Sakshi, S., Dasgupta, T., Basu, A.: New readability measures for Bangla and Hindi texts. In: Proceedings of COLING, pp. 1141–1150 (2012)

  19. Fitzsimmons, P., Michael, B., Hulley, J., Scott, G.: A readability assessment of online Parkinson disease information. The Journal of the Royal College of Physicians of Edinburgh 40, 292–296 (2010)

  20. Petersen, S.E., Ostendorf, M.: A machine learning approach to reading level assessment. Computer Speech and Language 23(1), 89–106 (2009)

  21. Feng, L., Elhadad, N., Huenerfauth, M.: Cognitively motivated features for readability assessment. In: Proceedings of the 12th Conference of the European Chapter of the ACL (2009)

  22. Collins-Thompson, K., Callan, J.P.: A language modeling approach to predicting reading difficulty. In: HLT-NAACL (2004)

  23. Schwarm, S.E., Ostendorf, M.: Reading level assessment using support vector machines and statistical language models. In: The Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (ACL 2005) (2005)

  24. Aluisio, R., Specia, L., Gasperin, C., Scarton, C.: Readability assessment for text simplification. In: NAACL-HLT 2010: The 5th Workshop on Innovative Use of NLP for Building Educational Applications (2010)

  25. Kate, R.J., Luo, X., Patwardhan, S., Franz, M., Florian, R., Mooney, R.J., Roukos, S., Welty, C.: Learning to predict readability using diverse linguistic features. In: 23rd International Conference on Computational Linguistics, COLING 2010 (2010)

  26. Eickhoff, C., Serdyukov, P., de Vries, A.P.: A combined topical/non-topical approach to identifying web sites for children. In: Proceedings of the fourth ACM International Conference on Web Search and Data Mining (2011)

  27. Pitler, E., Nenkova, A.: Revisiting readability: A unified framework for predicting text quality. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP (2008)

  28. Feng, L., Janche, M., Huenerfauth, M., Elhadad, N.: A comparison of features for automatic readability assessment. In: The 23rd International Conference on Computational Linguistics, COLING (2010)

  29. Barzilay, R., Lapata, M.: Modeling local coherence: An entity-based approach. Computational Linguistics 21(3), 285–301 (2008)

  30. Heilman, M., Collins-Thompson, K., Eskenazi, M.: Combining lexical and grammatical features to improve readability measures for first and second language text. In: Proceedings of the Human Language Technology Conference (2007)

  31. Heilman, M., Collins-Thompson, K., Eskenazi, M.: An analysis of statistical models and features for reading difficulty prediction. In: Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications, EANL (2008)

  32. Islam, Z., Mehler, A.: Automatic readability classification of crowd-sourced data based on linguistic and information-theoretic features. Computación y Sistemas 17(2), 113–123 (2013)

  33. Vajjala, S., Meurers, D.: On improving the accuracy of readability classification using insights from second language acquisition. In: Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, pp. 163–173. Association for Computational Linguistics (2012)

  34. Temnikova, I.: Text Complexity and Text Simplification in the Crisis Management Domain. PhD thesis, University of Wolverhampton (2012)

  35. Carroll, J.B.: Language and thought. Prentice-Hall, Englewood Cliffs (1964)

  36. Herdan, G.: Quantitative linguistics. Butterworths (1964)

  37. Köhler, R., Galle, M.: Dynamic aspects of text characteristics. Quantitative Text Analysis, 46–53 (1993)

  38. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley Interscience, Hoboken (2006)

  39. Klir, G.J.: Uncertainty and Information. Wiley Interscience (2005)

  40. Borst, A., Theunissen, F.E.: Information theory and neural coding. Nature Neuroscience 2, 947–957 (1999)

  41. Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. MIT Press (1998)

  42. Keerthi, S., Shevade, S.K., Bhattacharyya, C., Murthy, K.R.K.: Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Computation 13(3), 637–649 (2001)

  43. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. ACM SIGKDD Explorations 11(1), 10–18 (2009)

  44. Üstün, B., Melssen, W., Buydens, L.: Facilitating the application of support vector regression by using a universal Pearson VII function based kernel. Chemometrics and Intelligent Laboratory Systems 81(1), 29–40 (2006)

  45. Genzel, D., Charniak, E.: Entropy rate constancy in text. In: Proceedings of the 40th Meeting of the Association for Computational Linguistics, ACL 2002 (2002)

  46. Genzel, D., Charniak, E.: Variation of entropy and parse trees of sentences as a function of the sentence number. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP (2003)


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Islam, Z., Rahman, M.R., Mehler, A. (2014). Readability Classification of Bangla Texts. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_42

  • DOI: https://doi.org/10.1007/978-3-642-54903-8_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54902-1

  • Online ISBN: 978-3-642-54903-8

  • eBook Packages: Computer Science, Computer Science (R0)
