Education is acknowledged to be the primary vehicle for improving the economic well-being of people [1,6]. Textbooks have a direct bearing on the quality of education imparted to students, as they are the primary conduits for delivering content knowledge [9]. They are also indispensable for fostering teacher learning and constitute a key component of teachers' ongoing professional development [5,8]. Many textbooks, particularly from emerging countries, lack clear and adequate coverage of important concepts [7]. In this talk, we present our early explorations into developing a data-mining-based approach for enhancing the quality of textbooks. We discuss techniques for algorithmically augmenting different sections of a book with links to selected content mined from the Web. To find authoritative articles, we first identify the set of key concept phrases contained in a section. Using these phrases, we find web (Wikipedia) articles that represent the central concepts presented in the section and augment the section with links to them [4]. We also describe a framework for finding the images that are most relevant to a section of the textbook while respecting global relevance to the entire chapter to which the section belongs. We pose this problem of matching images to sections in a textbook chapter as an optimization problem and present an efficient algorithm for solving it [2].
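The abstract does not give the objective or the algorithm from [2]. Purely to illustrate the shape of the problem, the sketch below assumes precomputed scores: a relevance score of each image to each section, and a chapter-wide relevance score for each image. It combines the two linearly with a hypothetical weight `lam` and, using a simple greedy heuristic, gives each section at most `k` images while never reusing an image across sections. The names `assign_images`, `lam`, and `k` and the scoring scheme are all assumptions; this is not the efficient algorithm of [2].

```python
from typing import Dict, List, Set, Tuple


def assign_images(
    section_scores: Dict[str, Dict[str, float]],
    chapter_scores: Dict[str, float],
    k: int,
    lam: float = 0.5,
) -> Dict[str, List[str]]:
    """Greedy sketch: give each section up to k images, using each image at most once.

    section_scores[sec][img]: assumed relevance of image `img` to section `sec`.
    chapter_scores[img]: assumed relevance of image `img` to the whole chapter.
    lam: hypothetical weight on chapter-level (global) relevance.
    """
    # Score every candidate (section, image) pair by local + weighted global relevance.
    candidates: List[Tuple[float, str, str]] = []
    for sec, scores in section_scores.items():
        for img, local in scores.items():
            combined = local + lam * chapter_scores.get(img, 0.0)
            candidates.append((combined, sec, img))

    # Visit pairs from best to worst; ties are broken by section and image name.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))

    assignment: Dict[str, List[str]] = {sec: [] for sec in section_scores}
    used: Set[str] = set()
    for _, sec, img in candidates:
        if img in used or len(assignment[sec]) >= k:
            continue  # image already placed, or section already full
        assignment[sec].append(img)
        used.add(img)
    return assignment
```

With `section_scores = {"s1": {"a": 0.9, "b": 0.8}, "s2": {"a": 0.85, "c": 0.3}}`, `chapter_scores = {"a": 0.5, "b": 0.2, "c": 0.1}` and `k=1`, image `a` goes to `s1` and `s2` falls back to `c`, because `a` may appear only once in the chapter. Greedy selection is easy to follow but can be suboptimal, which is one reason to treat the matching as a proper optimization problem.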

We also present a diagnostic tool for identifying those sections of a book that are not well written and hence should be candidates for enrichment. We propose a probabilistic decision model for this purpose, based on the syntactic complexity of the writing and the newly introduced notion of the dispersion of key concepts mentioned in the section. The model is learned from a tuning set that is generated automatically in a novel way: sampled textbook sections are mapped to the closest versions of Wikipedia articles with similar content, and the maturity of those versions is used to assign need-for-enrichment labels. The maturity of a version is computed from the revision history of the corresponding Wikipedia article by convolving the changes in article size with a smoothing filter [3].
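The abstract states only that maturity is derived by convolving the size changes in an article's revision history with a smoothing filter [3]. The sketch below shows that smoothing step on a list of article sizes, one per revision. The 3-tap kernel `(0.25, 0.5, 0.25)`, the threshold of 50, and the rule in the hypothetical `first_mature_revision` helper that turns the smoothed series into a maturity decision are illustrative assumptions, not details taken from [3].

```python
from typing import List, Optional, Sequence


def size_changes(sizes: Sequence[int]) -> List[int]:
    """Absolute change in article size between consecutive revisions."""
    return [abs(b - a) for a, b in zip(sizes, sizes[1:])]


def smooth(changes: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Discrete convolution of `changes` with `kernel`, truncated to len(changes).

    The kernel is centred on each position; samples outside the series count as 0.
    """
    half = len(kernel) // 2
    n = len(changes)
    out: List[float] = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + half - j  # convolution flips the kernel relative to the signal
            if 0 <= idx < n:
                acc += w * changes[idx]
        out.append(acc)
    return out


def first_mature_revision(
    sizes: Sequence[int],
    kernel: Sequence[float] = (0.25, 0.5, 0.25),
    threshold: float = 50.0,
) -> Optional[int]:
    """Hypothetical rule: the earliest revision index from which every smoothed
    size change stays below `threshold`; None if no such revision exists."""
    smoothed = smooth(size_changes(sizes), kernel)
    for i in range(len(smoothed)):
        if all(v < threshold for v in smoothed[i:]):
            return i
    return None
```

For sizes `[100, 1100, 2100, 2150, 2160, 2165]` the smoothed changes are `[750.0, 762.5, 277.5, 18.75, 5.0]`, so `first_mature_revision` returns `3`: from the fourth revision on, edits stay small. A wider kernel smooths out isolated bursts of editing more aggressively, at the cost of reacting later when an article genuinely stabilizes.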

We also present the results of applying the proposed techniques to a corpus of widely used high school textbooks published by the National Council of Educational Research and Training (NCERT), India. We consider books from grades IX–XII covering four broad subject areas: Sciences, Social Sciences, Commerce, and Mathematics. The preliminary results are encouraging and indicate that developing technological approaches to enhancing the quality of textbooks could be a promising research direction for our field.




  1. World Bank: Knowledge for Development. World Development Report 1998/1999 (1998)
  2. Agrawal, R., Gollapudi, S., Kannan, A., Kenthapadi, K.: Enriching Textbooks with Web Images. Working paper (2011)
  3. Agrawal, R., Gollapudi, S., Kannan, A., Kenthapadi, K.: Identifying Enrichment Candidates in Textbooks. In: WWW (2011)
  4. Agrawal, R., Gollapudi, S., Kannan, A., Kenthapadi, K., Srivastava, N., Velu, R.: Enriching Textbooks Through Data Mining. In: First Annual ACM Symposium on Computing for Development, ACM DEV (2010)
  5. Gillies, J., Quijada, J.: Opportunity to Learn: A High Impact Strategy for Improving Educational Outcomes in Developing Countries. In: USAID Educational Quality Improvement Program, EQUIP2 (2008)
  6. Hanushek, E.A., Woessmann, L.: The Role of Education Quality for Economic Growth. Policy Research Department Working Paper 4122, World Bank (2007)
  7. Mohammad, R., Kumari, R.: Effective Use of Textbooks: A Neglected Aspect of Education in Pakistan. Journal of Education for International Development 3(1) (2007)
  8. Oakes, J., Saunders, M.: Education’s Most Basic Tools: Access to Textbooks and Instructional Materials in California’s Public Schools. Teachers College Record 106(10) (2004)
  9. Stein, M., Stuen, C., Carnine, D., Long, R.M.: Textbook Evaluation and Adoption. Reading & Writing Quarterly 17(1) (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

Rakesh Agrawal, Microsoft Search Labs, USA
