Metro maps for efficient knowledge learning by summarizing massive electronic textbooks

  • Weiming Lu
  • Pengkun Ma
  • Jiale Yu
  • Yangfan Zhou
  • Baogang Wei
Original Paper


As the number of textbooks soars, learners can be overwhelmed by thousands of books on a subject. To provide a concise yet comprehensive picture for learning, we propose a novel framework, called MM4Books, that automatically builds metro maps for efficient knowledge learning by summarizing massive collections of electronic textbooks. We represent each book in a digital library as a sequence of chapters, obtain learning objects by clustering semantically similar chapters with an unsupervised clustering method, and use these learning objects to create a learning graph. We then build the metro map by applying an integer linear programming-based technique that selects, from the learning graph, a collection of learning paths that are highly informative and fluent but have low redundancy. To the best of our knowledge, this is the first work to address this task. Experiments show that the proposed approach outperforms all the state-of-the-art baseline approaches. We also implemented a practical MM4Books system, which demonstrates that users can benefit from the proposed approach for knowledge learning.
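The three stages named in the abstract (chapter clustering, learning graph, path selection) can be illustrated with a small sketch. It is not the MM4Books implementation and rests on several assumptions: bag-of-words cosine similarity with greedy threshold clustering stands in for the unsupervised clustering method, graph edges are assumed to connect learning objects whose chapters appear consecutively in some book, and an exhaustive search over small path subsets stands in for the integer linear program. All function names, the threshold, and the scoring weights are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_chapters(books, threshold=0.5):
    """Greedy stand-in for the clustering step (an assumption, not the paper's
    method): a chapter joins the first cluster whose seed chapter is similar
    enough, otherwise it seeds a new cluster (a new learning object)."""
    seeds, labels = [], []
    for book in books:
        row = []
        for chapter in book:
            vec = Counter(chapter.lower().split())
            for cid, seed in enumerate(seeds):
                if cosine(vec, seed) >= threshold:
                    row.append(cid)
                    break
            else:  # no existing cluster matched
                seeds.append(vec)
                row.append(len(seeds) - 1)
        labels.append(row)
    return labels

def build_learning_graph(labels):
    """Directed edge u -> v whenever learning object v directly follows u in a
    book; the weight counts how often that happens (assumed edge definition)."""
    edges = Counter()
    for row in labels:
        for u, v in zip(row, row[1:]):
            if u != v:
                edges[(u, v)] += 1
    return edges

def enumerate_paths(edges, max_len=4):
    """All simple paths with 2..max_len nodes in the learning graph."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    paths = []
    def extend(path):
        if len(path) >= 2:
            paths.append(tuple(path))
        if len(path) == max_len:
            return
        for nxt in succ.get(path[-1], []):
            if nxt not in path:
                extend(path + [nxt])
    for start in sorted(succ):
        extend([start])
    return paths

def select_paths(edges, k=2, max_len=4, penalty=1.0):
    """Exhaustive stand-in for the ILP: pick k paths that maximise
    coverage + 0.1 * edge weight - penalty * repeated nodes (weights arbitrary)."""
    paths = enumerate_paths(edges, max_len)
    def score(subset):
        covered = set().union(*subset)
        repeated = sum(len(p) for p in subset) - len(covered)
        fluency = sum(edges[(u, v)] for p in subset for u, v in zip(p, p[1:]))
        return len(covered) + 0.1 * fluency - penalty * repeated
    return max(combinations(paths, min(k, len(paths))), key=score, default=())

# Toy input: each book is a list of chapter texts (hypothetical data).
books = [
    ["sorting arrays", "binary search trees", "graph traversal"],
    ["sorting arrays quickly", "hash tables", "graph traversal algorithms"],
]
labels = cluster_chapters(books)
edges = build_learning_graph(labels)
lines = select_paths(edges, k=2)
print(labels, lines)
```

Exhaustive search is only feasible for toy graphs; on a real collection the same objective would be handed to an ILP solver, and the similarity measure would likely be replaced by a semantic one.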


Keywords: Knowledge learning; Massive book summarization; Learning path; Digital library



This work is supported by the Zhejiang Provincial Natural Science Foundation of China (No. LY17F020015), the Chinese Knowledge Center of Engineering Science and Technology (CKCEST), the Fundamental Research Funds for the Central Universities (No. 2017FZA5016), and MOE-Engineering Research Center of Digital Library.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Weiming Lu (1)
  • Pengkun Ma (1)
  • Jiale Yu (1)
  • Yangfan Zhou (1)
  • Baogang Wei (1)
  1. College of Computer Science and Technology, Zhejiang University, Hangzhou, China
