A Survey of Literature Analysis Methods Based on Representation Learning

  • Conference paper
Image and Graphics Technologies and Applications (IGTA 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1611)

Abstract

Analyzing massive collections of scientific literature can help researchers capture latent knowledge, gain disciplinary insights, and discover opportunities for academic collaboration. However, the sheer volume of literature data poses a major problem for both human analysis and computer processing, and it prevents scholars from obtaining disciplinary knowledge quickly. Representation learning methods can effectively handle large-scale text and graph data and have excelled in literature analysis tasks. In this study, we introduce the types of literature data and several typical academic networks. We then review representation learning models and classify them into (a) word representation learning-based methods, including word2vec, GloVe, ELMo, and BERT; and (b) graph representation learning-based methods, which comprise matrix factorization-based, random walk-based, and deep learning-based methods such as DeepWalk, node2vec, and GCN. Finally, we discuss the opportunities and present three challenges of representation learning-based approaches to literature analysis.
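
To make the random walk-based family mentioned above more concrete, the following is a minimal sketch of the walk-generation step used by DeepWalk-style methods. It is not taken from the paper: the toy citation graph, the paper IDs, and the function names are assumptions made purely for illustration. In DeepWalk, walks like these are then treated as sentences and passed to a skip-gram model such as word2vec; that training step is omitted here.

```python
import random

# Toy undirected citation graph with hypothetical paper IDs (illustrative only).
GRAPH = {
    "p1": ["p2", "p3"],
    "p2": ["p1", "p3"],
    "p3": ["p1", "p2", "p4"],
    "p4": ["p3"],
}


def random_walk(graph, start, length, rng):
    """Return one uniform random walk of at most `length` nodes from `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(graph, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks at every node; each walk acts as a 'sentence'."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    corpus = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)  # visit start nodes in random order on each pass
        for node in nodes:
            corpus.append(random_walk(graph, node, length, rng))
    return corpus


corpus = build_corpus(GRAPH, walks_per_node=2, length=5)
```

Each element of `corpus` is a list of node IDs in which consecutive entries are connected in the graph, so nodes that are close in the network tend to co-occur in the same walks. This is the property the downstream embedding model exploits.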

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61972010) and the National Key R&D Program of China (2018YFC1603602).

Author information

Corresponding author

Correspondence to Yi Chen.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhang, Q., Chen, Y. (2022). A Survey of Literature Analysis Methods Based on Representation Learning. In: Wang, Y., Ma, H., Peng, Y., Liu, Y., He, R. (eds) Image and Graphics Technologies and Applications. IGTA 2022. Communications in Computer and Information Science, vol 1611. Springer, Singapore. https://doi.org/10.1007/978-981-19-5096-4_19

  • DOI: https://doi.org/10.1007/978-981-19-5096-4_19

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-5095-7

  • Online ISBN: 978-981-19-5096-4

  • eBook Packages: Computer Science; Computer Science (R0)
