A Survey of Literature Analysis Methods Based on Representation Learning

Zhang, Qinghui; Chen, Yi

doi:10.1007/978-981-19-5096-4_19

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1611)

Included in the following conference series:

Chinese Conference on Image and Graphics Technologies

Abstract

Analyzing massive collections of scientific literature can help researchers capture latent knowledge, gain insight into disciplines, and discover opportunities for academic collaboration. However, the sheer volume of literature data poses a major problem for both human analysis and computer processing, and it prevents scholars from rapidly obtaining disciplinary knowledge. Representation learning methods can effectively handle large-scale text and graph data and have excelled in literature analysis tasks. In this study, we first introduce the types of literature data and several typical academic networks. We then review representation learning models and classify them into (a) word representation learning-based methods, including word2vec, GloVe, ELMo, and BERT; and (b) graph representation learning-based methods, including matrix factorization-based, random walk-based, and deep learning-based methods such as DeepWalk, node2vec, and GCN. Finally, we discuss the opportunities and present three challenges of representation learning-based approaches for literature analysis.
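As a rough illustration of the random walk-based family named in the abstract, the sketch below generates uniform truncated random walks over a small academic network, which is the sampling step that DeepWalk-style methods rely on. It is not taken from the chapter: the function `random_walks`, its parameters, and the toy co-authorship graph are all hypothetical, and the follow-up step of training a skip-gram model (such as word2vec) on the walks to obtain node vectors is omitted.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate uniform truncated random walks over an adjacency-list graph.

    Hypothetical helper for illustration; each walk is a list of node ids.
    """
    rng = random.Random(seed)
    walks = []
    nodes = sorted(adjacency)
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical toy co-authorship network (undirected, as adjacency lists).
coauthors = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

walks = random_walks(coauthors, walk_length=5, walks_per_node=2)
```

Each walk plays the role of a "sentence" whose "words" are nodes, which is why word representation models can be reused for graphs in this family of methods.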

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61972010) and the National Key R&D Program of China (2018YFC1603602).

Author information

Authors and Affiliations

Beijing Key Laboratory of Big Data Technology for Food Safety, School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
Qinghui Zhang & Yi Chen

Corresponding author

Correspondence to Yi Chen.

Editor information

Editors and Affiliations

Beijing Institute of Technology, Beijing, China
Yongtian Wang
University of Science and Technology Beijing, Beijing, China
Huimin Ma
Peking University, Beijing, China
Yuxin Peng
Beijing Institute of Technology, Beijing, China
Yue Liu
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Ran He

About this paper

Cite this paper

Zhang, Q., Chen, Y. (2022). A Survey of Literature Analysis Methods Based on Representation Learning. In: Wang, Y., Ma, H., Peng, Y., Liu, Y., He, R. (eds) Image and Graphics Technologies and Applications. IGTA 2022. Communications in Computer and Information Science, vol 1611. Springer, Singapore. https://doi.org/10.1007/978-981-19-5096-4_19

DOI: https://doi.org/10.1007/978-981-19-5096-4_19
Published: 22 July 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-5095-7
Online ISBN: 978-981-19-5096-4
eBook Packages: Computer Science, Computer Science (R0)
