Abstract
Analyzing massive collections of scientific literature can help researchers uncover latent knowledge, gain insight into a discipline, and discover opportunities for academic collaboration. However, the sheer volume of literature data is a major obstacle for both human analysis and computer processing, and it prevents scholars from rapidly acquiring disciplinary knowledge. Representation learning methods handle large-scale text and graph data effectively and have excelled in literature analysis tasks. In this study, we first introduce the types of literature data and several typical academic networks. We then review representation learning models and classify them into (a) word representation learning-based methods, including word2vec, GloVe, ELMo, and BERT; and (b) graph representation learning-based methods, which comprise matrix factorization-based, random walk-based, and deep learning-based methods such as DeepWalk, node2vec, and GCN. Finally, we discuss the opportunities and present three challenges of representation learning-based approaches for literature analysis.
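To make the random walk-based family mentioned above more concrete, the following minimal Python sketch generates truncated random walks over a toy citation network. In DeepWalk-style methods, walks of this kind are treated as "sentences" and passed to a skip-gram model such as word2vec to learn node embeddings; that training step is omitted here. The graph `CITATIONS`, the helper names `random_walk` and `build_corpus`, and all parameter values are illustrative assumptions, not taken from the chapter.

```python
import random
from typing import Dict, List

# Hypothetical toy citation network (treated as undirected):
# each paper id maps to the papers it is linked to.
CITATIONS: Dict[str, List[str]] = {
    "p1": ["p2", "p3"],
    "p2": ["p1", "p3"],
    "p3": ["p1", "p2", "p4"],
    "p4": ["p3"],
}


def random_walk(graph: Dict[str, List[str]], start: str,
                length: int, rng: random.Random) -> List[str]:
    """Return one truncated, uniform random walk starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph[walk[-1]]
        if not neighbours:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbours))
    return walk


def build_corpus(graph: Dict[str, List[str]], walks_per_node: int,
                 length: int, seed: int = 0) -> List[List[str]]:
    """Generate `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    nodes = list(graph)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            corpus.append(random_walk(graph, node, length, rng))
    return corpus


corpus = build_corpus(CITATIONS, walks_per_node=2, length=5)
for walk in corpus:
    print(" ".join(walk))
```

Each printed line is one walk. A biased walk in the style of node2vec would replace the uniform `rng.choice` with a weighted choice that depends on the previously visited node.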
Acknowledgments
This work is supported by the National Natural Science Foundation of China (61972010) and the National Key R&D Program of China (2018YFC1603602).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, Q., Chen, Y. (2022). A Survey of Literature Analysis Methods Based on Representation Learning. In: Wang, Y., Ma, H., Peng, Y., Liu, Y., He, R. (eds) Image and Graphics Technologies and Applications. IGTA 2022. Communications in Computer and Information Science, vol 1611. Springer, Singapore. https://doi.org/10.1007/978-981-19-5096-4_19
DOI: https://doi.org/10.1007/978-981-19-5096-4_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-5095-7
Online ISBN: 978-981-19-5096-4
eBook Packages: Computer Science, Computer Science (R0)