Disclosing Citation Meanings for Augmented Research Retrieval and Exploration

  • Roger Ferrod
  • Claudio Schifanella
  • Luigi Di Caro
  • Mario Cataldi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11503)

Abstract

In recent years, new digital technologies have been used to support the navigation and analysis of scientific publications, a need driven by the increasing number of articles published every year. Experts therefore rely on online systems to browse thousands of articles in search of relevant information. In this paper, we present a new method that automatically assigns meanings to references on the basis of the citation text, through a Natural Language Processing pipeline and a slightly-supervised clustering process. The resulting network of semantically-linked articles allows an informed exploration of the research panorama through semantic paths. The proposed approach has been validated using the ACL Anthology Dataset, which contains several thousand papers in the field of Computational Linguistics. A manual evaluation of the extracted citation meanings showed very high levels of accuracy. Finally, a freely-available web-based application has been developed and published online.

Keywords

Citation semantics, Literature exploration, Natural Language Processing

Copyright information

© Springer Nature Switzerland AG 2019

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • Roger Ferrod (1)
  • Claudio Schifanella (1, corresponding author)
  • Luigi Di Caro (1)
  • Mario Cataldi (2)
  1. Department of Computer Science, University of Turin, Turin, Italy
  2. Department of Computer Science, University of Paris 8, Saint-Denis, France
