Mining the Context of Citations in Scientific Publications

  • Saeed-Ul Hassan
  • Sehrish Iqbal
  • Mubashir Imran
  • Naif Radi Aljohani
  • Raheel Nawaz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11279)


Recent advances in information retrieval systems rely heavily on context-based features and semantic matching techniques to deliver relevant information to users from ever-growing digital libraries. Scientific communities seek to understand the implications of research, its importance, and its applicability to future research directions. Raw citation counts alone fail to capture the importance of scientific literature, because each citation occurs in a specific context within the full text. A comprehensive contextual understanding of cited references is therefore necessary. To this end, numerous techniques have been proposed that use artificial intelligence models to detect important or incidental (non-important) citations in full-text scholarly publications. In this paper, we compare and build upon four state-of-the-art models that detect important citations, using 450 citations that were randomly selected from 20,527 papers in the Association for Computational Linguistics (ACL) corpus and manually annotated by experts. Of the 64 unique features proposed by the four selected models, the top 29 were chosen using the Extra-Trees classifier. These were then fed to our supervised machine learning models: Random Forest (RF) and Support Vector Machine (SVM). The RF model outperforms the selected existing systems by more than 10%, reaching 89% on the precision-recall curve. Finally, we qualitatively assessed important and non-important citations using self-organizing maps. Overall, our work supports information retrieval algorithms that detect and fetch scientific articles on the basis of both qualitative and quantitative indices in scholarly big data.
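The classification pipeline described in the abstract (rank 64 candidate features with Extra-Trees, keep the top 29, then train RF and SVM classifiers) can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic (generated with `make_classification`, not the annotated ACL citations), the hyperparameters (`n_estimators`, the RBF kernel, the 70/30 split) are illustrative choices, and average precision is used as one common single-number summary of the precision-recall curve, which may differ from the paper's exact metric.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_FEATURES, TOP_K = 64, 29  # candidate features and how many to keep

# Synthetic stand-in for 450 annotated citations (1 = important, 0 = incidental).
# Hypothetical data: the class ratio and feature signal are illustrative assumptions.
X, y = make_classification(n_samples=450, n_features=N_FEATURES, n_informative=12,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

def select_top_features(X, y, k, seed=0):
    """Rank columns by Extra-Trees impurity importance; return the k best indices."""
    ranker = ExtraTreesClassifier(n_estimators=250, random_state=seed).fit(X, y)
    return np.argsort(ranker.feature_importances_)[::-1][:k]

# Fit the ranker on the training split only so the test split stays unseen.
top = select_top_features(X_train, y_train, TOP_K)
Xtr, Xte = X_train[:, top], X_test[:, top]

# Random Forest: rank test citations by predicted probability of "important".
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(Xtr, y_train)
rf_scores = rf.predict_proba(Xte)[:, 1]

# RBF-kernel SVM on standardized features: rank by signed distance to the margin.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale")).fit(Xtr, y_train)
svm_scores = svm.decision_function(Xte)

# Average precision summarizes the precision-recall curve as a single number.
scores = {"RF": average_precision_score(y_test, rf_scores),
          "SVM": average_precision_score(y_test, svm_scores)}
for name, ap in scores.items():
    print(f"{name}: average precision = {ap:.3f}")
```

Fitting the Extra-Trees ranker only on the training split is a deliberate choice: selecting features on the full dataset would leak information from the test citations into the model and inflate the reported scores.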
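The abstract does not specify how the self-organizing maps were configured, so the following is only a minimal pure-Python sketch of a standard Kohonen-style SOM under assumptions: a 4x4 grid, a linearly decaying learning rate, a Gaussian neighborhood that shrinks during training, and two synthetic clusters standing in for important and incidental citation feature vectors. The names `train_som` and `best_matching_unit` are hypothetical helpers introduced for illustration. Comparing which grid nodes each class maps to is one way to inspect the two groups qualitatively.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is nearest to x (squared Euclidean)."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))

def train_som(data, rows, cols, epochs=30, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols self-organizing map on equal-length feature vectors."""
    rng = random.Random(seed)
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    # One weight vector per grid node, initialized from randomly chosen samples.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)               # learning rate decays linearly toward 0
            sigma = sigma0 * (1.0 - frac) + 0.5   # neighborhood radius shrinks toward 0.5
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighborhood: nodes near the winner on the grid move more.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for i, v in enumerate(x):
                    w[i] += lr * h * (v - w[i])
            step += 1
    return weights

# Usage on hypothetical 4-dimensional citation feature vectors.
rng = random.Random(1)
important = [[rng.gauss(1.0, 0.1) for _ in range(4)] for _ in range(20)]
incidental = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(20)]
som = train_som(important + incidental, rows=4, cols=4)
imp_nodes = {best_matching_unit(som, x) for x in important}
inc_nodes = {best_matching_unit(som, x) for x in incidental}
print("nodes hit by important citations:", sorted(imp_nodes))
print("nodes hit by incidental citations:", sorted(inc_nodes))
```

With well-separated clusters like these, the two classes should occupy different regions of the grid; on real citation features the overlap between regions is what a qualitative assessment would examine.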


Citation context analysis · Influential citations · Machine learning · Self-organizing maps


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Information Technology University, Lahore, Pakistan
  2. The University of Queensland, St Lucia, Australia
  3. King Abdulaziz University, Jeddah, Saudi Arabia
  4. Manchester Metropolitan University, Manchester, UK
