Using Full-Text of Research Articles to Analyze Academic Impact of Algorithms

  • Yuzhuo Wang
  • Chengzhi Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10766)


The top 10 data mining algorithms, as voted by experts, are widely used across many domains. But what is the academic impact of these algorithms in a specific domain such as Natural Language Processing (NLP)? To answer this question, this paper uses a full-text corpus of research articles published at the ACL conference to explore the influence of the top 10 data mining algorithms in the NLP domain. The academic influence of each algorithm is analyzed from three aspects: the number of papers that mention the algorithm, the mention frequency, and the location of the mentions. In addition, we identify the most popular algorithm for a particular task via the correlation coefficient between algorithm and task. This research offers a new way to evaluate the influence of algorithms quantitatively. The results show clear differences in influence among the algorithms. Specifically, the impact of the SVM algorithm is significantly higher than that of the other algorithms. Moreover, the task most closely related to each algorithm differs.
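The three mention-based aspects and the algorithm-task correlation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the alias patterns, the section names, the `mention_stats` and `phi_coefficient` helpers, and the choice of the phi coefficient (Pearson correlation on binary indicators) are all assumptions made for illustration, since the abstract does not specify how mentions are matched or which correlation coefficient is used.

```python
import math
import re
from collections import Counter

# Hypothetical alias table: these surface forms are illustrative only,
# not the dictionary used in the paper.
ALGORITHM_ALIASES = {
    "SVM": [r"\bSVMs?\b", r"\bsupport vector machines?\b"],
    "kNN": [r"\bk-?NN\b", r"\bk-nearest neighbou?rs?\b"],
}


def mention_stats(papers):
    """Compute the three aspects for each algorithm.

    `papers` is a list of dicts mapping a section name to its text
    (an assumed representation of a full-text article).
    Returns (paper_count, mention_count, location_count).
    """
    paper_count = Counter()    # number of papers mentioning the algorithm
    mention_count = Counter()  # total number of mentions
    location_count = {name: Counter() for name in ALGORITHM_ALIASES}
    for paper in papers:
        seen = set()
        for section, text in paper.items():
            for name, patterns in ALGORITHM_ALIASES.items():
                hits = sum(
                    len(re.findall(p, text, flags=re.IGNORECASE))
                    for p in patterns
                )
                if hits:
                    seen.add(name)
                    mention_count[name] += hits
                    location_count[name][section] += hits
        paper_count.update(seen)
    return paper_count, mention_count, location_count


def phi_coefficient(has_algorithm, has_task):
    """Phi coefficient between two per-paper boolean indicator lists.

    Assumption: one plausible choice of correlation coefficient between
    "paper mentions the algorithm" and "paper addresses the task".
    """
    pairs = list(zip(has_algorithm, has_task))
    n11 = sum(1 for a, t in pairs if a and t)
    n10 = sum(1 for a, t in pairs if a and not t)
    n01 = sum(1 for a, t in pairs if not a and t)
    n00 = sum(1 for a, t in pairs if not a and not t)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


# Toy corpus, invented purely to exercise the functions.
toy_papers = [
    {
        "introduction": "Support vector machines are popular.",
        "method": "We train an SVM; the SVM uses a linear kernel.",
    },
    {"method": "A k-NN baseline is used."},
    {"introduction": "We study parsing with neural models."},
]
paper_count, mention_count, location_count = mention_stats(toy_papers)
```

With per-paper indicators in hand, `phi_coefficient` can be evaluated for every algorithm-task pair, and the algorithm with the highest value for a task would be the one most closely associated with it under this assumed measure.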


Keywords: Influence of algorithm, Full-text content, Citation features



This work is supported by the Major Projects of the National Social Science Fund (No. 17ZDA291) and the Qing Lan Project.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Information Management, Nanjing University of Science and Technology, Nanjing, China
