Volume 101, Issue 2, pp 1233–1252

Citation impact prediction for scientific papers using stepwise regression analysis

  • Tian Yu
  • Guang Yu
  • Peng-Yu Li
  • Liang Wang


Researchers typically pay greater attention to scientific papers published within the last 2 years, especially those likely to have high citation impact in the future. However, current citation impact prediction methods are still not sufficiently accurate. This paper argues that objective features of scientific papers can make citation impact prediction relatively accurate. A paper’s feature space is constructed from its external features, features of its authors, features of the journal of publication, and features of its citations. Stepwise multiple regression analysis is used to select appropriate features from this space and to build a regression model explaining the relationship between citation impact and the chosen features. The validity of the model is verified experimentally in the subject area of Information Science & Library Science. The results show that the regression model is effective within this subject area.
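As a rough illustration of the kind of feature selection the abstract describes, the sketch below runs a forward-only stepwise selection with an F-to-enter criterion on synthetic data. It is not the authors' procedure or data: the function names `rss` and `forward_stepwise`, the threshold `f_enter=4.0`, and the five random feature columns are all assumptions made for this example, and a full stepwise procedure would typically also test whether already selected features should be removed.

```python
import numpy as np


def rss(design, y):
    """Residual sum of squares of an ordinary least squares fit of y on design."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_stepwise(X, y, f_enter=4.0):
    """Greedy forward selection (hypothetical helper, not from the paper).

    At each step, add the candidate column with the largest partial F
    statistic; stop when no candidate reaches the assumed threshold f_enter.
    Returns the selected column indices in order of entry.
    """
    n, p = X.shape
    intercept = np.ones((n, 1))
    selected = []
    current_rss = rss(intercept, y)  # intercept-only baseline model
    while len(selected) < p:
        best_f, best_j, best_rss = 0.0, None, None
        for j in range(p):
            if j in selected:
                continue
            design = np.hstack([intercept, X[:, selected + [j]]])
            new_rss = rss(design, y)
            df_resid = n - design.shape[1]
            # Partial F for adding one column to the current model
            f_stat = (current_rss - new_rss) / (new_rss / df_resid)
            if f_stat > best_f:
                best_f, best_j, best_rss = f_stat, j, new_rss
        if best_j is None or best_f < f_enter:
            break
        selected.append(best_j)
        current_rss = best_rss
    return selected


# Synthetic stand-in for a feature space: only columns 0 and 2 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)

selected = forward_stepwise(X, y)
print("selected feature columns:", selected)
```

In this toy setup the two informative columns should enter the model first, since they account for nearly all of the variance in `y`. Whether any noise column also enters depends on the random draw and on the chosen threshold.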


Scientific paper, Citation impact prediction, Feature space, Multiple regression



This work was supported by the National Natural Science Foundation of China (Grant No. 70973031).



Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2014

Authors and Affiliations

  1. School of Management, Harbin Institute of Technology, Harbin, People’s Republic of China
  2. School of Education, Harbin Institute of Technology, Harbin, People’s Republic of China
