Volume 114, Issue 3, pp 993–1010

Predicting scientific impact based on h-index

  • Samreen Ayaz
  • Nayyer Masood
  • Muhammad Arshad Islam


Predicting the future impact of a scientist or researcher is a critical task. The objective of this work is to evaluate different h-index prediction models for the field of Computer Science. Different combinations of parameters are identified to build the models, which are applied to a large data set taken from Arnetminer comprising the records of almost 1.8 million authors and 2.1 million Computer Science publications. Regression, a machine learning prediction technique, is used to find the set of parameters best suited to h-index prediction for scientists of all career ages, without enforcing any constraint on their current h-index values, with R² as the metric of accuracy. These parameters are then evaluated for different career ages and different h-index thresholds. Prediction results for 1 year are good, with R² of 0.93, but for 5 years R² declines to 0.82 on average. Hence, h-index prediction is inferred to be difficult for longer periods. Predictions for researchers with 1 year of experience are not precise, with R² of 0.60 for 1 year and 0.33 for 5 years. Across career ages, the average R² for researchers with 20–36 years of experience was 0.99. Across h-index values, researchers with a low h-index were difficult to predict. The parameter set comprising current h-index, average citations per paper, number of coauthors, years since publishing the first article, number of publications, number of impact factor publications, and number of publications in distinct journals performed better than all other combinations.
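As a rough illustration of the two quantities the abstract relies on, the sketch below computes an h-index from a list of per-paper citation counts (the standard Hirsch definition) and the coefficient of determination R² between actual and predicted values. The function names `h_index` and `r_squared` and the toy numbers are assumptions made for this example; they are not taken from the paper, and the sketch does not reproduce the authors' regression models or data.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical citation counts for one author's five papers.
    print(h_index([10, 8, 5, 4, 3]))

    # Hypothetical future h-index values and a model's predictions for them.
    actual_h = [2, 4, 6, 8]
    predicted_h = [3, 4, 6, 7]
    print(round(r_squared(actual_h, predicted_h), 2))
```

An R² close to 1 means the predictions explain most of the variance in the observed h-index values, which is how figures such as 0.93 and 0.82 in the abstract can be read.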


h-Index prediction, Regression, Career age, R²




Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2017

Authors and Affiliations

  1. Department of Computer Science, Capital University of Science & Technology, Islamabad, Pakistan
