Which can better predict the future success of articles? Bibliometric indices or alternative metrics
In this paper, we investigate how well bibliometric indices and alternative metrics predict the future success of articles, using a machine learning framework. Twenty-three bibliometric and alternative indices were collected to form the feature space for the prediction task. To eliminate possible redundancy in the feature space, three feature selection techniques (Relief-F, principal component analysis, and the entropy weight method) were used to rank the features according to their contribution to the original data set. Combined with the fractal dimension of the data set, this ranking was used to extract the intrinsic features that best represent the original feature space. Three classifiers (Naïve Bayes, KNN, and random forest) were then applied to evaluate the classification performance of these features. Experimental results show that both bibliometric indices and alternative metrics are beneficial to articles’ growth. Early citation features, early Web usage statistics, and the reputation of the first author are the most valuable indicators of whether an article will become more influential in the future.
Keywords: Highly-cited papers; Bibliometric index; Alternative metrics; Machine learning
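As an illustration of one of the ranking steps named in the abstract, the following is a minimal sketch of the entropy weight method in its common textbook form. The abstract does not give the exact formulation used in the paper, so the min-max normalisation, the function name `entropy_weights`, and the toy feature matrix are all assumptions made here for illustration only.

```python
import math


def entropy_weights(rows):
    """Return one weight per feature column (textbook entropy weight method).

    rows: list of equal-length lists of numbers (samples x features),
    with at least two samples. Hypothetical helper, not the paper's code.
    """
    n = len(rows)
    m = len(rows[0])
    weights = []
    for j in range(m):
        col = [row[j] for row in rows]
        lo, hi = min(col), max(col)
        if hi == lo:
            # A constant column cannot separate samples, so it gets weight 0.
            weights.append(0.0)
            continue
        # Min-max normalise the column, then turn it into proportions p_ij.
        scaled = [(v - lo) / (hi - lo) for v in col]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # Shannon entropy scaled to [0, 1]; the term 0 * log(0) is taken as 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        # Lower entropy means more dispersion, hence a larger weight.
        weights.append(1.0 - entropy)
    norm = sum(weights)
    return [w / norm for w in weights] if norm > 0 else weights


# Toy data (assumed): columns stand for three hypothetical indicators.
toy = [
    [0, 5, 1],
    [1, 6, 1],
    [10, 7, 1],
    [0, 8, 1],
]
w = entropy_weights(toy)
# Feature indices ordered from most to least informative under this weighting.
ranking = sorted(range(len(w)), key=lambda j: -w[j])
```

In this sketch a skewed column (the first) receives more weight than an evenly spread one (the second), and a constant column receives none. A ranking of this kind could then be compared with those from Relief-F and principal component analysis before choosing how many features to keep.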
This work was supported by the National Natural Science Foundation of China (Grant No. 71473034), the financial assistance from Postdoctoral Scientific Research Developmental Fund of Heilongjiang Province (Grant No. LBH-Q16003), and the national undergraduate training programs for innovation (Grant No. 201510225167).