CSTeller: forecasting scientific collaboration sustainability based on extreme gradient boosting
Abstract
Why two scholars who are strangers become collaborators has been studied extensively from the perspective of social network analysis. In academia, two scholars may collaborate more than once, which means that scientific collaboration is to some extent sustainable. However, little research has explored the sustainability of scientific collaboration. In this paper, we examine to what extent collaboration sustainability can be predicted. For this purpose, we devise CSTeller, a collaboration sustainability prediction model based on extreme gradient boosting. We analyze the sustainability of scientific collaboration from two perspectives: collaboration duration and collaboration times. We investigate factors that may affect collaboration sustainability based on scholars’ local properties and network properties, and adopt these factors as the input features of CSTeller. Extensive experiments on two real scholarly datasets demonstrate the effectiveness of the proposed model. To the best of our knowledge, this is the first attempt to explore the mechanism of scientific collaboration from the perspective of sustainability. Our work may shed light on scientific collaboration analysis and benefit practical applications such as collaborator recommendation, since a scientific collaboration is not a one-shot deal.
Keywords
Scholarly big data, Deep learning, Relation mining, Coauthor network
Notes
Acknowledgements
We thank Tong Gao for assistance with the experiments. This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 61502071, 71774020, and 71473028, and by the Fundamental Research Funds for the Central Universities under Grant DUT18JC09.