Volume 117, Issue 1, pp 351–380

The next generation (plus one): an analysis of doctoral students’ academic fecundity based on a novel approach to advisor identification

  • Dominik P. Heinisch
  • Guido Buenstorf


Scientific communities reproduce themselves by allowing senior scientists to educate young researchers, in particular through the training of doctoral students. This process of reproduction is imperfectly understood, in part because there are few large-scale datasets linking doctoral students to their advisors. We present a novel approach employing machine learning techniques to identify advisors among (frequent) co-authors in doctoral students’ publications. This approach enabled us to construct an original dataset encompassing more than 20,000 doctoral student-advisor pairs in applied physics and electrical engineering from German universities, 1975–2005. We employ this dataset to analyze the “fecundity” of doctoral students, i.e., their probability of becoming advisors themselves.


Keywords

Advisor identification; Fecundity; Ph.D. training; Advisor effects; Academic careers; Machine learning

JEL Classification

PI23 O30 D83 D85 



We would like to thank two anonymous reviewers of the ISSI 2017 conference, as well as two reviewers of this journal, for their helpful comments. This work was funded by the German Federal Ministry of Education and Research (BMBF) in its program “Forschung zu den Karrierebedingungen und Karriereentwicklungen des Wissenschaftlichen Nachwuchses (FoWiN)” under Grant 16FWN001.



Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2018

Authors and Affiliations

  1. Institute of Economics and INCHER Kassel, University of Kassel, Kassel, Germany
  2. Institute of Innovation and Entrepreneurship, University of Gothenburg, Gothenburg, Sweden
  3. IWH Leibniz Institute of Economics Halle, Halle, Germany
