Understanding the Scientific Enterprise: Citation Analysis, Data and Modeling

  • Chapter
Social Phenomena

Part of the book series: Computational Social Sciences (CSS)

Abstract

The large amount of information contained in bibliographic databases has recently boosted the use of citations, and of other indicators based on citation numbers, as tools for the quantitative assessment of scientific research. Citation counts are often interpreted as proxies for the scientific influence of papers, journals, scholars, and institutions. Given their importance in practical contexts, interest in the study of bibliographic datasets is no longer restricted to specialists in bibliometrics but extends to scholars with very different primary fields of research. As a result, the recent past has witnessed a huge production of papers on this topic. The present chapter aims to provide a brief overview of the progress recently made in the analysis of bibliographic databases. In the first part of the chapter, we focus on studies devoted to the statistical description of the distributions of citations received by individual publications. The second part is instead devoted to summarizing some recent research efforts towards modeling citation dynamics and the growth of citation networks.

References

  1. Lotka, A. J. (1926). The frequency distribution of scientific productivity. Journal of Washington Academy of Sciences, 16(12), 317–324.

  2. Shockley, W. (1957). On the statistics of individual variations of productivity in research laboratories. Proceedings of the IRE, 45(3), 279–290.

  3. de Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510–515.

  4. de Solla Price, D. J. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27(5), 292–306.

  5. Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45(2), 167–256.

  6. MacRoberts, M. H., & MacRoberts, B. R. (1989). Problems of citation analysis: A critical review. Journal of the American Society for Information Science, 40(5), 342–349.

  7. MacRoberts, M. H., & MacRoberts, B. R. (1996). Problems of citation analysis. Scientometrics, 36(3), 435–444.

  8. Adler, R., Ewing, J., & Taylor, P. (2009). Citation statistics. Statistical Science, 24(1), 1.

  9. Bornmann, L., & Daniel, H.-D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45–80.

  10. Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569–16572.

  11. Egghe, L. (2006). Theory and practise of the g-index. Scientometrics, 69(1), 131–152.

  12. Garfield, E. (2006). The history and meaning of the journal impact factor. JAMA: The Journal of the American Medical Association, 295(1), 90–93.

  13. Davis, P., & Papanek, G. F. (1984). Faculty ratings of major economics departments by citations. The American Economic Review, 74(1), 225–230.

  14. Kinney, A. L. (2007). National scientific facilities and their science impact on nonbiomedical research. Proceedings of the National Academy of Sciences, 104(46), 17943–17947.

  15. King, D. A. (2004). The scientific impact of nations. Nature, 430(6997), 311–316.

  16. Bornmann, L., & Daniel, H.-D. (2006). Selecting scientific excellence through committee peer review – A citation analysis of publications previously published to approval or rejection of post-doctoral research fellowship applicants. Scientometrics, 68(3), 427–440.

  17. Bornmann, L., Wallon, G., & Ledin, A. (2008). Does the committee peer review select the best applicants for funding? An investigation of the selection process for two European molecular biology organization programmes. PLoS One, 3(10), e3480.

  18. Web of Science. Available at http://wokinfo.com.

  19. CrossRef. Available at http://www.crossref.org.

  20. Scopus. Available at http://www.scopus.com.

  21. GoogleScholar. Available at http://scholar.google.com.

  22. Citeseer. Available at http://citeseerx.ist.psu.edu.

  23. inSpire. Available at http://inspirehep.net.

  24. Redner, S. (1998). How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B-Condensed Matter and Complex Systems, 4(2), 131–134.

  25. Laherrere, J., & Sornette, D. (1998). Stretched exponential distributions in nature and economy: “fat tails” with characteristic scales. The European Physical Journal B-Condensed Matter and Complex Systems, 2(4), 525–539.

  26. Tsallis, C., & de Albuquerque, M. P. (2000). Are citations of scientific papers a case of nonextensivity? The European Physical Journal B-Condensed Matter and Complex Systems, 13(4), 777–780.

  27. Redner, S. (2005). Citation statistics from more than a century of Physical Review. Physics Today, 58, 49–54.

  28. Seglen, P. O. (1992). The skewness of science. Journal of the American Society for Information Science, 43(9), 628–638.

  29. Vázquez, A. (2001). Statistics of citation networks. arXiv preprint cond-mat/0105031.

  30. Lehmann, S., Lautrup, B., & Jackson, A. D. (2003). Citation networks in high energy physics. Physical Review E, 68(2), 026113.

  31. Bommarito, M. J., & Katz, D. M. (2009). Properties of the United States Code citation network. Available at SSRN: http://ssrn.com/abstract=1502927 or http://dx.doi.org/10.2139/ssrn.1502927

  32. Eom, Y.-H., & Fortunato, S. (2011). Characterizing and modeling citation dynamics. PLoS One, 6(9), e24926.

  33. Stringer, M. J., Sales-Pardo, M., & Nunes Amaral, L. A. (2008). Effectiveness of journal ranking schemes as a tool for locating information. PLoS One, 3(2), e1683.

  34. Radicchi, F., Fortunato, S., & Castellano, C. (2008). Universality of citation distributions: Toward an objective measure of scientific impact. Proceedings of the National Academy of Sciences, 105(45), 17268–17272.

  35. Castellano, C., & Radicchi, F. (2009). On the fairness of using relative indicators for comparing citation performance in different disciplines. Archivum immunologiae et therapiae experimentalis, 57(2), 85–90.

  36. Stringer, M. J., Sales-Pardo, M., & Amaral, L. A. N. (2010). Statistical validation of a global model for the distribution of the ultimate number of citations accrued by papers published in a scientific journal. Journal of the American Society for Information Science and Technology, 61(7), 1377–1385.

  37. Wallace, M. L., Larivière, V., & Gingras, Y. (2009). Modeling a century of citation distributions. Journal of Informetrics, 3(4), 296–303.

  38. Anastasiadis, A. D., de Albuquerque, M. P., de Albuquerque, M. P., & Mussi, D. B. (2010). Tsallis q-exponential describes the distribution of scientific citations – A new characterization of the impact. Scientometrics, 83(1), 205–218.

  39. van Raan, A. F. J. (2001). Two-step competition process leads to quasi power-law income distributions: Application to scientific publication and citation distributions. Physica A: Statistical Mechanics and Its Applications, 298(3), 530–536.

  40. Van Raan, A. F. J. (2001). Competition amongst scientists for publication status: Toward a model of scientific publication and citation distributions. Scientometrics, 51(1), 347–357.

  41. Kryssanov, V. V., Kuleshov, E. L., Rinaldo, F. J., & Ogawa, H. (2007). We cite as we communicate: A communication model for the citation process. arXiv preprint cs/0703115.

  42. Waltman, L., van Eck, N. J., & van Raan, A. F. J. (2012). Universality of citation distributions revisited. Journal of the American Society for Information Science and Technology, 63(1), 72–77.

  43. Evans, T. S., Hopkins, N., & Kaube, B. S. (2012). Universality of performance indicators based on citation and reference counts. Scientometrics, 93(2), 473–495.

  44. Radicchi, F., & Castellano, C. (2011). Rescaling citations of publications in physics. Physical Review E, 83(4), 046116.

  45. Bornmann, L., & Daniel, H.-D. (2009). Universality of citation distributions – A validation of Radicchi et al.’s relative indicator cf= c/c0 at the micro level using data from chemistry. Journal of the American Society for Information Science and Technology, 60(8), 1664–1670.

  46. Kaur, J., Radicchi, F., & Menczer, F. (2013). Universality of scholarly impact metrics. Journal of Informetrics, 7(4), 924–932.

  47. Leydesdorff, L., Radicchi, F., Bornmann, L., Castellano, C., & de Nooy, W. (2013). Field-normalized impact factors: A comparison of rescaling versus fractionally counted IFs. Journal of the American Society for Information Science and Technology, 64(11), 2299–2309.

  48. Chatterjee, A., Ghosh, A., & Chakrabarti, B. K. (2014). Universality of citation distributions for academic institutions and journals. arXiv preprint arXiv:1409.8029.

  49. Radicchi, F., & Castellano, C. (2012). A reverse engineering approach to the suppression of citation biases reveals universal properties of citation distributions. PLoS One, 7(3), e33833.

  50. Lawless, J. F. (2011). Statistical models and methods for lifetime data (Vol. 362). New York: Wiley.

  51. Li, Y., Radicchi, F., Castellano, C., & Ruiz-Castillo, J. (2013). Quantitative evaluation of alternative field normalization procedures. Journal of Informetrics, 7(3), 746–755.

  52. Crespo, J. A., Li, Y., & Ruiz-Castillo, J. (2013). The measurement of the effect on citation inequality of differences in citation practices across scientific fields. PLoS One, 8(3), e58727.

  53. Karrer, B., & Newman, M. E. J. (2009). Random acyclic networks. Physical Review Letters, 102(12), 128701.

  54. Karrer, B., & Newman, M. E. J. (2009). Random graph models for directed acyclic networks. Physical Review E, 80(4), 046110.

  55. Molloy, M., & Reed, B. (1998). The size of the giant component of a random graph with a given degree sequence. Combinatorics, Probability and Computing, 7(03), 295–305.

  56. Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., & Alon, U. (2002). Network motifs: Simple building blocks of complex networks. Science, 298(5594), 824–827.

  57. Wu, Z.-X., & Holme, P. (2009). Modeling scientific-citation patterns and other triangle-rich acyclic networks. Physical Review E, 80(3), 037101.

  58. Yule, G. U. (1925). A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S. Philosophical Transactions of the Royal Society of London. Series B, Containing Papers of a Biological Character, 213, 21–87.

  59. Simon, H. A. (1957). Models of man: Social and rational. New York: Wiley.

  60. Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.

  61. Krapivsky, P. L., Redner, S., & Leyvraz, F. (2000). Connectivity of growing random networks. Physical Review Letters, 85(21), 4629.

  62. Dorogovtsev, S. N., Mendes, J. F. F., & Samukhin, A. N. (2000). Structure of growing networks with preferential linking. Physical Review Letters, 85(21), 4633.

  63. Newman, M. E. J. (2009). The first-mover advantage in scientific publication. Europhysics Letters, 86(6), 68001.

  64. Jeong, H., Néda, Z., & Barabási, A.-L. (2003). Measuring preferential attachment in evolving networks. Europhysics Letters, 61(4), 567.

  65. Golosovsky, M., & Solomon, S. (2012). Stochastic dynamical model of a growing citation network based on a self-exciting point process. Physical Review Letters, 109(9), 098701.

  66. Golosovsky, M., & Solomon, S. (2013). The transition towards immortality: Non-linear autocatalytic growth of citations to scientific papers. Journal of Statistical Physics, 151(1–2), 340–354.

  67. Hajra, K. B., & Sen, P. (2004). Phase transitions in an aging network. Physical Review E, 70(5), 056103.

  68. Hajra, K. B., & Sen, P. (2005). Aging in citation networks. Physica A: Statistical Mechanics and Its Applications, 346(1), 44–48.

  69. Hajra, K. B., & Sen, P. (2006). Modelling aging characteristics in citation networks. Physica A: Statistical Mechanics and Its Applications, 368(2), 575–582.

  70. Wang, M., Yu, G., & Yu, D. (2008). Measuring the preferential attachment mechanism in citation networks. Physica A: Statistical Mechanics and Its Applications, 387(18), 4692–4698.

  71. Dorogovtsev, S. N., & Mendes, J. F. F. (2000). Evolution of networks with aging of sites. Physical Review E, 62(2), 1842.

  72. Dorogovtsev, S. N., & Mendes, J. F. F. (2001). Scaling properties of scale-free evolving networks: Continuous approach. Physical Review E, 63(5), 056125.

  73. Zhu, H., Wang, X., & Zhu, J.-Y. (2003). Effect of aging on network structure. Physical Review E, 68(5), 056121.

  74. Wang, D., Song, C., & Barabási, A.-L. (2013). Quantifying long-term scientific impact. Science, 342(6154), 127–132.

  75. Wang, J., Mei, Y., & Hicks, D. (2014). Comment on “Quantifying long-term scientific impact”. Science, 345(6193), 149.

  76. Ibáñez, A., Larrañaga, P., & Bielza, C. (2009). Predicting citation count of bioinformatics papers within four years of publication. Bioinformatics, 25(24), 3303–3309.

  77. Livne, A., Adar, E., Teevan, J., & Dumais, S. (2013). Predicting citation counts using text and graph mining. In: Proceedings of the iConference 2013 Workshop on Computational Scientometrics: Theory and Applications.

  78. Shibata, N., Kajikawa, Y., & Matsushima, K. (2007). Topological analysis of citation networks to discover the future core articles. Journal of the American Society for Information Science and Technology, 58(6), 872–882.

  79. Sarigöl, E., Pfitzner, R., Scholtes, I., Garas, A., & Schweitzer, F. (2014). Predicting scientific success based on coauthorship networks. EPJ Data Science, 3(1), 1.

  80. Bertsimas, D., Brynjolfsson, E., Reichman, S., & Silberholz, J. M. (2014). Moneyball for academics: Network analysis for predicting research impact. Available at SSRN: http://ssrn.com/abstract=2374581 or http://dx.doi.org/10.2139/ssrn.2374581

  81. Acuna, D. E., Allesina, S., & Kording, K. P. (2012). Future impact: Predicting scientific success. Nature, 489(7415), 201–202.

  82. Penner, O., Pan, R. K., Petersen, A. M., Kaski, K., & Fortunato, S. (2013). On the predictability of future impact in science. Scientific Reports, 3, 3052.

  83. De Nicolao, G. (2014, October). Times Higher Education World University Rankings: Science or quackery? https://www.aspeninstitute.it/aspenia-online/article/international-university-rankings-science-or-quackery

  84. Radicchi, F., Fortunato, S., & Vespignani, A. (2012). Citation networks. In A. Scharnhorst, K. Börner, & P. van den Besselaar (Eds.) Models of science dynamics, understanding complex systems (pp. 233–257). Berlin/Heidelberg: Springer.

Acknowledgements

We are indebted to A. Vespignani and S. Fortunato for the core part of the chapter [84]. F. Radicchi acknowledges support from NSF grant SMA-1446078.

Author information

Corresponding author

Correspondence to Filippo Radicchi.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Radicchi, F., Castellano, C. (2015). Understanding the Scientific Enterprise: Citation Analysis, Data and Modeling. In: Gonçalves, B., Perra, N. (eds) Social Phenomena. Computational Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-14011-7_8
