Towards a simple mathematical theory of citation distributions
Abstract
The paper is written with the assumption that the purpose of a mathematical theory of citation is to explain bibliometric regularities at the level of mathematical formalism. A mathematical formalism is proposed for the appearance of power law distributions in social citation systems. The principal contributions of this paper are an axiomatic characterization of citation distributions in terms of the Ekeland variational principle and a mathematical exploration of the power law nature of citation distributions. Apart from its inherent value in providing a better understanding of the mathematical underpinnings of bibliometric models, such an approach can be used to derive a citation distribution from first principles.
Keywords
Bibliometrics, Citation distributions, Power law distribution, Wakeby distribution
Mathematics Subject Classification
91D30, 91D99
Background
Scholars have been investigating their own citation practice for too long. Bibliometrics has already forced considerable changes in citation practice (Michels and Schmoch 2014). Because the overwhelming majority of bibliometric studies focus on the citation statistics of scientific papers (see, e.g., Adler et al. 2009; Albarrán and Ruiz-Castillo 2011; De Battisti and Salini 2013; Nicolaisen 2007; Yang and Han 2015), special attention is devoted to citation distributions (see, inter alia, Radicchi and Castellano 2012; Ruiz-Castillo 2012; Sangwal 2014; Thelwall and Wilson 2014; Vieira and Gomes 2010). However, the fundamentals of the citation distribution (or CD for convenience) are far from being well established and the universal law of CD is still unknown (we do not go into details, and refer the reader to Bornmann and Daniel 2009; Eom and Fortunato 2011; Peterson et al. 2010; Radicchi et al. 2008; Ruiz-Castillo 2013; Waltman et al. 2012). Furthermore, existing bibliometric models of CDs place little or no emphasis on characteristics of the mathematical formalism itself (cf. Egghe 1998; Simkin and Roychowdhury 2007; Zhang 2013).
A mathematical theory of CDs does not consider a social citation system in its actuality. (We prefer to abbreviate social citation system to SCS; for the definition of SCS the reader is referred to, e.g., Fujigaki 1998; Rodríguez-Ruiz 2009; Rousseau and Ye 2012.) This task is left entirely to scientometricians. The mathematical theory of CDs is used to investigate a mathematical substitute instead of a real process. For this mathematical substitute, the term mathematical structure has been introduced.
The objective of scientometrics is to bridge a gap between our insights into science and our knowledge of science (Mingers and Leydesdorff 2015). A mathematical theory of citation can appear as an attempt to understand the structures that constitute the bases of scientometric models. To “understand” here means to bring a bibliometric structure into congruence with a mathematical structure. The purpose of a mathematical theory is fulfilled if it provides a structure of thought objects that allows us to relate bibliometric data sets and interpret the state of affairs in science by making mathematical deductions. A scientometric model attempts to create a heuristic explanation of an empirical data set. In contrast, a mathematical theory of citation is not concerned with bibliometric data per se and strives to construct a clear and coherent framework that accurately expresses some scientometric propositions in mathematical language. In this way, opportunities emerge for applying sophisticated mathematical concepts to bibliometric phenomena. The difference between a bibliometric model and a mathematical theory of citation is more apparent than real because, although the concepts of bibliometrics can be analyzed in terms of mathematics, they cannot be eliminated in favor of the latter without losing the understanding gained by bibliometrics. In particular, a firm foundation for a mathematical theory of citation can be obtained only phenomenologically by comparing the consequences of basic mathematical statements to bibliometric data.
Motivation
We will study the axioms on which a mathematical description of SCS can be based. The author risks asserting that a mathematical theory appears to be a systematic reformulation of the problem of cumulative CDs on a purely mathematical basis. That is the main intent of this paper. Before we proceed with the analysis, we remark that there are no strong arguments leading from the bibliometric facts to the axioms. However, as we hope to show below, one can obtain additional conceptual information (relating to SCS) that is not readily available from a conventional bibliometric model by means of the axioms.
Purpose
The purpose of the research reported in this article is to provide a simple and coherent presentation of CDs based on the Ekeland variational principle. We stress the elementary variational principle governing the state of SCS and have also attempted to provide enough technical detail to create a basis for potential future studies.
Methodology
The paper addresses the construction of structural hypotheses for “how SCS works” rather than statistical inferences from bibliometric data. We accept that the continuous reproduction of a scientific inequality is a conceptual basis for almost all SCSs (cf. Bourdieu 2004). An emphasis is placed on the role of the variational principle as a valid approach for describing the local behavior of a continuous SCS. We consider an SCS to obey the following scheme. Suppose an SCS is a sufficiently smooth “motion” to ensure the consistency and the integrity of citations. In phase space, this condition is equivalent to a variational principle that produces the Euler equation for the weak form of a CD. This variational principle asserts that, for an appropriate functional, one can add a small perturbation to make it attain a minimum.
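For orientation, the last sentence can be made concrete by recalling the standard statement of the Ekeland variational principle (Ekeland 1974). The symbols \(X\), \(d\), \(F\), \(u\), \(\varepsilon\) and \(\lambda\) below are generic notation assumed here for illustration; they are not the specific objects of this paper. Let \((X, d)\) be a complete metric space and let \(F: X \rightarrow \mathbb {R}\cup \{+\infty \}\) be proper, lower semicontinuous and bounded below. Let \(\varepsilon > 0\) and let \(u \in X\) satisfy \(F(u) \le \inf _{X} F + \varepsilon\). Then for every \(\lambda > 0\) there exists \(v \in X\) such that
$$\begin{aligned} F(v)&\le F(u), \\ d(u, v)&\le \lambda , \\ F(w)&> F(v) - \frac{\varepsilon }{\lambda }\, d(v, w) \quad (\forall w \in X,\ w \ne v). \end{aligned}$$
The third line says that \(v\) is the strict minimizer of the perturbed functional \(w \mapsto F(w) + (\varepsilon /\lambda )\, d(v, w)\), which is the sense in which a small perturbation makes the functional attain a minimum.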
Preliminaries
In the language of \(\mathsf {P}(Z \in B)\), the PDF \(f(\cdot )\) is (almost everywhere) given by formal differentiation; as a result, a rather simple interpretation of \(f(\cdot )\) can be given in the framework of Sobolev spaces \(H^{k}(\mathbb {R})\). (For the definitions and properties of Sobolev spaces, see Maz’ya 2011.)
Results
Because \(\varphi (\cdot )\) and \(\zeta\) are so fundamental in this paper, it may seem strange that we have not explicitly defined them in formal mathematical terms. As with other primitive objects of the mathematical theory, the most one can do is to give the implicit definitions by postulating the properties that hold for \(\varphi (\cdot )\) and \(\zeta\).
 \(\mathbf {A_{1}}\)
 The function \(\varphi (\cdot )\), defined for all \(v \in \bigl ( V, \Vert \cdot \Vert \bigr )\), is a proper (\(\mathrm {dom}\,\varphi \not = \emptyset\)), lower semicontinuous (\(\varphi (v)\le \liminf _{n\rightarrow \infty }\varphi (v_{n})\) if \(v_{n}\rightarrow v\)), convex, and bounded below (\(\inf _{V}\varphi > -\infty\)) function from \(\bigl (V, \Vert \cdot \Vert \bigr )\) to \(\mathbb {R}_{+}\), satisfying the following condition:$$\begin{aligned}(\forall v\in V)(\forall c\in \mathbb {R}):\varphi (u+v)=\varphi (u)+\varphi (v)\Leftrightarrow v= cu. \end{aligned}$$
 \(\mathbf {A_{2}}\)
 Among all admissible \(\zeta\), the quantity \(\zeta _{*}\) that actually describes a given CD is assigned in such a way that the function \(\varphi (\cdot )\) reaches its minimum.
 \(S_{1}\)
 There exists a unique \(\zeta \in E\) such that$$\begin{aligned} \bigl (\forall v\in E \bigr ):\bigl ( \zeta , v \bigr )_{E} = 0. \end{aligned}$$(5)
 The formula (5) reads as the “weak” Euler equation in the current setting.
 \(S_{2}\)
 \(\zeta\) is obtained by$$\begin{aligned} \min _{v\in {E}}\left\{ \frac{1}{2}\,\varphi (v) \right\} . \end{aligned}$$(6)
Goodness of fit—summary
Distribution   Kolmogorov–Smirnov statistic D              Anderson–Darling statistic A
               (\(\alpha = 0.1\), crit. val. 0.0013)       (\(\alpha = 0.1\), crit. val. 1.929)
WD             0.0406                                       1449.0
Lognormal      0.1062                                       \(1{.}258\mathrm {E}+5\)
Weibull        0.1262                                       \(1{.}319\mathrm {E}+5\)
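As an illustration of how a Kolmogorov–Smirnov statistic D of the kind reported in the table can be computed for a Wakeby distribution (WD), the following minimal Python sketch may help. It assumes the Hosking–Wallis parameterization of the WD quantile function (cf. Hosking and Wallis 2005); the function names, the parameter values and the synthetic sample are hypothetical and are not taken from this paper, so the sketch does not reproduce the figures in the table.

```python
import random


def wakeby_quantile(p, xi, alpha, beta, gamma, delta):
    """Wakeby quantile function x(F), Hosking-Wallis parameterization (assumed).

    Assumes beta != 0 and delta != 0, and parameters for which x(F) is increasing.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    q = 1.0 - p
    return (xi
            + (alpha / beta) * (1.0 - q ** beta)
            - (gamma / delta) * (1.0 - q ** (-delta)))


def wakeby_sample(n, params, seed=None):
    """Inverse-transform sampling: push standard uniform draws through x(F)."""
    rng = random.Random(seed)
    return [wakeby_quantile(rng.random(), *params) for _ in range(n)]


def wakeby_cdf(x, params, tol=1e-12):
    """Invert x(F) numerically by bisection; the WD has no closed-form CDF."""
    if x <= wakeby_quantile(0.0, *params):
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if wakeby_quantile(mid, *params) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)| for a fitted CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at the i-th order statistic.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    # Hypothetical parameters (xi, alpha, beta, gamma, delta), for illustration only.
    params = (0.0, 5.0, 2.0, 1.0, 0.3)
    data = wakeby_sample(2000, params, seed=1)
    print("D =", ks_statistic(data, lambda x: wakeby_cdf(x, params)))
```

With real citation counts, `data` would be replaced by the observed counts and `params` by fitted values; the statistic would then be compared with the critical value at the chosen significance level.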
Discussion
One of the most exciting and fruitful applications of mathematical methods in the natural sciences is the variational principle. The substantive aim of the present paper is the derivation of a variational principle, which makes it possible to interpret the empirical regularities of the CDs as a logical necessity. Starting from the famous Ekeland variational principle, we show that the derivation of the CDs given in this paper might be considered a step in the direction indicated above. Using the variational principle (6) in the energetic space E together with empirical evidence for the existence of slowly varying functions representing the right tail of the CDs allows us to introduce the WD (and the GPD) naturally.
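The link between the WD, the GPD and a power-law right tail can be sketched as follows. The sketch assumes the Hosking–Wallis form of the WD quantile function (cf. Hosking and Wallis 2005), with parameters \(\xi , \alpha , \beta , \gamma , \delta\) and the additional assumptions \(\beta > 0\), \(\gamma > 0\), \(\delta > 0\); this notation is introduced here for illustration and need not coincide with the parameterization used in the derivation of this paper.
$$\begin{aligned} x(F)&= \xi + \frac{\alpha }{\beta }\bigl [ 1 - (1-F)^{\beta } \bigr ] - \frac{\gamma }{\delta }\bigl [ 1 - (1-F)^{-\delta } \bigr ], \\ x(F)&\sim \frac{\gamma }{\delta }\,(1-F)^{-\delta } \quad (F \rightarrow 1), \\ 1 - F(x)&\sim \Bigl ( \frac{\delta x}{\gamma } \Bigr )^{-1/\delta } \quad (x \rightarrow \infty ). \end{aligned}$$
Under these assumptions the right tail decays as a power law with exponent \(1/\delta\). Setting \(\alpha = 0\) leaves \(x(F) = \xi + (\gamma /\delta )\bigl [ (1-F)^{-\delta } - 1 \bigr ]\), which is the quantile function of a GPD with location \(\xi\), scale \(\gamma\) and shape \(\delta\).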
Let us stress that modest mathematical means concerning some simple facts of functional analysis yield a simple mathematical theory of CDs from which, as a consequence, concrete CDs are immediately derived. It is remarkable that a first-principles derivation of the CDs (e.g., GPD) in a bibliometric model is possible only at the price of uncontrollable assumptions, which are justified a posteriori. On the contrary, in our derivation it is only assumed that Eq. (8) is relevant. This is, of course, more satisfactory. However, note that there are no proper bibliometric reasons for which the Sobolev spaces are preferred over any other, and therefore there are also no reasons to give the vague bibliometric meaning of the consistency and the integrity of citations the mathematical form of Ekeland’s variational principle.
One must bear in mind that our result refers to properties of some “pure mathematical structure”. Like any mathematical result, Eq. (13) cannot give a completely accurate description of an empirical CD. Moreover, in the mathematical theory of CDs, “by construction”, we have no direct knowledge of the statistical parameters. Thus, we can only measure the parameters that index the CDs, not compute them from the axioms.
Conclusions
In summary, the approach suggested here allows an interpretation of the Ekeland variational principle in terms of the standard uniform RV, which may be of some interest. It is shown that in a sufficiently “smooth” SCS a power-law tail of the static CD can appear. However, there are no grounds to consider this a mathematical model underlying bibliometric theory. At the same time, the present study may be instructive beyond the specific research site and can contribute to building a mathematical theory of CDs.
Notes
Acknowledgements
The financial support from the Government of the Russian Federation within the framework of the Basic Research Program at the National Research University Higher School of Economics and within the framework of implementation of the 5-100 Programme Roadmap of the National Research University Higher School of Economics is acknowledged.
Competing interests
The author declares that he has no competing interests.
References
 Adler R, Ewing J, Taylor P et al (2009) Citation statistics. Stat Sci 24(1):1. doi: 10.1214/09STS285
 Albarrán P, Crespo JA, Ortuño I, Ruiz-Castillo J (2011) The skewness of science in 219 subfields and a number of aggregates. Scientometrics 88(2):385–397. doi: 10.1007/s1119201104079
 Albarrán P, Ruiz-Castillo J (2011) References made and citations received by scientific articles. J Am Soc Inf Sci Technol 62(1):40–49. doi: 10.1002/asi.21448
 Attouch H, Buttazzo G, Michaille G (2014) Chapter 6: Variational problems: Some classical examples. In: Variational Analysis in Sobolev and BV Spaces: Applications to PDEs and Optimization. MOS-SIAM Series on Optimization. Society for Industrial and Applied Mathematics: Mathematical Programming Society, Philadelphia, pp 219–284. doi: 10.1137/1.9781611973488.ch6
 Aubin JP, Ekeland I (2006) Applied Nonlinear Analysis. Dover Books on Mathematics Series. Dover Publications, Mineola. URL: http://store.doverpublications.com/0486453243.html
 Bornmann L, Daniel HD (2009) Universality of citation distributions–a validation of Radicchi et al.’s relative indicator \(c_f = c/c_0\) at the micro level using data from chemistry. J Am Soc Inf Sci Technol 60(8):1664–1670. doi: 10.1002/asi.21076
 Borovkov AA (2013) The basic properties of regularly varying functions and subexponential distributions. In: Probability Theory. Universitext, Springer, London, pp 665–685. doi: 10.1007/9781447152019
 Bourdieu P (2004) Science of Science and Reflexivity. University of Chicago Press, Chicago. URL: http://www.press.uchicago.edu/ucp/books/book/chicago/S/bo3630402.html
 Brzezinski M (2015) Power laws in citation distributions: evidence from Scopus. Scientometrics 103(1):213–228. doi: 10.1007/s111920141524z
 De Battisti F, Salini S (2013) Robust analysis of bibliometric data. Stat Methods Appl 22(2):269–283. doi: 10.1007/s1026001202170
 Deville R, Ghoussoub N (2001) Perturbed minimization principles and applications. In: Johnson WB, Lindenstrauss J (eds) Handbook of the geometry of Banach spaces, vol 1. Elsevier Science B.V., Amsterdam, pp 393–435. doi: 10.1016/S18745849(01)800127
 Egghe L (1998) Mathematical theories of citation. Scientometrics 43(1):57–62. doi: 10.1007/BF02458394
 Egghe L (2005) Power Laws in the Information Production Process: Lotkaian Informetrics. Elsevier / Academic Press, Kidlington, Oxfordshire. doi: 10.1108/S18760562(2005)0000005004
 Ekeland I (1974) On the variational principle. J Math Anal Appl 47(2):324–353. doi: 10.1016/0022247X(74)900250
 Eom YH, Fortunato S (2011) Characterizing and modeling citation dynamics. PLoS One 6(9):e24926. doi: 10.1371/journal.pone.0024926
 Fujigaki Y (1998) The citation system: citation networks as repeatedly focusing on difference, continuous reevaluation, and as persistent knowledge accumulation. Scientometrics 43(1):77–85. doi: 10.1007/BF02458397
 Golosovsky M, Solomon S (2014) Uncovering the dynamics of citations of scientific papers. ArXiv e-prints. arXiv:1410.0343
 Golosovsky M, Solomon S (2012) Runaway events dominate the heavy tail of citation distributions. Eur Phys J Spec Top 205(1):303–311. doi: 10.1140/epjst/e2012015764
 Gupta HM, Campanha JR, Pesce RAG (2005) Power-law distributions for the citation index of scientific publications and scientists. Braz J Phys 35:981–986. doi: 10.1590/S010397332005000600012
 Hosking JRM, Wallis JR (2005) Regional Frequency Analysis: an Approach Based on \(L\)-moments. Cambridge University Press, Cambridge. URL: http://www.cambridge.org/9780521430456
 Ioffe AD, Tikhomirov VM (1997) Some remarks on variational principles. Math Notes 61(2):248–253. doi: 10.1007/BF02355736
 Johnson WB, Lindenstrauss J (2001) Basic concepts in the geometry of Banach spaces. In: Johnson WB, Lindenstrauss J (eds) Handbook of the geometry of Banach spaces, vol 1. Elsevier Science B.V., Amsterdam, pp 1–84. doi: 10.1016/S18745849(01)800036
 Johnson NL, Kotz S, Balakrishnan N (2010) Continuous univariate distributions, vol 1, 3rd edn. Wiley Series in Probability and Statistics. John Wiley & Sons, New York
 Katchanov YL, Markova YV (2015) On a heuristic point of view concerning the citation distribution: introducing the Wakeby distribution. SpringerPlus 4:94. doi: 10.1186/s4006401508211
 Kristály A, Rădulescu VD, Varga C (2010) Elliptic systems of gradient type. In: Variational Principles in Mathematical Physics, Geometry, and Economics: Qualitative Analysis of Nonlinear Equations and Unilateral Problems. Encyclopedia of Mathematics and its Applications, vol 136. Cambridge University Press, Cambridge, pp 117–145. URL: http://www.cambridge.org/9780521117821
 Maz’ya V (2011) Basic properties of Sobolev spaces. In: Sobolev spaces with applications to elliptic partial differential equations. Grundlehren der mathematischen Wissenschaften, vol 342. Springer, Berlin, pp 1–121. doi: 10.1007/97836421556421
 Michels C, Schmoch U (2014) Impact of bibliometric studies on the publication behaviour of authors. Scientometrics 98(1):369–385. doi: 10.1007/s1119201310157
 Mingers J, Leydesdorff L (2015) A review of theory and practice in scientometrics. Eur J Oper Res 246(1):1–19. doi: 10.1016/j.ejor.2015.04.002
 Nicolaisen J (2007) Citation analysis. Ann Rev Inform Sci Technol 41(1):609–641. doi: 10.1002/aris.2007.1440410120
 Peterson GJ, Pressé S, Dill KA (2010) Nonuniversal power law scaling in the probability distribution of scientific citations. Proc Natl Acad Sci 107(37):16023–16027. doi: 10.1073/pnas.1010757107
 Radicchi F, Fortunato S, Castellano C (2008) Universality of citation distributions: toward an objective measure of scientific impact. Proc Natl Acad Sci 105(45):17268–17272. doi: 10.1073/pnas.0806977105
 Radicchi F, Castellano C (2015) Understanding the scientific enterprise: citation analysis, data and modeling. In: Gonçalves B, Perra N (eds) Social Phenomena. Computational Social Sciences. Springer, Cham, pp 135–151. doi: 10.1007/97833191401178
 Radicchi F, Castellano C (2012) Testing the fairness of citation indicators for comparison across scientific domains: The case of fractional citation counts. J Informetr 6(1):121–130. doi: 10.1016/j.joi.2011.09.002
 Radicchi F, Fortunato S, Vespignani A (2012) Citation networks. In: Scharnhorst A, Börner K, van den Besselaar P (eds) Models of Science Dynamics. Understanding Complex Systems. Springer, Berlin, pp 233–257. doi: 10.1007/97836422306847
 Redner S (1998) How popular is your paper? An empirical study of the citation distribution. Eur Phys J B Condens Matter Complex Syst 4(2):131–134. doi: 10.1007/s100510050359
 Redner S (2005) Citation statistics from 110 years of Physical Review. Phys Today 85(6):49–54. doi: 10.1063/1.1996475
 Rodríguez-Ruiz O (2009) The citation indexes and the quantification of knowledge. J Educ Adm 47(2):250–266. doi: 10.1108/09578230910941075
 Rousseau R, Ye FY (2012) Basic independence axioms for the publication-citation system. J Scientometr Res 1(1):22–27. doi: 10.5530/jscires.2012.1.6
 Ruiz-Castillo J (2012) The evaluation of citation distributions. SERIEs 3(1–2):291–310. doi: 10.1007/s1320901100743
 Ruiz-Castillo J (2013) The role of statistics in establishing the similarity of citation distributions in a static and a dynamic context. Scientometrics 96(1):173–181. doi: 10.1007/s1119201309543
 Sangwal K (2014) Distributions of citations of papers of individual authors publishing in different scientific disciplines: Application of Langmuir-type function. J Informetr 8(4):972–984. doi: 10.1016/j.joi.2014.09.009
 Simkin MV, Roychowdhury VP (2007) A mathematical theory of citing. J Am Soc Inf Sci Technol 58(11):1661–1673. doi: 10.1002/asi.20653
 Thelwall M, Wilson P (2014) Distributions for cited articles from individual subjects and years. J Informetr 8(4):824–839. doi: 10.1016/j.joi.2014.08.001
 Vieira ES, Gomes JANF (2010) Citations to scientific articles: Its distribution and dependence on the article features. J Informetr 4(1):1–13. doi: 10.1016/j.joi.2009.06.002
 Wallace ML, Larivière V, Gingras Y (2009) Modeling a century of citation distributions. J Informetr 3(4):296–303. doi: 10.1016/j.joi.2009.03.010
 Waltman L, van Eck NJ, van Raan AFJ (2012) Universality of citation distributions revisited. J Am Soc Inf Sci Technol 63(1):72–77. doi: 10.1002/asi.21671
 Wang XW, Zhang LJ, Yang GH, Xu XJ (2013) Modeling citation networks based on vigorousness and dormancy. Modern Phys Lett B 27(22):1350155. doi: 10.1142/S0217984913501558
 Yang S, Han R (2015) Breadth and depth of citation distribution. Inform Process Manag 51(2):130–140. doi: 10.1016/j.ipm.2014.12.003
 Yao Z, Peng XL, Zhang LJ, Xu XJ (2014) Modeling nonuniversal citation distributions: the role of scientific journals. J Stat Mech Theor Exp 2014(4):04029. doi: 10.1088/17425468/2014/04/P04029
 Zeidler E (1999) Self-adjoint operators, the Friedrichs extension, and the partial differential equations of mathematical physics. In: Applied functional analysis. Applied Mathematical Sciences, vol 108. Springer, New York, pp 253–424. doi: 10.1007/97814612081505
 Zhang CT (2013) A novel triangle mapping technique to study the \(h\)-index based citation distribution. Sci Rep 3:1023. doi: 10.1038/srep01023
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.