Comparison to Existing Models

Golosovsky, Michael

doi:10.1007/978-3-030-28169-4_9


Chapter
First Online: 27 September 2019


Part of the book series: SpringerBriefs in Complexity (BRIEFSCOMPLEXITY)

Abstract

We survey models of citation dynamics and focus on the preferential attachment and fitness models. We show that under certain realistic conditions these models are equivalent. In order to find the microscopic foundations of the preferential attachment mechanism, we analyze theoretically and experimentally several citation networks and demonstrate that, for a broad fitness distribution, this mechanism reduces to the fitness model. The fitness model yields the long-sought explanation for the initial attractivity K₀, an elusive parameter which was left unexplained within the framework of the empirical preferential attachment model. We show that the initial attractivity is determined by the width of the fitness distribution. We compare the preferential attachment and fitness models to our microscopic model of citation dynamics based on recursive search and show that our model contains both these phenomenological models.


9.1 Preferential Attachment Mechanism

9.1.1 Theoretical Model

To explain the power-law distribution of citations of scientific papers, de Solla Price suggested his cumulative advantage model [138]. This model assumes a network consisting of papers that appear at a constant rate N, each paper extending ∼R₀ references to older ones. The probability that a new paper i cites an old paper j is

$$\displaystyle \begin{aligned} \varPi_{ij}\propto(K_{j}+K_{0}), {} \end{aligned} $$

(9.1)

where K _j is the number of citations of paper j. The initial attractivity K ₀ ensures that new papers become cited as well (de Solla Price denoted K ₀ by c). The cumulative advantage mechanism captured by Eq. 9.1 yields a power-law citation distribution, p(K) ∼ K ^−ν, with the exponent

$$\displaystyle \begin{aligned} \nu=2+\frac{K_{0}}{R_{0}}. {} \end{aligned} $$

(9.2)

In the absence of any clue, Price postulated K ₀ = 1. Since R ₀ >> 1, Eq. 9.2 yields ν slightly exceeding 2.

Following the proliferation of digitized information in the 1990s, a number of information, biological, and social complex networks came to the forefront of scientific research, most of them exhibiting power-law degree distributions with ν ∼ 3 [7, 15, 28, 121]. To account for these distributions, Barabasi and Albert suggested the preferential attachment model [3], which is very similar but not identical to Price’s cumulative advantage model. With regard to citation networks, the Barabasi-Albert model assumes that a new paper i cites an old paper j with probability

$$\displaystyle \begin{aligned} \varPi_{ij}\sim (K_{j}+R_{0}). {} \end{aligned} $$

(9.3)

Equation 9.3 yields ν = 3 and this value better fits the observations.
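As a quick numerical illustration of Eq. 9.2, the following Python sketch evaluates the exponent ν for Price's choice K₀ = 1 and for the Barabasi-Albert choice K₀ = R₀. The function name and the value R₀ = 30 are assumptions made only for this illustration; they do not come from the measurements discussed in this chapter.

```python
def citation_exponent(k0: float, r0: float) -> float:
    """Power-law exponent nu = 2 + K0/R0 (Eq. 9.2)."""
    if r0 <= 0:
        raise ValueError("R0 must be positive")
    return 2.0 + k0 / r0


R0 = 30.0  # assumed reference-list length, for illustration only

nu_price = citation_exponent(1.0, R0)  # Price: K0 = 1
nu_ba = citation_exponent(R0, R0)      # Barabasi-Albert: K0 = R0

print("Price exponent:", nu_price)
print("Barabasi-Albert exponent:", nu_ba)
```

For any R₀ >> 1 the first value stays only slightly above 2, while the second equals 3 regardless of R₀.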

In what follows, we do not distinguish between the Price and Barabasi-Albert approaches and consider Eq. 9.1 with unspecified K₀ as the preferential attachment model. Its success in explaining the seemingly universal power-law degree distribution in complex networks prompted a flurry of theoretical generalizations, including a nonlinear attachment rule [88] and aging [47, 73, 127, 176, 178]. The generalized preferential attachment model is captured by the following equation,

$$\displaystyle \begin{aligned} \varPi_{ij}\propto A(t)(K_{j}+K_{0})^{\zeta}, {} \end{aligned} $$

(9.4)

where t = t _i − t _j is the age of paper j with respect to paper i, ζ is the attachment exponent, and A(t) is the aging function which is different from that appearing in the model of Chap. 3.

9.1.2 Model Validation by Measurements

Straightforward verification of the preferential attachment model, as given by Eq. 9.4, requires analysis of decisions made by the authors of the papers and, to the best of our knowledge, this approach has not been implemented so far. The common approach is to trace citation dynamics of individual papers. To this end, the perspective is shifted from the citing paper to the cited paper and this results in the following equation,

$$\displaystyle \begin{aligned} \varDelta K_{j}=\tilde{A}(t_{j})(K_{j}+K_{0})^{\zeta}\varDelta t, {} \end{aligned} $$

(9.5)

where ΔK_j = k_j(t)Δt, k_j(t) is the citation rate of paper j, the aging function is $\tilde {A}(t)=\frac {A(t)}{\sum _{l}A(t_{l})[K_{l}+K_{0}]^{\zeta }}R_{0}N$, and N is the annual number of publications. [Note the difference between A(t) and $\tilde {A}(t)$: for the Barabasi-Albert model A(t) = 1 while $\tilde {A}(t)=\frac {2N}{t}$. Both A(t) and $\tilde {A}(t)$ are different from the aging functions appearing in the model of Chap. 3.]

The measurements of Refs. [90, 95, 113, 131, 134] verified that the growth of citation networks indeed follows Eq. 9.5 although some of these measurements exhibited preferential attachment (ΔK _j grows with K _j) only for papers with low and moderate K _j while the highly-cited papers frequently exhibit anti-preferential attachment (ΔK _j decreases with K _j) [32, 90].

With respect to linearity, early measurements claimed linear or close-to-linear (ζ ≈ 1) preferential attachment [83, 122] while subsequent measurements for large datasets of scientific papers [70, 77] and patent citations [44, 76] revealed superlinear attachment with the exponent ζ ∼ 1.25.

The measurements of initial attractivity posed a significant challenge to the preferential attachment model. Indeed, for citations to scientific papers, the initial attractivity is very small, K₀ ∼ 1 [71, 76]. Patent citations also yield small K₀ ∼ 1 [44, 77]. Such a small initial attractivity conforms better to Price’s conjecture, K₀ = 1, than to the Barabasi-Albert conjecture, K₀ = R₀. However, K₀ ∼ 1 yields too small an exponent of the power-law citation distribution, which is incompatible with observations [44, 49, 71, 83].

Several specific predictions of the preferential attachment model are also at odds with observations. In particular, this model predicts the first-mover advantage, namely, a strong positive correlation between a paper’s age and the number of citations it garners. However, the measurements reveal only a weak correlation [124, 125]. The model predicts that the citation dynamics of papers decelerates with time and that the citation trajectories of papers published in the same year should be very similar. However, the measurements show that these trajectories strongly diverge [70, 87] and do not necessarily decelerate with time. In particular, there are “sleeping beauties” [84], papers whose citation trajectories accelerate with time. Also, the preferential attachment model predicts that citation distributions for papers published in one year should be narrow and close to exponential [7, 121], while the measurements show that these distributions are wide and close to a power law or a log-normal [72, 87, 144]. Finally, the preferential attachment model shows that the assortativity of a growing complex network is determined by the initial attractivity K₀, in such a way that for small and positive initial attractivity (as has been measured in many networks) the network should be disassortative [7, 122]. This does not fit measurements, which reveal that citation networks are weakly assortative [7, 60, 72, 149, 180].

9.2 Fitness-Based Preferential Attachment

When the preferential attachment model is applied to give a quantitative account of real growing complex networks, it encounters too many difficulties. The most popular solution is to introduce fitness [14], an attribute that characterizes the propensity of a node to attract edges (the ability of a paper to draw citations). Fitness is an empirical attribute that should be found from measurements, although there were several attempts to associate it with node relevance [107], local clustering coefficient [5], node’s rank [54], and PageRank coefficient [187].

9.2.1 Multiplicative Fitness

How can fitness be incorporated into the dynamic equation of network growth? The Bianconi-Barabasi model [14] introduces fitness on top of preferential attachment. In the context of citation networks, this model postulates that the probability that paper i cites paper j is the product of the paper’s fitness η_j and the number of its previous citations, namely

$$\displaystyle \begin{aligned} \varPi_{ij}\propto\eta_{j}K_{j}. {} \end{aligned} $$

(9.6)

Solution of Eq. 9.6 yields citation trajectories K_j(t) which strongly depend on fitness, in such a way that a high-fitness latecomer can outperform a low-fitness old paper. Thus, fitness solves the problem of the first-mover advantage. It solves other problems as well; in particular, the citation distribution for papers of the same age is determined more by the fitness distribution than by the number of previous citations. The most striking prediction of the Bianconi-Barabasi model is the presence of supercritical papers that take the lion’s share of all citations. Such supercritical papers were indeed observed [9, 67], and this non-intuitive prediction brought wide popularity to the Bianconi-Barabasi model [12, 28, 33, 61, 175].

However, Eq. 9.6 represents only a conceptual model. To convert it into a quantitative tool that accounts for real-life citation dynamics, Wang et al. [175] added the initial attractivity K₀ and the aging function A(t), while Pham et al. [134, 135] added nonlinearity. The most general form of Eq. 9.6 reads

$$\displaystyle \begin{aligned} \varPi_{ij}=\eta_{j}A_{j}(t)(K_{j}+K_{0})^{\zeta}. {} \end{aligned} $$

(9.7)

Wang, Song, and Barabasi parametrize their aging function by the log-normal representation [they denoted it by P _j(t) while we denote it here by A _j(t)], $A_{j}(t)=\frac {1}{\sqrt {2\pi }\sigma _{j} t}e^{{-\left (\frac {(\ln {t}-\mu _{j})^{2}}{2\sigma _{j}^{2}}\right )}}$ where μ _j and σ _j are parameters which are specific for each paper.
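To make the log-normal aging function concrete, here is a minimal Python sketch that evaluates A_j(t) and checks numerically that it integrates to approximately one. The parameter values μ_j = 1 and σ_j = 1, the function names, and the midpoint-rule integration are assumptions chosen for illustration, not values fitted to any paper.

```python
import math


def lognormal_aging(t: float, mu: float, sigma: float) -> float:
    """Log-normal aging function A_j(t) as written in the text; zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)


def total_weight(mu: float, sigma: float,
                 t_max: float = 1000.0, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of the integral of A_j(t) from 0 to t_max."""
    dt = t_max / steps
    return dt * sum(lognormal_aging((i + 0.5) * dt, mu, sigma)
                    for i in range(steps))


# Hypothetical parameters for a single paper (illustrative values only).
mu_j, sigma_j = 1.0, 1.0

# The mode of a log-normal density, i.e. the age at which A_j(t) peaks.
peak_age = math.exp(mu_j - sigma_j ** 2)

print("age of maximal A_j(t):", peak_age)
print("integral of A_j(t):", total_weight(mu_j, sigma_j))
```

In this parametrization μ_j shifts the age at which the paper is most attractive, while σ_j controls how long the attraction lasts.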

While Eq. 9.7 solved many problems, its success came at the price of too many empirical parameters required to describe the citation trajectory of a paper: in addition to the number of previous citations K_j and age t_j, Eq. 9.7 adds the fitness η_j, the exponent ζ, and two parameters, μ_j and σ_j, which characterize the aging function for each paper.

9.2.2 Additive Fitness

Fitness can also be introduced through an optimization procedure (Ref. [128]) or through the following equation [11, 49, 50, 108]

$$\displaystyle \begin{aligned} \varPi_{ij}\propto (K_{j}+\eta_{j}), {}\end{aligned} $$

(9.8)

which is nothing else but the preferential attachment model in which fitness η _j replaces the initial attractivity K ₀.

The growth dynamics described by Eqs. 9.7 and 9.8 are not as different as they might seem. In fact, the combination of nonlinear preferential attachment (Eq. 9.5) with additive fitness (Eq. 9.8) mimics Eq. 9.6; in particular, it yields supercritical nodes. To demonstrate this, we adopt the continuous approximation and replace ΔK_j in Eq. 9.5 by $\frac {dK_{j}}{dt}\varDelta t$. In view of Eq. 9.8, Eq. 9.5 can be recast as follows:

$$\displaystyle \begin{aligned} \frac{dK}{dt}=\tilde{A}(t)(K+\eta)^{\zeta}, {}\end{aligned} $$

(9.9)

where we replaced K ₀ by η and dropped index j, for brevity. We introduce ε = ζ − 1, solve Eq. 9.9 for ε > 0, and find

$$\displaystyle \begin{aligned} K(t)=\frac{\eta}{\left[1-\epsilon\eta^{\epsilon} \int_{0}^{t}\tilde{A}(\tau)d\tau\right]^{\frac{1}{\epsilon}}}-\eta. {} \end{aligned} $$

(9.10)

To analyze Eq. 9.10, we assume that the integral $\int _{0}^{t}\tilde {A}(\tau )d\tau $ converges as t →∞. This assumption allows us to introduce $\eta _{crit}=\left [\epsilon \int _{0}^{\infty } \tilde {A}(\tau )d\tau \right ]^{-\frac {1}{\epsilon }}$, in such a way that Eq. 9.10 reduces to

$$\displaystyle \begin{aligned} K(t)=\frac{\eta}{\left[1-\left(\frac{\eta}{\eta_{crit}}\right)^{\epsilon} \frac{\int_{0}^{t}\tilde{A}(\tau)d\tau}{\int_{0}^{\infty}\tilde{A}(\tau)d\tau}\right]^{\frac{1}{\epsilon}}}-\eta, {} \end{aligned} $$

(9.11)

where t is the paper’s age. For η < η_crit, Eq. 9.11 yields K(t) that increases with time and eventually achieves saturation, $K(\infty )=\frac {\eta }{\left [1-\left (\frac {\eta }{\eta _{crit}} \right )^{\epsilon }\right ]^{\frac {1}{\epsilon }}}-\eta $. However, for η ≥ η_crit, K(t) does not achieve saturation and the node becomes supercritical, namely, the number of its citations undergoes a finite-time singularity at a certain time t₀, in such a way that Eqs. 9.10 and 9.11 hold only for t < t₀. Thus, for superlinear preferential attachment, ε > 0, Eq. 9.8 predicts supercritical nodes, exactly as Eq. 9.6 does.
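The closed-form trajectory of Eq. 9.11 can be cross-checked against a direct numerical integration of Eq. 9.9. The Python sketch below does this under an assumed aging function, $\tilde{A}(t)=e^{-t}$, whose integral converges to 1; this aging function, the value ε = 0.25, the fitness values, and all function names are illustrative assumptions and are not taken from the measurements.

```python
import math


def eta_crit(eps: float, total: float) -> float:
    """Critical fitness [eps * integral_0^inf A~(tau) dtau]^(-1/eps)."""
    return (eps * total) ** (-1.0 / eps)


def citations_closed_form(eta: float, eps: float, cum: float, total: float) -> float:
    """K(t) from Eq. 9.11; `cum` is the integral of A~ from 0 to t."""
    x = (eta / eta_crit(eps, total)) ** eps * (cum / total)
    if x >= 1.0:
        return math.inf  # at or beyond the singularity
    return eta * (1.0 - x) ** (-1.0 / eps) - eta


def citations_numeric(eta: float, eps: float, t_end: float, steps: int = 20_000) -> float:
    """Fourth-order Runge-Kutta integration of Eq. 9.9 with the assumed A~(t) = exp(-t)."""
    def rhs(t: float, k: float) -> float:
        return math.exp(-t) * (k + eta) ** (1.0 + eps)

    h = t_end / steps
    k = 0.0
    for i in range(steps):
        t = i * h
        k1 = rhs(t, k)
        k2 = rhs(t + h / 2, k + h * k1 / 2)
        k3 = rhs(t + h / 2, k + h * k2 / 2)
        k4 = rhs(t + h, k + h * k3)
        k += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return k


eps = 0.25      # illustrative superlinearity, zeta = 1.25
total = 1.0     # integral of exp(-t) from 0 to infinity
t = 5.0
cum = 1.0 - math.exp(-t)

print("critical fitness:", eta_crit(eps, total))
print("subcritical, closed form:", citations_closed_form(10.0, eps, cum, total))
print("subcritical, numeric:    ", citations_numeric(10.0, eps, t))
print("supercritical, long-time limit:", citations_closed_form(300.0, eps, total, total))
```

With these assumed values a paper with η = 10 saturates at a finite number of citations, while a paper with η = 300, above the critical fitness, diverges.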

9.3 Fitness-Only Models

The fitness model, suggested by Caldarelli et al. [29] and further developed in Refs. [11, 61, 104, 126, 154], considers a general complex network. This model departs from the idea of preferential attachment and assumes that the probability of attachment between a new node i and the target node j depends only on their fitnesses, η_i and η_j, and does not depend on their degrees. For citation networks, Ref. [29] assumed that the probability that paper i cites paper j is Π_ij = f(η_i, η_j), where η_i and η_j are the papers’ fitnesses and f(η_i, η_j) is a symmetric function of its arguments (the linking function). Ref. [29] considered an additive linking function, but a later publication of the same group [150] introduced a multiplicative linking function, f(η_i, η_j) ∼ η_i η_j. The latter assumption became more popular, and it allows the following generalization. Consider an old paper j. If fitness is determined by similarity and all papers under consideration belong to the same field, then paper j will garner citations at the rate $\varDelta K_{j}\propto \overline {\eta _{i}}\eta _{j}$, where $\overline {\eta _{i}}$ is the average fitness of new papers. This average fitness can be absorbed into the aging function, in such a way that the probability that a new paper i cites an older paper j is

$$\displaystyle \begin{aligned} \varPi_{ij}\sim \eta_{j}A(t_{j}). {} \end{aligned} $$

(9.12)

What is fitness? On the one hand, fitness includes the notion of similarity, known as homophily in social networks [20]. Indeed, citation networks consist of communities and subcommunities. New papers tend to attach to similar papers, those belonging to the same community. To measure similarity, Refs. [39, 108] suggested using the overlap of contents or bibliographies, while Refs. [101, 113, 122] suggested using the overlap of common neighbors. Another ingredient of fitness is associated with quality or talent. This component is not easy to estimate when the paper first appears; it can be measured only after the paper has garnered some citations.

9.4 Explanatory Models

The preferential attachment model is phenomenological, namely, it makes plausible but unsubstantiated assumptions regarding the mechanism by which papers acquire citations; in particular, it deduces this mechanism (rich-gets-richer) from the citation distributions. According to the preferential attachment model, the algorithm used by an author of a new paper to choose his references is as follows. The author considers the number of citations of all papers and chooses a paper to cite accordingly. Thus, each author must know the number of citations garnered by all papers. However, in the pre-Internet era this number was known only to those few authors who had access to the Science Citation Index. [After the appearance of the Internet, and especially Google Scholar, the number of citations can be found with a few clicks. Once the preferential attachment model gained popularity in the scientific community, it became a self-fulfilling prophecy, but its microscopic origin in the pre-Internet era remains obscure, at least for citation networks.]

Moreover, the preferential attachment mechanism of network growth presents a major conceptual difficulty because it is global and not local. Indeed, Eqs. 9.5 and 9.7 imply that each incoming node shall know degrees of all other nodes. Although this can be true for collaboration and some other social networks [35, 36], in general, a new node has bounded knowledge—it is familiar only with a limited set of nodes. With the proliferation of informational databases such as Google Scholar, Scopus, ISI Web of Science, etc., global information on many complex networks became easily accessible. In particular, for citation networks, the incentive to cite a certain paper nowadays may indeed come from the number of previous citations. Thus, the preferential attachment model is becoming a self-fulfilling prophecy.

While we showed here that the fitness model explains the initial attractivity, it can hardly serve as a basis for the explanation of citation network growth since it is too phenomenological and devoid of specific details characterizing real networks.

There are several non-phenomenological models of citation dynamics based on some realistic algorithm that authors use to choose their references. The most important class of such models is the recursive search [154, 170], also known as link copying or redirection [89], random walk or local search [66, 82], triple (triangle) formation [179], triadic closure [106], or forest fire model [96, 162]. This algorithm assumes that an author of a new paper randomly chooses an older paper and includes it in his reference list. Then he explores the reference list of the newly chosen paper and copies one [82] or all [89, 92] of its references. This is a one-level recursive search, while Refs. [91, 96, 162] considered a multilevel recursive search, whereby an author explores the reference lists of all previously chosen papers. Vazquez [170] showed that the one-level recursive search mechanism reduces to the following probability of a new paper i to cite a target paper j,

$$\displaystyle \begin{aligned} \varPi_{ij}= \lambda+ qK_{j}. {}\end{aligned} $$

(9.13)

Here, λ is the probability of random search, qK _j is the probability of recursive search, and K _j is the number of citations of the target paper. Thus, Eq. 9.13 reduces to Eq. 9.1 and it was long considered as a microscopic justification of the preferential attachment mechanism. Note, however, that the derivation of Eq. 9.13 by Ref. [170] was based on very simplifying assumptions: all papers have the same probability of being randomly-chosen, only one ancestor of the randomly-found paper is chosen by a new paper, there is no aging and no memory, etc. Of these assumptions, the absence of aging is crucial for the derivation of Eq. 9.13. In the presence of aging, the recursive search mechanism yields

$$\displaystyle \begin{aligned} \varPi_{ij}= \lambda+ q\varDelta K_{j}(t-\varDelta) {} \end{aligned} $$

(9.14)

where ΔK_j(t − Δ) is the number of citations garnered by the paper in the time window (t − Δ, t) and Δ is the characteristic memory span. As we have shown in previous chapters, Eq. 9.14 captures the citation dynamics of papers more realistically than Eq. 9.13. However, this equation is very different from the preferential attachment model, especially for short Δ.
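The linear form of Eq. 9.13 can be illustrated on a toy network. The Python sketch below assumes one simple variant of one-level recursive search, which is not necessarily the exact rule analyzed in Ref. [170]: in each search step a paper is picked uniformly at random and cited, and then, with probability p, one of its references is picked uniformly and cited as well. There is no aging and no memory. The toy reference lists, the value of p, and the function names are hypothetical. When every citing paper has the same reference-list length R, the expected number of citations per search step is exactly λ + qK_j with λ = 1/n and q = p/(nR).

```python
def search_weights(references: dict, p: float) -> dict:
    """Expected citations per search step for each paper under the assumed rule."""
    n = len(references)
    weight = {paper: 0.0 for paper in references}
    for paper, refs in references.items():
        weight[paper] += 1.0 / n                  # random search picks `paper`
        for cited in refs:                        # recursive search follows one reference
            weight[cited] += p / (n * len(refs))
    return weight


def citation_counts(references: dict) -> dict:
    """Number of citations K_j received by each paper."""
    counts = {paper: 0 for paper in references}
    for refs in references.values():
        for cited in refs:
            counts[cited] += 1
    return counts


# Hypothetical toy network: paper -> its reference list (papers 0-2 cite nothing).
toy = {0: [], 1: [], 2: [], 3: [0, 1], 4: [0, 2], 5: [0, 3], 6: [3, 4]}
p = 0.5   # assumed probability of following a reference
R = 2     # reference-list length of every citing paper in the toy network

weights = search_weights(toy, p)
K = citation_counts(toy)
lam = 1.0 / len(toy)
q = p / (len(toy) * R)

for j in sorted(toy):
    print(j, K[j], round(weights[j], 4), round(lam + q * K[j], 4))
```

The two printed columns coincide for every paper, which is the content of Eq. 9.13 in this memoryless setting; once only recent citations can be followed, the weight depends on ΔK_j rather than K_j, as in Eq. 9.14.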

In this book we have demonstrated a realistic model of citation dynamics which embeds fitness-based search into recursive search and takes proper account of the aging, memory, and topology of the citation network. In fact, we put more flesh on the bones of the recursive search algorithm developed by Vazquez [170] and converted it into a quantitative tool to account for the citation dynamics of scientific papers [72]. Our model combines the fitness model and the preferential attachment model (albeit with memory, following the approach of Refs. [65, 109, 118, 127, 145, 176]). This model assumes a reasonable algorithm which the author follows when composing his reference list. Although our model does not assume preferential attachment, the citation dynamics generated by this model follows Eq. 9.22, as all fitness-based models do. Comparison of Figs. 5.5 and 9.2 shows this quite convincingly. These two figures look very similar although the latter was generated using a simple fitness model and the former was generated using our recursive search model. And both of them look as if they were generated using a preferential attachment model!

9.5 Equivalence Between Preferential Attachment and Fitness Models

While both the Caldarelli and Bianconi-Barabasi fitness models capture the node’s quality/similarity, they are mathematically different. Indeed, the Bianconi-Barabasi fitness has been introduced on top of preferential attachment through Eq. 9.6, while Caldarelli’s fitness has nothing to do with preferential attachment. We claim, however, that Caldarelli’s fitness-only model is in some sense equivalent to the classical preferential attachment model.

To demonstrate this equivalence, we consider a complex network that grows according to Eq. 9.12. Every node is endowed with a certain fitness η that remains constant during the node’s life. This fitness is drawn from the fitness distribution ρ(η), where $\int _{0}^{\infty }\rho (\eta )d\eta =1$. We also assume that ΔK, the number of new edges garnered by a node during the time window (t, t + Δt), follows the Poisson distribution, $\frac {\lambda ^{\varDelta K}}{\varDelta K!}e^{-\lambda }$, where the Poissonian rate,

$$\displaystyle \begin{aligned} \lambda=\eta A(t)\varDelta t, {} \end{aligned} $$

(9.15)

is determined by the node’s fitness η. A(t) is the normalized aging function. Under this constraint, the fitness η is the node’s degree in the long-time limit, namely, η ≈ K ^∞.

Since Eq. 9.15 is memoryless, the number of edges that each node garners through the period from t = 0 to t also follows a Poisson distribution with the node-specific rate

$$\displaystyle \begin{aligned} \varLambda =\eta \int_{0}^{t}A(\tau)d\tau. {} \end{aligned} $$

(9.16)

We consider the set of N nodes that joined the network at the same moment which we set as t = 0. Among these, we focus on the subset of nodes that garnered K edges by time t. Their number is

$$\displaystyle \begin{aligned} n(K,t)=N\int_{0}^{\infty}\frac{\varLambda^{K}}{K!}e^{-\varLambda}\rho(\eta)d\eta, {} \end{aligned} $$

(9.17)

where Λ depends on η in accordance with Eq. 9.16. During time window (t, t + Δt), each of these n(K, t) nodes garners ∼ λ edges, in such a way that the average number of new edges garnered by a node from this subset is

$$\displaystyle \begin{aligned} \overline{\varDelta K}= \frac{N\int_{0}^{\infty}\lambda\frac{\varLambda^K} {K!}e^{-\varLambda}\rho(\eta)d\eta}{n(K,t)} {}. \end{aligned} $$

(9.18)

We substitute Eq. 9.15 into Eq. 9.18, note that $\lambda =\varLambda \tilde {A}(t)\varDelta t$, where $\tilde {A}(t)=\frac {A(t)}{\int _{0}^{t}A(\tau ) d\tau }$, use the equality

$$\displaystyle \begin{aligned} \varLambda Poiss(\varLambda,K)=(K+1)Poiss(\varLambda,K+1), {} \end{aligned} $$

(9.19)

and come to

$$\displaystyle \begin{aligned} \overline{\varDelta K}=\tilde{A}(t)(K+1)\frac{n(K+1,t)}{n(K,t)}\varDelta t. {} \end{aligned} $$

(9.20)

Here, n(K + 1, t) is the number of nodes that garnered K + 1 edges by time t. For broad fitness distribution and for K >> 1, n(K + 1, t) ≈ n(K, t), in such a way that Eq. 9.20 reduces to

$$\displaystyle \begin{aligned} \overline{\varDelta K}=\tilde{A}(t)(K+1)\varDelta t. {} \end{aligned} $$

(9.21)

This expression is nothing else but Eq. 9.4 with K₀ = 1 and ε = 0. [Interestingly, this derivation validates the initial conjecture of de Solla Price, K₀ = 1.] A similar result was obtained earlier by Burrell [25] using a different approach. Equation 9.21 is commonly accepted as a tool for evaluating the growth mechanism of real complex networks. We demonstrate here that this equation is undiscriminating and holds not only for the preferential attachment model but for the fitness model as well.
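For completeness, the step from Eq. 9.18 to Eq. 9.20 can be spelled out. Writing $Poiss(\varLambda ,K)=\frac {\varLambda ^{K}}{K!}e^{-\varLambda }$, using $\lambda =\varLambda \tilde {A}(t)\varDelta t$ (which follows from Eqs. 9.15 and 9.16) and then Eq. 9.19, the numerator of Eq. 9.18 becomes

$$\displaystyle \begin{aligned} N\int_{0}^{\infty}\lambda\, Poiss(\varLambda,K)\rho(\eta)d\eta &=N\tilde{A}(t)\varDelta t\int_{0}^{\infty}\varLambda\, Poiss(\varLambda,K)\rho(\eta)d\eta \\ &=\tilde{A}(t)\varDelta t\,(K+1)\,N\int_{0}^{\infty}Poiss(\varLambda,K+1)\rho(\eta)d\eta \\ &=\tilde{A}(t)\varDelta t\,(K+1)\,n(K+1,t), \end{aligned} $$

where the last equality is Eq. 9.17 evaluated at K + 1. Dividing by n(K, t) gives Eq. 9.20.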

To validate Eqs. 9.20 and 9.21 through numerical simulation, we considered a set of 400,000 nodes with a log-normal fitness distribution $\rho (\eta )=\frac {1}{\sqrt {2\pi }\sigma \eta }e^{{-\frac {(\ln {\eta }-\mu )^{2}}{2\sigma ^{2}}}}$, where μ = 1.63 and σ = 1.12. (Here we define fitness differently from Chaps. 3 and 7: η is multiplied by R₀, the average reference list length.) We simulated the growth of these nodes using Eq. 9.15 and the aging function $A(t)=\frac {0.035t}{|t-2.4|^{1.3}}$. (The parameters of the simulation were chosen in such a way as to imitate the measured citation dynamics of Physics papers as reported in Ref. [72].) The time was run from t = 0 to t = 25 with steps Δt = 1, in such a way that $\sum _{t=0}^{25}A(t)=1$. For each node j in this set, we determined K_j(t), the total number of edges accumulated by time t, and ΔK_j(t), the number of additional edges gained between t and t + 1. For every t, we grouped all nodes into 40 logarithmically spaced bins, each bin containing the nodes with close values of K. For each bin, we determined the ΔK distribution and found its mean, $\overline {\varDelta K}$. Figure 9.1a plots $\overline {\varDelta K}$ versus K for small K. We observe straight lines with a common intercept of −1, as suggested by Eq. 9.21. To fit the whole $\overline {\varDelta K}(K)$ dependence, we used the following equation

$$\displaystyle \begin{aligned} \overline{\varDelta K}=\tilde{A}(t)(K+K_{0}), {} \end{aligned} $$

(9.22)

where K ₀ is the fitting parameter. Figure 9.1b shows that this equation fits the data fairly well for all K.
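A reduced version of this simulation can be sketched in Python. The sketch below is not the code used to produce Figs. 9.1 and 9.2; it makes several simplifying assumptions that are labeled in the comments. It uses 100,000 nodes instead of 400,000, groups nodes by exact K instead of logarithmic bins, and, because the process is memoryless (Eq. 9.16), draws K(t) in a single Poisson step instead of iterating over all earlier time steps. The Poisson sampler, the seed, and the function names are likewise assumptions. The fitness parameters and the aging function are those quoted in the text.

```python
import math
import random


def aging(t: float) -> float:
    """Aging function A(t) quoted in the text."""
    return 0.035 * t / abs(t - 2.4) ** 1.3


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson sample: Knuth's method for small rates; for rates above 30 a
    rounded normal approximation is used (an assumption made for speed)."""
    if lam <= 0.0:
        return 0
    if lam > 30.0:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_slice(n_nodes: int, t: int, mu: float = 1.63,
                   sigma: float = 1.12, seed: int = 0):
    """Return (count[K], gained[K], A~(t)) for one time slice (t, t + 1)."""
    rng = random.Random(seed)
    cum = sum(aging(s) for s in range(t))   # discrete analogue of the integral in Eq. 9.16
    a_t = aging(t)
    count, gained = {}, {}
    for _ in range(n_nodes):
        eta = rng.lognormvariate(mu, sigma)  # node fitness
        k = poisson(eta * cum, rng)          # edges accumulated before step t
        dk = poisson(eta * a_t, rng)         # new edges gained in (t, t + 1), Eq. 9.15
        count[k] = count.get(k, 0) + 1
        gained[k] = gained.get(k, 0) + dk
    return count, gained, a_t / cum


count, gained, a_tilde = simulate_slice(100_000, t=10, seed=1)
for k in range(6):
    measured = gained[k] / count[k]                          # mean Delta K at given K
    predicted = a_tilde * (k + 1) * count[k + 1] / count[k]  # Eq. 9.20 with Delta t = 1
    print(k, round(measured, 4), round(predicted, 4))
```

The measured and predicted columns should agree within statistical noise, although no preferential attachment rule appears anywhere in the code.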

Figure 9.2 shows $\overline {\varDelta K}$ versus (K + K ₀) dependences for log-normal fitness distributions with different σ and for time slices Δt = 1. These dependences are also well fitted by Eq. 9.22 with K ₀ ∼ 1.

To estimate K₀ from the data more precisely, we turn to Eq. 9.22. It indicates that at small K, $\overline {\varDelta K}\rightarrow \tilde {A}(t)K_{0}$. On the other hand, Eq. 9.20 yields for Δt = 1

$$\displaystyle \begin{aligned} \overline{\varDelta K}|{}_{K=0}=\tilde{A}(t)\frac{n(1,t)}{n(0,t)}. {} \end{aligned} $$

(9.23)

Thus,

$$\displaystyle \begin{aligned} K_{0}\approx\frac{n(1,t)}{n(0,t)} =\frac{\int_{0}^{\infty}\varLambda e^{-\varLambda}\rho(\varLambda)d\varLambda}{\int_{0}^{\infty} e^{-\varLambda}\rho(\varLambda)d\varLambda}, {} \end{aligned} $$

(9.24)

where Λ is given by Eq. 9.16. We note that ρ(Λ) follows the log-normal distribution, which is nothing else but the fitness distribution with shifted mean, $\mu ^{\prime }=\mu +\ln (\int _{0}^{t}A(\tau ) d\tau )$. Since $\int _{0}^{t}A(\tau ) d\tau \rightarrow 1$ in the long-time limit, the difference between μ and μ′ becomes progressively smaller at long t. Figure 9.2b shows K₀ calculated according to Eq. 9.24 as a function of μ and σ. We observe that K₀ increases with μ and decreases with σ. These dependences can be captured by the approximate empirical expression

$$\displaystyle \begin{aligned} K_{0}\approx\frac{e^{\frac{\mu}{1+\sigma}}}{(1+\sigma^{2})^{0.6}}. {} \end{aligned} $$

(9.25)

For reasonable values of μ from 0 to 2 and σ from 1 to 2, K₀ lies between 0.5 and 1.5. It is determined by σ and, to a lesser extent, by μ. All this means the following: if K₀ is estimated from Eq. 9.22 by extrapolation from large K, one always gets K₀ = 1. On the other hand, since most fitness distributions are broad, the estimates made using Eq. 9.22 for small K, as is done in most studies, yield K₀ = 0.5–1.5. Figure 9.2b shows that for a narrow fitness distribution, K₀ can be higher.
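Equation 9.24 is straightforward to evaluate numerically for a log-normal ρ(Λ). The Python sketch below does so by simple quadrature over x = ln Λ and prints the result next to the empirical expression of Eq. 9.25. The grid size, the integration range, and the function names are assumptions. The argument `mu` stands for the log-mean of Λ, that is, μ′, which approaches μ in the long-time limit.

```python
import math


def k0_quadrature(mu: float, sigma: float, n: int = 4001, span: float = 10.0) -> float:
    """Eq. 9.24 for log-normal rho(Lambda), integrated over x = ln(Lambda)
    on a uniform grid covering mu +/- span * sigma."""
    lo = mu - span * sigma
    dx = 2.0 * span * sigma / (n - 1)
    num = 0.0
    den = 0.0
    for i in range(n):
        x = lo + i * dx
        lam = math.exp(x)
        # Gaussian weight in x times the factor exp(-Lambda) common to both integrals.
        w = math.exp(-0.5 * ((x - mu) / sigma) ** 2 - lam)
        num += lam * w
        den += w
    return num / den


def k0_empirical(mu: float, sigma: float) -> float:
    """Approximate empirical expression of Eq. 9.25."""
    return math.exp(mu / (1.0 + sigma)) / (1.0 + sigma ** 2) ** 0.6


for mu, sigma in [(1.0, 1.0), (1.63, 1.12), (2.0, 2.0)]:
    print(mu, sigma,
          round(k0_quadrature(mu, sigma), 3),
          round(k0_empirical(mu, sigma), 3))
```

Because the factor e^{−Λ} suppresses large Λ, the computed K₀ always lies below the mean of Λ, and it approaches e^{μ} only for a very narrow distribution with small μ.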

In Fig. 9.2b we plot the measured values of K₀ which were inferred from our studies of citation dynamics. We considered three research fields, Physics, Economics, and Mathematics, and found that the fitness distributions for all these fields are log-normal with the same σ = 1.1. The measured and calculated initial attractivities K₀ are in good agreement and are all close to 1.

Thus, our numerical simulation supports Eq. 9.22 with initial attractivity K₀ ≈ 1, as postulated by de Solla Price [138]. The natural question arises: why is K₀ ≈ 1 so widespread? Figure 9.2b shows that K₀ ≈ 1 corresponds to σ = 1–1.5 irrespective of μ. Nguyen and Tran [126] used numerical simulation to study complex networks with a log-normal fitness distribution that grow according to Eq. 9.12. They found that the resulting network structure strongly depends on the width of the fitness distribution σ; in particular, the power-law degree distribution appears only for σ ≈ 1 and its exponent ν is close to 3. This observation implies that the initial attractivity is coupled to the exponent of the degree distribution, in such a way that within the framework of the fitness model, the universality of K₀ ≈ 1 in complex networks is a consequence of the fact that most of them exhibit power-law degree distributions with ν ∼ 3.

9.6 The Genuine Preferential Attachment Exists and is Related to Nonlinear Citation Dynamics

Although our model is based on the measurements on citation networks, we believe that it is more general and relevant to other complex networks as well. In what follows, we present our reflections on this subject. Our analysis shows that if network growth is considered from the perspective of a target node and is studied using the mean-field approximation, namely, by averaging over many similar nodes, one cannot distinguish between the preferential attachment and the fitness models—both of them yield Eq. 9.22. Thus, in all that concerns the mean-field network dynamics, preferential attachment is equivalent to fitness model, in other words, the rich-gets-richer mechanism reduces to the fit-gets-richer (good-gets-richer) mechanism [29, 135]. This is surprising since these two models are based on different premises. The preferential attachment model assumes that all nodes are born equal, the inequality in their degree coming by chance. After this inequality has been established, it is amplified by the autocatalytic process represented by Eq. 9.1. In contrast, the fitness model and the fitness-based recursive search model assume that the nodes are born unequal, each newly born node is endowed with a certain fitness. The latent fitness inequality becomes evident when the nodes have been developing for some time. Surprisingly, the two opposing assumptions underlying network growth—all nodes are born equal or different—result in the same Eq. 9.22.

This does not mean that the two models are equivalent. While the preferential attachment model does not specify the initial attractivity, the fitness model with aging explains it perfectly well: it is determined by the shape of the fitness distribution. With respect to the power-law degree distribution in complex networks, the preferential attachment model relates it to the strategy by which a new node attaches to old nodes, while the fitness model implies that this distribution is inherited from the fitness distribution. The fitness model successfully explains the first-mover advantage, the degree distribution for nodes of the same age, the different trajectories of nodes of the same age, etc. However, this model does not account for the nonlinear growth commonly observed in citation networks.

Although it could seem that the fitness model is a more appropriate framework to conceptualize network growth, Eqs. 9.1 and 9.22 can still be valid since the preferential attachment is a structural rather than explanatory model. Indeed, the relation Π _ij ∼ K _j does not imply that a new node i crawls through the whole network in order to gain information about degrees of all other nodes j. What occurs in reality is that the network grows following some local rule and this rule becomes imprinted in the network topology. When the network growth is analyzed, the changes in topology are visible while the underlying microscopic growth rule is not. This feeds the illusion that the growth dynamics is determined by network topology while in reality the reverse is true.

The challenge is to uncover the microscopic rules of network growth that produce the given network topology. We showed that recursive search is one of the plausible microscopic mechanisms of network growth. What is the relation of this mechanism to genuine preferential attachment, namely, the algorithm whereby a new node finds well-connected older nodes and attaches to them? It has been generally believed that recursive search is one of the realizations of this algorithm, since a new node that makes a random choice among the neighbors of already chosen nodes has a high probability of picking highly connected nodes. We showed (Eq. 3.24) that this strategy works in a straightforward way only if the recursive search has no memory. In reality, recursive search has a rather short memory [72], and it is not clear whether highly connected nodes can be found by this simple strategy of random choice among the neighbors of already chosen nodes.

In Chap. 4 we showed that the recursive search follows a cleverer strategy: the search in the network neighborhood of the previously chosen nodes is not random but has a preference for those neighbors that are connected to several already chosen nodes. The cartoon picture of this strategy is as follows. Simple recursive search: if Alice is linked to Bob, and Bob is linked to Frank, there is a chance that Alice will link to Frank. Clever recursive search: if Alice is linked to Bob and Charlie, and both of them are linked to Frank, then Alice will almost certainly link to Frank. Thus, if a new node identifies a target node in the network vicinity of two or more previously chosen nodes, the probability of attachment to such a node exceeds the sum of the probabilities for each path; in other words, multiple paths interfere constructively, reinforcing one another. This synergetic interaction between the paths to the next-nearest neighbors ensures that a new node finds highly connected nodes. The strategy of exploring next-nearest neighbors can still be considered local, but in fact it is one step towards global search, and this is one of the ways in which genuine preferential attachment emerges within the framework of the recursive search model.
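The Alice, Bob, Charlie and Frank cartoon can be turned into a toy calculation. This is a sketch under stated assumptions: the per-path probability `p`, the exponent `gamma`, and the superadditive form `p * m**gamma` are hypothetical choices made only to illustrate a probability that exceeds the sum over paths, and the extra node Grace is invented for contrast.

```python
from collections import Counter


def path_counts(references, chosen):
    """For each candidate, count how many already chosen nodes link to it."""
    counts = Counter()
    for node in chosen:
        for target in references.get(node, ()):
            if target not in chosen:
                counts[target] += 1
    return dict(counts)


def simple_probability(m, p):
    """Simple search: m independent paths, each followed with probability p.

    The result never exceeds the plain sum m * p.
    """
    return 1.0 - (1.0 - p) ** m


def clever_probability(m, p, gamma=3.0):
    """Hypothetical superadditive rule: p * m**gamma, capped at 1.

    For m >= 2 and gamma > 1 this exceeds the sum m * p (until the cap).
    """
    return min(1.0, p * m ** gamma)


# Alice has already chosen Bob and Charlie; both link to Frank,
# while only Bob links to Grace (hypothetical extra node).
references = {"Bob": ["Frank", "Grace"], "Charlie": ["Frank"]}
chosen = {"Bob", "Charlie"}
p = 0.1  # assumed probability of following a single path

m = path_counts(references, chosen)
frank_simple = simple_probability(m["Frank"], p)
frank_clever = clever_probability(m["Frank"], p)
grace_clever = clever_probability(m["Grace"], p)
```

In this sketch a single path gives the same probability `p` under the clever rule as a single path would under the simple one, so the two rules differ only for targets reached by several paths, such as Frank.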

References

Albert, R., & Barabasi, A. L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74, 47–97.
Bagrow, J. P., & Brockmann, D. (2013). Natural emergence of clusters and bursts in network evolution. Physical Review X, 3, 021016.
Barabasi, A. L. (2015). Network science. Cambridge: Cambridge University Press.
Barabasi, A. L., Song, C., & Wang, D. (2012). Publishing: Handful of papers dominates citation. Nature, 491(7422), 40.
Bedogne, C., & Rodgers, G. J. (2006). Complex growing networks with intrinsic vertex fitness. Physical Review E, 74(4), 046115.
Bell, M., Perera, S., Piraveenan, M., Bliemer, M., Latty, T., & Reid, C. (2017). Network growth models: A behavioural basis for attachment proportional to fitness. Scientific Reports, 7, 42431.
Bianconi, G., & Barabasi, A.-L. (2001). Bose-Einstein condensation in complex networks. Physical Review Letters, 86, 5632–5635.
Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., & Hwang, D.-U. (2006). Complex networks: Structure and dynamics. Physics Reports, 424(4–5), 175–308.
Bramoullé, Y., Currarini, S., Jackson, M. O., Pin, P., & Rogers, B. W. (2012). Homophily and long-run integration in social networks. Journal of Economic Theory, 147(5), 1754–1786.
Burrell, Q. L. (2003). Predicting future citation behavior. Journal of the American Society for Information Science and Technology, 54(5), 372–378.
Caldarelli, G. (2007). Scale-free networks: Complex webs in nature and technology. Oxford: Oxford University Press.
Caldarelli, G., Capocci, A., De Los Rios, P., & Muñoz, M. A. (2002). Scale-free networks from varying vertex intrinsic fitness. Physical Review Letters, 89(25), 258702.
Capocci, A., Servedio, V. D., Colaiori, F., Buriol, L. S., Donato, D., Leonardi, S., et al. (2006). Preferential attachment in the growth of social networks: The internet encyclopedia Wikipedia. Physical Review E, 74(3), 036116.
Carletti, T., Gargiulo, F., & Lambiotte, R. (2015). Preferential attachment with partial information. The European Physical Journal B, 88(1), 18.
Centola, D. (2010). The spread of behavior in an online social network experiment. Science, 329(5996), 1194–1197.
Centola, D., Eguíluz, V. M., & Macy, M. W. (2007). Cascade dynamics of complex propagation. Physica A: Statistical Mechanics and Its Applications, 374(1), 449–456.
Ciotti, V., Bonaventura, M., Nicosia, V., Panzarasa, P., & Latora, V. (2016). Homophily and missing links in citation networks. EPJ Data Science, 5(1), 7.
Csárdi, G., Strandburg, K. J., Zalányi, L., Tobochnik, J., & Érdi, P. (2007). Modeling innovation by a kinetic description of the patent citation system. Physica A: Statistical Mechanics and Its Applications, 374(2), 783–793.
Dorogovtsev, S. N., & Mendes, J. F. F. (2000). Evolution of networks with aging of sites. Physical Review E, 62(2), 1842–1845.
Eom, Y.-H., & Fortunato, S. (2011). Characterizing and modeling citation dynamics. PLoS One, 6(9), e24926.
Ergün, G., & Rodgers, G. J. (2002). Growing random networks with fitness. Physica A: Statistical Mechanics and Its Applications, 303(1), 261–272.
Fortunato, S., Flammini, A., & Menczer, F. (2006). Scale-free network growth by ranking. Physical Review Letters, 96(21), 218701.
Geng, X., & Wang, Y. (2009). Degree correlations in citation networks model with aging. Europhysics Letters, 88(3), 38002.
Ghadge, S., Killingback, T., Sundaram, B., & Tran, D. A. (2010). A statistical construction of power-law networks. International Journal of Parallel, Emergent and Distributed Systems, 25(3), 223–235.
Gleeson, J. P., Cellai, D., Onnela, J.-P., Porter, M. A., & Reed-Tsochas, F. (2014). A simple generative model of collective online behavior. Proceedings of the National Academy of Sciences, 111(29), 10411–10415.
Goldberg, S. R., Anthony, H., & Evans, T. S. (2015). Modelling citation networks. Scientometrics, 105(3), 1577–1604.
Golosovsky, M. (2017). Power-law citation distributions are not scale-free. Physical Review E, 96(3), 032306.
Golosovsky, M. (2018). Mechanisms of complex network growth: Synthesis of the preferential attachment and fitness models. Physical Review E, 97(6), 062310.
Golosovsky, M., & Solomon, S. (2012). Stochastic dynamical model of a growing citation network based on a self-exciting point process. Physical Review Letters, 109(9), 098701.
Golosovsky, M., & Solomon, S. (2013). The transition towards immortality: Non-linear autocatalytic growth of citations to scientific papers. Journal of Statistical Physics, 151(1–2), 340–354.
Golosovsky, M., & Solomon, S. (2017). Growing complex network of citations of scientific papers: Modeling and measurements. Physical Review E, 95(1), 012324.
Hajra, K. B., & Sen, P. (2006). Modelling aging characteristics in citation networks. Physica A: Statistical Mechanics and Its Applications, 368(2), 575–582.
Higham, K. W., Governale, M., Jaffe, A. B., & Zülicke, U. (2017). Fame and obsolescence: Disentangling growth and aging dynamics of patent citations. Physical Review E, 95(4), 042309.
Higham, K. W., Governale, M., Jaffe, A. B., & Zülicke, U. (2017). Unraveling the dynamics of growth, aging and inflation for citations to scientific articles from specific research fields. Journal of Informetrics, 11(4), 1190–1200.
Jackson, M. O., & Rogers, B. W. (2007). Meeting strangers and friends of friends: How random are social networks? American Economic Review, 97(3), 890–915.
Jeong, H., Néda, Z., & Barabási, A.-L. (2003). Measuring preferential attachment in evolving networks. Europhysics Letters, 61(4), 567–572.
Ke, Q., Ferrara, E., Radicchi, F., & Flammini, A. (2015). Defining and identifying sleeping beauties in science. Proceedings of the National Academy of Sciences, 112(24), 7426–7431.
Kong, J. S., Sarshar, N., & Roychowdhury, V. P. (2008). Experience versus talent shapes the structure of the Web. Proceedings of the National Academy of Sciences, 105(37), 13724–13729.
Krapivsky, P. L., & Redner, S. (2001). Organization of growing random networks. Physical Review E, 63(6), 066123.
Krapivsky, P. L., & Redner, S. (2005). Network growth by copying. Physical Review E, 71, 036118.
Kunegis, J., Blattner, M., & Moser, C. (2013). Preferential attachment in online networks. In Proceedings of the 5th Annual ACM Web Science Conference. New York, NY: Association for Computing Machinery.
Lambiotte, R., & Ausloos, M. (2007). Growing network with j-redirection. Europhysics Letters, 77(5), 58002.
Lambiotte, R., Krapivsky, P. L., Bhat, U., & Redner, S. (2016). Structural transitions in densifying networks. Physical Review Letters, 117(21), 218301.
Leskovec, J., Backstrom, L., Kumar, R., & Tomkins, A. (2008). Microscopic evolution of social networks. In Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 08. New York, NY: Association for Computing Machinery.
Leskovec, J., Kleinberg, J., & Faloutsos, C. (2005). Graphs over time. In Proceeding of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining - KDD 05. New York, NY: Association for Computing Machinery.
Liben-Nowell, D., & Kleinberg, J. (2003). The link prediction problem for social networks. In Proceedings of the Twelfth International Conference on Information and Knowledge Management - CIKM. New York, NY: Association for Computing Machinery.
Luck, J. M., & Mehta, A. (2017). How the fittest compete for leadership: A tale of tails. Physical Review E, 95(6), 062306.
Martin, T., Ball, B., Karrer, B., & Newman, M. E. J. (2013). Coauthorship and citation patterns in the Physical Review. Physical Review E, 88(1), 012814.
Medo, M., Cimini, G., & Gualdi, S. (2011). Temporal effects in the growth of networks. Physical Review Letters, 107, 238701.
Menczer, F. (2004). Evolution of document networks. Proceedings of the National Academy of Sciences, 101(Supplement 1), 5261–5265.
Miller, B. A., & Bliss, N. T. (2012). A stochastic system for large network growth. IEEE Signal Processing Letters, 19(6), 356–359.
Mislove, A., Koppula, H. S., Gummadi, K. P., Druschel, P., & Bhattacharjee, B. (2013). An empirical validation of growth models for complex networks. In A. Mukherjee, M. Choudhury, F. Peruani, N. Ganguly, & B. Mitra (Eds.), Dynamics on and of complex networks. Modeling and simulation in science, engineering and technology (Vol. 2). New York, NY: Birkhäuser.
Mokryn, O., Wagner, A., Blattner, M., Ruppin, E., & Shavitt, Y. (2016). The role of temporal trends in growing networks. PLoS One, 11(8), e0156505.
Newman, M. (2010). Networks. Oxford: Oxford University Press.
Newman, M. E. (2001). Clustering and preferential attachment in growing networks. Physical Review E, 64(2), 025102.
Newman, M. E. J. (2009). The first-mover advantage in scientific publication. Europhysics Letters, 86(6), 68001.
Newman, M. E. J. (2014). Prediction of highly cited papers. Europhysics Letters, 105(2), 28002.
Nguyen, K., & Tran, D. A. (2012). Fitness-based generative models for power-law networks. In Handbook of optimization in complex networks (pp. 39–53). Berlin: Springer.
Ostroumova-Prokhorenkova, L., & Samosvat, E. (2016). Recency-based preferential attachment models. Journal of Complex Networks, 4(4), 475–499.
Papadopoulos, F., Kitsak, M., Serrano, M. Á., Boguñá, M., & Krioukov, D. (2012). Popularity versus similarity in growing networks. Nature, 489(7417), 537–540.
Perc, M. (2014). The Matthew effect in empirical data. Journal of the Royal Society Interface, 11, 20140378.
Pham, T., Sheridan, P., & Shimodaira, H. (2015). PAFit: A statistical method for measuring preferential attachment in temporal complex networks. PLoS One, 10(9), e0137796.
Pham, T., Sheridan, P., & Shimodaira, H. (2016). Joint estimation of preferential attachment and node fitness in growing complex networks. Scientific Reports, 6, 32558.
Price, D. D. S. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27(5), 292–306.
Redner, S. (2005). Citation statistics from 110 years of Physical Review. Physics Today, 58(6), 49–54.
Rosvall, M., Esquivel, A. V., Lancichinetti, A., West, J. D., & Lambiotte, R. (2014). Memory in network flows and its effects on spreading dynamics and community detection. Nature Communications, 5, 4630.
Sendiña-Nadal, I., Danziger, M. M., Wang, Z., Havlin, S., & Boccaletti, S. (2016). Assortativity and leadership emerge from anti-preferential attachment in heterogeneous networks. Scientific Reports, 6(1), 21297.
Servedio, V. D. P., Caldarelli, G., & Buttà, P. (2004). Vertex intrinsic fitness: How to produce arbitrary scale-free networks. Physical Review E, 70(5), 056126.
Simkin, M. V., & Roychowdhury, V. P. (2007). A mathematical theory of citing. Journal of the American Society for Information Science and Technology, 58(11), 1661–1673.
Šubelj, L., & Bajec, M. (2013). Model of complex networks based on citation dynamics. In Proceedings of the WWW Workshop on Large Scale Network Analysis (LSNA’13) (pp. 527–530).
Vazquez, A. (2001). Disordered networks generated by recursive searches. Europhysics Letters, 54(4), 430–435.
Wang, D., Song, C., & Barabási, A. L. (2013). Quantifying long-term scientific impact. Science, 342(6154), 127–132.
Wang, M., Yu, G., & Yu, D. (2008). Measuring the preferential attachment mechanism in citation networks. Physica A: Statistical Mechanics and Its Applications, 387(18), 4692–4698.
Wu, Y., Fu, T. Z. J., & Chiu, D. M. (2014). Generalized preferential attachment considering aging. Journal of Informetrics, 8(3), 650–658.
Wu, Z.-X., & Holme, P. (2009). Modeling scientific-citation patterns and other triangle-rich acyclic networks. Physical Review E, 80, 037101.
Xie, Z., Ouyang, Z., Liu, Q., & Li, J. (2016). A geometric graph model for citation networks of exponentially growing scientific papers. Physica A: Statistical Mechanics and Its Applications, 456, 167–175.
Zhou, J., Zeng, A., Fan, Y., & Di, Z. (2016). Ranking scientific publications with similarity-preferential mechanism. Scientometrics, 106(2), 805–816.


Author information

Authors and Affiliations

Racah Institute of Physics, Hebrew University of Jerusalem, Jerusalem, Israel
Michael Golosovsky



About this chapter

Cite this chapter

Golosovsky, M. (2019). Comparison to Existing Models. In: Citation Analysis and Dynamics of Citation Networks. SpringerBriefs in Complexity. Springer, Cham. https://doi.org/10.1007/978-3-030-28169-4_9


DOI: https://doi.org/10.1007/978-3-030-28169-4_9
Published: 27 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28168-7
Online ISBN: 978-3-030-28169-4
eBook Packages: Physics and Astronomy, Physics and Astronomy (R0)
