# Covariance Structure Behind Breaking of Ensemble Equivalence in Random Graphs

## Abstract

For a random graph subject to a topological constraint, the *microcanonical ensemble* requires the constraint to be met by every realisation of the graph (‘hard constraint’), while the *canonical ensemble* requires the constraint to be met only on average (‘soft constraint’). It is known that *breaking of ensemble equivalence* may occur when the size of the graph tends to infinity, signalled by a non-zero specific relative entropy of the two ensembles. In this paper we analyse a formula for the relative entropy of generic discrete random structures recently put forward by Squartini and Garlaschelli. We consider the case of a random graph with a given degree sequence (configuration model), and show that in the dense regime this formula correctly predicts that the specific relative entropy is determined by the scaling of the determinant of the matrix of canonical covariances of the constraints. The formula also correctly predicts that an extra correction term is required in the sparse regime and in the ultra-dense regime. We further show that the different expressions correspond to the degrees in the canonical ensemble being asymptotically *Gaussian* in the dense regime and asymptotically *Poisson* in the sparse regime (the latter confirms what we found in earlier work), and the dual degrees in the canonical ensemble being asymptotically *Poisson* in the ultra-dense regime. In general, we show that the degrees follow a multivariate version of the *Poisson–Binomial* distribution in the canonical ensemble.

## Keywords

Random graph, Topological constraints, Microcanonical ensemble, Canonical ensemble, Relative entropy, Equivalence vs. nonequivalence, Covariance matrix

## Mathematics Subject Classification

60C05, 60K35, 82B20

## 1 Introduction and Main Results

### 1.1 Background and Outline

*topological constraints* [7]. Statistical physics deals with the definition of the appropriate probability distribution over the set of configurations and with the calculation of the resulting properties of the system. Two key choices of probability distribution are:

- (1) the *microcanonical ensemble*, where the constraints are *hard* (i.e., are satisfied by each individual configuration);
- (2) the *canonical ensemble*, where the constraints are *soft* (i.e., hold as ensemble averages, while individual configurations may violate the constraints).

*maximal* subject to the given constraints.)

In the limit as the size of the network diverges, the two ensembles are traditionally *assumed* to become equivalent, as a result of the expected vanishing of the fluctuations of the soft constraints (i.e., the soft constraints are expected to become asymptotically hard). However, it is known that this equivalence may be broken, as signalled by a non-zero specific relative entropy of the two ensembles (= on an appropriate scale). In earlier work various scenarios were identified for this phenomenon (see [2, 4, 8] and references therein). In the present paper we take a fresh look at breaking of ensemble equivalence by analysing a formula for the relative entropy, based on the *covariance structure* of the canonical ensemble, recently put forward by Squartini and Garlaschelli [6]. We consider the case of a random graph with a given degree sequence (configuration model) and show that this formula correctly predicts that the specific relative entropy is determined by the scaling of the determinant of the covariance matrix of the constraints in the dense regime, while it requires an extra correction term in the sparse regime and the ultra-dense regime. We also show that the different behaviours found in the different regimes correspond to the degrees being asymptotically Gaussian in the dense regime and asymptotically Poisson in the sparse regime, and the dual degrees being asymptotically Poisson in the ultra-dense regime. We further note that, in general, in the canonical ensemble the degrees are distributed according to a multivariate version of the *Poisson–Binomial* distribution [12], which admits the Gaussian distribution and the Poisson distribution as limits in appropriate regimes.

*principled choice* of the ensemble used in practical applications. Three examples serve as an illustration:

- (a) *Pattern detection* is the identification of nontrivial structural properties in a real-world network through comparison with a suitable *null model*, i.e., a random graph model that preserves certain local topological properties of the network (like the degree sequence) but is otherwise completely random.
- (b) *Community detection* is the identification of groups of nodes that are more densely connected with each other than expected under a null model, which is a popular special case of pattern detection.
- (c) *Network reconstruction* employs purely local topological information to infer higher-order structural properties of a real-world network. This problem arises whenever the global properties of the network are not known, for instance, due to confidentiality or privacy issues, but local properties are. In such cases, optimal inference about the network can be achieved by maximising the entropy subject to the known local constraints, which again leads to the two ensembles considered here.

The remainder of this section is organised as follows. In Sect. 1.2 we define the two ensembles and their relative entropy. In Sect. 1.3 we introduce the constraints to be considered, which are on the *degree sequence*. In Sect. 1.4 we introduce the various regimes we will be interested in and state a formula for the relative entropy when the constraint is on the degree sequence. In Sect. 1.5 we state the formula for the relative entropy proposed in [6] and present our main theorem. In Sect. 1.6 we close with a discussion of the interpretation of this theorem and an outline of the remainder of the paper.

### 1.2 Microcanonical Ensemble, Canonical Ensemble, Relative Entropy

*n* nodes. Any graph \(G\in \mathcal {G}_n\) can be represented as an \(n \times n\) matrix with elements

*graphical*, i.e., realisable by at least one graph in \(\mathcal {G}_n\), the *microcanonical probability distribution* on \(\mathcal {G}_n\) with *hard constraint* \(\vec {C}^*\) is defined as

*canonical probability distribution* \(P_{\mathrm {can}}(G)\) on \(\mathcal {G}_n\) is defined as the solution of the maximisation of the *entropy*

*soft constraint* \(\langle \vec {C}\rangle = \vec {C}^*\), where \(\langle \cdot \rangle \) denotes the average w.r.t. \(P_{\mathrm {can}}\). This gives

*Hamiltonian* and

*partition function*. In (1.5) the parameter \(\vec {\theta }\) must be set equal to the particular value \(\vec {\theta }^*\) that realises \(\langle \vec {C}\rangle = \vec {C}^*\). This value is unique and maximises the likelihood of the model given the data (see [3]).
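As an illustration of how \(\vec {\theta }^*\) can be found numerically, the following minimal sketch assumes the configuration-model form of the canonical ensemble, in which the edges are independent with connection probability \(p_{ij} = 1/(1+e^{\theta _i+\theta _j})\), and solves \(\langle k_i\rangle = k_i^*\) by plain gradient ascent on the log-likelihood. The function names, the step size, and the example degree sequence are our own illustrative choices and are not part of the analysis in this paper.

```python
import math


def expected_degrees(theta):
    """Expected degrees when edge (i, j) is present with probability
    p_ij = 1 / (1 + exp(theta_i + theta_j)) (assumed model form)."""
    n = len(theta)
    return [
        sum(1.0 / (1.0 + math.exp(theta[i] + theta[j]))
            for j in range(n) if j != i)
        for i in range(n)
    ]


def fit_theta(k_star, tol=1e-10, max_iter=100000):
    """Gradient ascent on the (concave) log-likelihood in theta.

    The gradient component for node i is <k_i> - k_i^*, so theta_i is
    increased when the expected degree is too large.
    """
    n = len(k_star)
    theta = [0.0] * n
    # The Hessian has largest eigenvalue at most (n - 1) / 2,
    # so 2 / (n - 1) is a safe step size.
    step = 2.0 / (n - 1)
    for _ in range(max_iter):
        k_avg = expected_degrees(theta)
        grad = [k_avg[i] - k_star[i] for i in range(n)]
        if max(abs(g) for g in grad) < tol:
            break
        theta = [theta[i] + step * grad[i] for i in range(n)]
    return theta


# Hypothetical example: a graphical degree sequence on 5 nodes.
k_star = [1, 2, 2, 3, 2]
theta_star = fit_theta(k_star)
print([round(k, 6) for k in expected_degrees(theta_star)])
```

The iteration is slow compared to Newton-type methods, but it keeps the sketch short and makes the likelihood interpretation of \(\vec {\theta }^*\) explicit.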

*relative entropy* of \(P_{\mathrm {mic}}\) w.r.t. \(P_{\mathrm {can}}\) is [9]

*relative entropy* \(\alpha _n\)*-density* is [6]

*scale parameter*. The limit of the relative entropy \(\alpha _n\)-density is defined as

*on scale* \(\alpha _n\) (or *with speed* \(\alpha _n\)) if and only if


*degree sequence*, then in the sparse regime the natural scale turns out to be \(\alpha _n=n\) [4, 8] (in which case \(s_{\alpha _\infty }\) is the specific relative entropy ‘per vertex’), while in the dense regime it turns out to be \(\alpha _n = n\log n\), as shown below. On the other hand, if the constraint is on the *total numbers of edges and triangles*, with values different from what is typical for the Erdős–Rényi random graph in the dense regime, then the natural scale turns out to be \(\alpha _n=n^2\) [2] (in which case \(s_{\alpha _\infty }\) is the specific relative entropy ‘per edge’). Such a severe breaking of ensemble equivalence comes from ‘frustration’ in the constraints.

*any* graph in \(\mathcal {G}_n\) such that \(\vec {C}(G^*) =\vec {C}^*\) (recall that we have assumed that \(\vec {C}^*\) is realisable by at least one graph in \(\mathcal {G}_n\)). The definition in (1.10) then becomes

*single* configuration \(G^*\) realising the hard constraint. Apart from its theoretical importance, this fact greatly simplifies mathematical calculations.
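To make the single-configuration evaluation concrete, here is a minimal sketch for a constraint that is not the one studied in this paper: the total number of edges \(L^*\) (one scalar constraint). Under that assumption the canonical ensemble is the Erdős–Rényi graph with \(p = L^*/\binom{n}{2}\), the microcanonical ensemble is uniform over the \(\binom{\binom{n}{2}}{L^*}\) graphs with \(L^*\) edges, and the relative entropy is \(-\log \) of the product of that count with \(P_{\mathrm {can}}(G^*)\). The function name and parameters are hypothetical.

```python
import math


def relative_entropy_edges(n, num_edges):
    """S_n(P_mic | P_can) = log(P_mic(G*) / P_can(G*)) when the only
    constraint is the total number of edges (illustrative assumption).

    Requires 0 < num_edges < n * (n - 1) / 2.
    """
    pairs = n * (n - 1) // 2
    p = num_edges / pairs
    # log of the number of graphs with exactly num_edges edges
    log_omega = (math.lgamma(pairs + 1)
                 - math.lgamma(num_edges + 1)
                 - math.lgamma(pairs - num_edges + 1))
    # log-probability of any single graph with num_edges edges
    log_p_can = (num_edges * math.log(p)
                 + (pairs - num_edges) * math.log(1.0 - p))
    return -log_omega - log_p_can


# Relative entropy and its density on scale alpha_n = n, at half density.
for n in (10, 100, 1000):
    s = relative_entropy_edges(n, n * (n - 1) // 4)
    print(n, s, s / n)
```

In this single-constraint example the relative entropy grows only logarithmically in *n*, so its density on scale \(\alpha _n = n\) tends to zero.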

To analyse breaking of ensemble equivalence, ideally we would like to be able to identify an underlying *large deviation principle* on a natural scale \(\alpha _n\). This is generally difficult, and so far has only been achieved in the dense regime with the help of *graphons* (see [2] and references therein). In the present paper we will approach the problem from a different angle, namely, by looking at the *covariance matrix of the constraints* in the canonical ensemble, as proposed in [6].

Note that all the quantities introduced above in principle depend on *n*. However, except for the symbols \(\mathcal {G}_n\) and \(S_n(P_{\mathrm {mic}}\mid P_{\mathrm {can}})\), we suppress the *n*-dependence from the notation.

### 1.3 Constraint on the Degree Sequence

*specific value* \(\vec {k}^*\), which we assume to be *graphical*, i.e., there is at least one graph with degree sequence \(\vec {k}^*\). The constraint is therefore

*configuration model* and has been studied intensively (see [7, 8, 11]). For later use we recall the form of the canonical probability in the configuration model, namely,

### 1.4 Relevant Regimes

*sparse regime*, defined by the condition

*ultra-dense regime* in which the degrees are close to \(n-1\),

*dual* of the sparse regime. We will see in Appendix B that under the map \(k^*_i \mapsto n-1-k^*_i\) the microcanonical ensemble and the canonical ensemble preserve their relationship; in particular, their relative entropy is invariant.
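The duality can be checked by brute force on a small example. The sketch below (function names and the example degree sequence are our own hypothetical choices) counts the labelled simple graphs with a given degree sequence and with its dual \(n-1-k^*_i\). Taking the complement of each graph is a bijection between the two sets, so the two counts agree. The sketch only illustrates the invariance of the microcanonical count; the canonical probability is not checked here.

```python
from itertools import combinations, product


def count_graphs(k_star):
    """Number of labelled simple graphs on len(k_star) nodes whose
    degree sequence equals k_star (exhaustive enumeration, small n only)."""
    n = len(k_star)
    pairs = list(combinations(range(n), 2))
    target = list(k_star)
    count = 0
    for edges in product((0, 1), repeat=len(pairs)):
        deg = [0] * n
        for (i, j), present in zip(pairs, edges):
            if present:
                deg[i] += 1
                deg[j] += 1
        if deg == target:
            count += 1
    return count


def dual(k_star):
    """Dual degree sequence under the map k_i -> n - 1 - k_i."""
    n = len(k_star)
    return [n - 1 - k for k in k_star]


k_star = [1, 2, 2, 3, 2]
print(count_graphs(k_star), count_graphs(dual(k_star)))
```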

It is a challenge to study breaking of ensemble equivalence *in between* the sparse regime and the ultra-dense regime, called the *dense regime*. In what follows we consider a subclass of the dense regime, called the \(\delta \)*-tame regime*, in which the graphs are subject to a certain uniformity condition.

### Definition 1.1

### Remark 1.2

The name \(\delta \)-tame is taken from [1], which studies the number of graphs with a \(\delta \)-tame degree sequence. Definition 1.1 is actually a reformulation of the definition given in [1]. See Appendix A for details.

It is natural to ask whether, conversely, condition (1.24) implies that the degree sequence is \(\delta '\)-tame for some \(\delta '=\delta '(\delta )\). Unfortunately, this question is not easy to settle, but the following lemma provides a partial answer.

### Lemma 1.3

### Proof

The proof follows from [1, Theorem 2.1]. In fact, by picking \(\beta =1-\alpha \) in that theorem, we find that we need \(\alpha >\tfrac{1}{4}\). The theorem also gives information about the values of \(\delta = \delta (\alpha )\) and \(n_0=n_0(\alpha )\). \(\square \)

### 1.5 Linking Ensemble Nonequivalence to the Canonical Covariances

In this section we investigate an important formula, recently put forward in [6], for the scaling of the relative entropy under a general constraint. The analysis in [6] allows for the possibility that not all the constraints (i.e., not all the components of the vector \(\vec {C}\)) are linearly independent. For instance, \(\vec {C}\) may contain redundant replicas of the same constraint(s), or linear combinations of them. Since in the present paper we only consider the case where \(\vec {C}\) is the degree sequence, the different components of \(\vec {C}\) (i.e., the different degrees) are linearly independent.

*K*-dimensional constraint \(\vec {C}^* = (C^*_i)_{i=1}^K\) with independent components is imposed, then a key result in [6] is the formula

*i*th eigenvalue of the \(K\times K\) covariance matrix

*Q*. This result can be formulated rigorously as

### Formula 1.1

*i*th eigenvalue of the *K*-dimensional covariance matrix *Q* (the notation \(K_n\) indicates that *K* may depend on *n*). Note that \(0\le I_{K,R} \le K\). Consequently, (1.32) is satisfied (and hence \(\tau _{\alpha _\infty }=0\)) when \(\lim _{n\rightarrow \infty } K_n/\alpha _n=0\), i.e., when the number \(K_n\) of constraints grows slower than \(\alpha _n\).

### Remark 1.4

We now present our main theorem, which considers the case where the constraint is on the degree sequence: \(K_n=n\) and \(\vec {C}^*=\vec {k}^*= (k_i^*)_{i=1}^n\). This case was studied in [4], for which \(\alpha _n = n\) in the *sparse regime with finite degrees*. Our results here focus on three new regimes, for which we need to increase \(\alpha _n\): the *sparse regime with growing degrees*, the \(\delta \)*-tame regime*, and the *ultra-dense regime with growing dual degrees*. In all these cases, since \(\lim _{n\rightarrow \infty } K_n/\alpha _n=\lim _{n\rightarrow \infty } n/\alpha _n=0\), Formula 1.1 states that (1.30) holds with \(\tau _{\tilde{\alpha }_n}=0\). Our theorem provides a rigorous and independent mathematical proof of this result.

### Theorem 1.5

- The sparse regime with growing degrees:
  $$\begin{aligned} \max _{1\le i \le n} k^*_i = o(\sqrt{n}\,),\qquad \lim _{n\rightarrow \infty }\min _{1\le i \le n} k^*_i = \infty . \end{aligned} \tag{1.37}$$
- The ultra-dense regime with growing dual degrees:
  $$\begin{aligned} \max _{1\le i \le n}(n-1 - k^*_i) = o(\sqrt{n}\,),\qquad \lim _{n\rightarrow \infty } \min _{1\le i \le n} (n-1-k^*_i) = \infty . \end{aligned} \tag{1.39}$$

### 1.6 Discussion and Outline

#### 1.6.1 Poisson–Binomial Degrees in the General Case

*multivariate Dirac distribution* with average \(\vec {k^*}\). This has the interesting interpretation that the relative entropy between the distributions \(P_{\mathrm {mic}}\) and \(P_{\mathrm {can}}\) *on the set of graphs* coincides with the relative entropy between \(\delta [\vec {k^*}]\) and \(Q[\vec {k^*}]\) *on the set of degree sequences*.

*Poisson–Binomial distribution* (or Poisson’s Binomial distribution; see Wang [12]). In the univariate case, the Poisson–Binomial distribution describes the probability of a certain number of successes out of a total number of independent and (in general) not identical Bernoulli trials [12]. In our case, the marginal probability that node *i* has degree \(k_i\) in the canonical ensemble, irrespective of the degree of any other node, is indeed a univariate Poisson–Binomial given by \(n-1\) independent Bernoulli trials with success probabilities \(\{p_{ij}^*\}_{j\ne i}\). The relation in (1.42) can therefore be restated as

It is known that the univariate Poisson–Binomial distribution admits two asymptotic limits: (1) a Poisson limit (if and only if, in our notation, \(\sum _{j\ne i}p_{ij}^*\rightarrow \lambda >0\) and \(\sum _{j\ne i} (p_{ij}^*)^2\rightarrow 0\) as \(n\rightarrow \infty \) [12]); (2) a Gaussian limit (if and only if \(p_{ij}^*\rightarrow \lambda _j>0\) for all \(j\ne i\) as \(n\rightarrow \infty \), as follows from a central limit theorem type of argument). If all the Bernoulli trials are identical, i.e., if all the probabilities \(\{p_{ij}^*\}_{j\ne i}\) are equal, then the univariate Poisson–Binomial distribution reduces to the ordinary Binomial distribution, which also exhibits the well-known Poisson and Gaussian limits. These results imply that also the general multivariate Poisson–Binomial distribution in (1.43) admits limiting behaviours that should be consistent with the Poisson and Gaussian limits discussed above for its marginals. This is precisely what we confirm below.
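The two limits of the univariate Poisson–Binomial distribution can be explored numerically. The following minimal sketch computes the exact probability mass function by convolving the Bernoulli trials one at a time and provides the Poisson mass function and the Gaussian density for comparison. All function names and the example success probabilities are our own illustrative assumptions.

```python
import math


def poisson_binomial_pmf(probs):
    """Exact pmf of the number of successes in independent Bernoulli
    trials with success probabilities probs (dynamic programming)."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # trial fails
            nxt[k + 1] += mass * p       # trial succeeds
        pmf = nxt
    return pmf


def poisson_pmf(lam, k):
    """Poisson probability of k with mean lam, computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def normal_density(x, mean, var):
    """Gaussian density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Sparse-like case: many small, non-identical probabilities with mean 2.
sparse = [0.002 if j % 2 == 0 else 0.006 for j in range(500)]
pmf = poisson_binomial_pmf(sparse)
lam = sum(sparse)
print(max(abs(pmf[k] - poisson_pmf(lam, k)) for k in range(20)))

# Dense-like case: probabilities bounded away from 0 and 1.
dense = [0.3 if j % 2 == 0 else 0.7 for j in range(400)]
pmf = poisson_binomial_pmf(dense)
mean = sum(dense)
var = sum(p * (1.0 - p) for p in dense)
print(pmf[200], normal_density(200, mean, var))
```

When all the probabilities are equal the routine returns the ordinary Binomial mass function, in line with the reduction mentioned above.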

#### 1.6.2 Poisson Degrees in the Sparse Regime

*i* of the relative entropy of the *Dirac distribution* with average \(k^*_i\) w.r.t. the *Poisson distribution* with average \(k^*_i\). We see that, under the sparseness condition, the constraints act on the nodes essentially independently. We can therefore reinterpret (1.46) as the statement

*multivariate Poisson distribution* with average \(\vec {k^*}\). In other words, in this regime

Note that the Poisson regime was obtained in [4] under the condition in (1.21), which is less restrictive than the aforementioned condition \(k_i^*=\sum _{j\ne i}p_{ij}^*\rightarrow \lambda >0\), \(\sum _{j\ne i}(p_{ij}^*)^2\rightarrow 0\) under which the Poisson distribution is retrieved from the Poisson–Binomial distribution [12]. In particular, the condition in (1.21) includes both the case with growing degrees included in Theorem 1.5 (and consistent with Formula 1.1 with \(\tau _{\alpha _\infty }=0\)) and the case with finite degrees, which *cannot* be retrieved from Formula 1.1 with \(\tau _{\alpha _\infty }=0\), because it corresponds to the case where all the \(n=\alpha _n\) eigenvalues of *Q* remain finite as *n* diverges (as the entries of *Q* themselves do not diverge), and indeed (1.32) does not hold.
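The per-node contribution in the sparse regime, i.e., the relative entropy of a Dirac distribution at *k* w.r.t. a Poisson distribution with mean *k*, equals minus the logarithm of the Poisson probability of its own mean. The sketch below evaluates it and compares it with the leading Stirling term \(\tfrac{1}{2}\log (2\pi k)\), which is the form it takes for growing degrees. The function names and the example degree sequence are hypothetical.

```python
import math


def dirac_vs_poisson(k):
    """Relative entropy of a Dirac mass at k w.r.t. Poisson(k):
    -log( exp(-k) * k**k / k! ), for an integer k >= 1."""
    return k - k * math.log(k) + math.lgamma(k + 1)


def stirling_leading_term(k):
    """Leading large-k behaviour, (1/2) * log(2 * pi * k)."""
    return 0.5 * math.log(2.0 * math.pi * k)


# Sum of per-node contributions for a hypothetical sparse degree sequence.
k_star = [1, 2, 2, 3, 5, 8]
print(sum(dirac_vs_poisson(k) for k in k_star))

# The gap to the Stirling term shrinks as the degree grows.
for k in (1, 10, 100, 1000):
    print(k, dirac_vs_poisson(k) - stirling_leading_term(k))
```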

#### 1.6.3 Poisson Degrees in the Ultra-Dense Regime

Again, the case with finite dual degrees cannot be retrieved from Formula 1.1 with \(\tau _{\alpha _\infty }=0\), because it corresponds to the case where *Q* has a diverging (like \(n=\alpha _n\)) number of eigenvalues whose value remains finite as \(n\rightarrow \infty \), and (1.32) does not hold. By contrast, the case with growing dual degrees can be retrieved from Formula 1.1 with \(\tau _{\alpha _\infty }=0\) because (1.32) holds, as confirmed in Theorem 1.5.

#### 1.6.4 Gaussian Degrees in the Dense Regime

*multivariate Normal distribution* with mean \(\vec {k^*}\) and covariance matrix *Q*. In other words, in this regime

*not* vanish, unlike in the other two regimes. Since all the degrees are growing in this regime, so are all the eigenvalues of *Q*, which implies (1.32) and is consistent with Formula 1.1 with \(\tau _{\alpha _\infty }=0\), as proven in Theorem 1.5.

Note that the right-hand side of (1.51), being the relative entropy of a discrete distribution with respect to a continuous distribution, needs to be properly interpreted: the Dirac distribution \(\delta [\vec {k^*}]\) needs to be smoothed to a continuous distribution with support in a small ball around \(\vec {k^*}\). Since the degrees are large, this does not affect the asymptotics.

#### 1.6.5 Crossover Between the Regimes

*Q*, and hence all the degrees, diverge. Actually, (1.35) is expected to hold in the even more general hybrid case where there are both finite and growing degrees, provided the number of finite-valued eigenvalues of *Q* grows slower than \(\alpha _n\) [6].

#### 1.6.6 Other Constraints

It would be interesting to investigate Formula 1.1 for constraints other than on the degrees. Such constraints are typically much harder to analyse. In [2] constraints are considered on the total number of edges and the total number of triangles *simultaneously* (\(K=2\)) in the dense regime. It was found that, with \(\alpha _n=n^2\), breaking of ensemble equivalence occurs for some ‘frustrated’ choices of these numbers. Clearly, this type of breaking of ensemble equivalence does not arise from the recently proposed [6] mechanism associated with a diverging number of constraints as in the cases considered in this paper, but from the more traditional [9] mechanism of a phase transition associated with the frustration phenomenon.

#### 1.6.7 Outline

Theorem 1.5 is proved in Sect. 2. In Appendix A we show that the canonical probabilities in (1.15) are the same as the probabilities used in [1] to define a \(\delta \)-tame degree sequence. In Appendix B we explain the duality between the sparse regime and the ultra-dense regime.

## 2 Proof of the Main Theorem

In Sect. 2.2 we prove Theorem 1.5. The proof is based on two lemmas, which we state and prove in Sect. 2.1.

### 2.1 Preparatory Lemmas

The following lemma gives an expression for the relative entropy.

### Lemma 2.1

*Q* is the covariance matrix in (1.27). This matrix \(Q=(q_{ij})\) takes the form

### Proof

*Q* and \(p^*\) are defined in (2.2) and (A.2) below, while \(e^C\) is sandwiched between two constants that depend on \(\delta \):

The following lemma shows that the diagonal approximation of \(\log (\det Q)/n\overline{f}_n\) is good when the degree sequence is \(\delta \)-tame.

### Lemma 2.2

*Q* on the diagonal and is zero off the diagonal.
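The quality of the diagonal approximation can be probed numerically. The sketch below assumes that the covariance matrix of the degrees has entries \(q_{ij} = p_{ij}(1-p_{ij})\) for \(i\ne j\) and \(q_{ii} = \sum _{j\ne i} q_{ij}\), as is the case when the edges are independent Bernoulli variables, and compares \(\log \det Q\) with \(\log \det Q_D\) for a homogeneous dense example. The function names and the choice \(p_{ij}=\tfrac{1}{2}\) are our own illustrative assumptions.

```python
import math


def covariance_matrix(p):
    """Covariance matrix of the degrees for independent edges:
    q_ij = p_ij (1 - p_ij) for i != j, q_ii = sum over j != i of q_ij."""
    n = len(p)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = p[i][j] * (1.0 - p[i][j])
                q[i][i] += q[i][j]
    return q


def log_det_spd(a):
    """log det of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
                log_det += 2.0 * math.log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return log_det


def log_det_diagonal(a):
    """log det of the diagonal part Q_D of the matrix."""
    return sum(math.log(a[i][i]) for i in range(len(a)))


# Homogeneous dense example: the ratio approaches 1 as n grows.
for n in (6, 20, 50):
    q = covariance_matrix([[0.5] * n for _ in range(n)])
    print(n, log_det_spd(q) / log_det_diagonal(q))
```

For this homogeneous example the determinant is available in closed form, \(\det Q = v^n (2n-2)(n-2)^{n-1}\) with \(v = p(1-p)\), which gives a direct check on the numerical routine.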

### Proof

- (1) \(\det (Q)\) is real,
- (2) \(Q_D\) is non-singular with \(\det (Q_D)\) real,
- (3) \(\lambda _i (A)>-1\), \(1 \le i \le n\),

*Q* off the diagonal and is zero on the diagonal, \(\lambda _i(A)\) is the *i*th eigenvalue of *A* (arranged in decreasing order), \(\lambda _{\mathrm {min}}(A) = \min _{1 \le i \le n}\lambda _i(A)\), and \(\rho (A) = \max _{1 \le i \le n}|\lambda _i(A)|\).

We begin by verifying (1)–(3).

(1) Since *Q* is a symmetric matrix with real entries, \(\det Q\) exists and is real.

*A* is symmetric. Moreover, since \(q_{ii} = \sum _{j\ne i} q_{ij}\), the matrix *A* is also Markov. We therefore have

*A* starting from *i* can return to *i* with a positive probability after an arbitrary number of steps \(\ge 2\). Consequently, the last inequality in (2.13) is strict.

*A*, while \(\mu _{\min }(L)=\min _{1 \le i \le n} \lambda _i (L)\) and \(\gamma = \min _{1 \le i \le n} a_{ii}\), with \(L = (L_{ij})\) the matrix such that, for \(i \ne j\), \(L_{ij}=1\) if and only if \(a_{ij} > 0\), while \(L_{ii} = \sum _{j\ne i} L_{ij}\).

### 2.2 Proof of Theorem 1.5

### Proof

We deal with each of the three regimes in Theorem 1.5 separately.

#### 2.2.1 The Sparse Regime with Growing Degrees

#### 2.2.2 The Ultra-Dense Regime with Growing Dual Degrees

If \(\vec {k}^*= (k_i^*)_{i=1}^n\) is an ultra-dense degree sequence, then the dual \(\vec {\ell }^* = (\ell _i^*)_{i=1}^n = (n-1-k_i^*)_{i=1}^n\) is a sparse degree sequence. By Lemma B.2, the relative entropy is invariant under the map \(k_i^* \rightarrow \ell _i^* = n-1-k_i^*\). So is \(\bar{f_n}\), and hence the claim follows from the proof in the sparse regime.

#### 2.2.3 The \(\delta \)-Tame Regime

*-tame regime*. It follows from Lemma 2.2 that

## Notes

### Acknowledgements

DG and AR are supported by EU-project 317532-MULTIPLEX. FdH and AR are supported by NWO Gravitation Grant 024.002.003–NETWORKS.

## References

- 1. Barvinok, A., Hartigan, J.A.: The number of graphs and a random graph with a given degree sequence. Random Struct. Algorithms **42**, 301–348 (2013)
- 2. den Hollander, F., Mandjes, M., Roccaverde, A., Starreveld, N.J.: Ensemble equivalence for dense graphs. arXiv:1703.08058 (to appear in Electron. J. Probab.)
- 3. Garlaschelli, D., Loffredo, M.I.: Maximum likelihood: extracting unbiased information from complex networks. Phys. Rev. E **78**, 015101 (2008)
- 4. Garlaschelli, D., den Hollander, F., Roccaverde, A.: Ensemble nonequivalence in random graphs with modular structure. J. Phys. A **50**, 015001 (2017)
- 5. Ipsen, I.C.F., Lee, D.J.: Determinant Approximations. North Carolina State University, Raleigh (2003)
- 6. Squartini, T., Garlaschelli, D.: Reconnecting statistical physics and combinatorics beyond ensemble equivalence. arXiv:1710.11422
- 7. Squartini, T., Mastrandrea, R., Garlaschelli, D.: Unbiased sampling of network ensembles. New J. Phys. **17**, 023052 (2015)
- 8. Squartini, T., de Mol, J., den Hollander, F., Garlaschelli, D.: Breaking of ensemble equivalence in networks. Phys. Rev. Lett. **115**, 268701 (2015)
- 9. Touchette, H.: Equivalence and nonequivalence of ensembles: thermodynamic, macrostate, and measure levels. J. Stat. Phys. **159**, 987–1016 (2015)
- 10. Touchette, H.: Asymptotic equivalence of probability measures and stochastic processes. arXiv:1708.02890
- 11. van der Hofstad, R.W.: Random Graphs and Complex Networks. Cambridge University Press, New York (2017)
- 12. Wang, Y.H.: On the number of successes in independent trials. Stat. Sin. **3**, 295–312 (1993)
- 13. Zhang, X.-D.: The smallest eigenvalue for reversible Markov chains. Linear Algebra Appl. **383**, 175–186 (2004)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.