A Appendices
A.1 Peer-Influence Under Homophily: Results and Inference Strategies
Binary peer-influence effect with normalized homophily: Consider now a binary peer-influence effect in the presence of normalized homophily. For the untreated individuals, we have
$$\begin{aligned} Y_i(Z_i=0,(Z_j)_{j \in {\mathcal {N}_i}}) = \alpha + \beta _0 \mathbf 1 _{\sum _{j \in {\mathcal {N}_i}} Z_j > 0 } + h_0 \sum _{j \in {\mathcal {N}_i}} \frac{X_j}{ |\mathcal {N}_i | } + \epsilon _i(0, \sigma ^2_Y) \end{aligned}$$
(11)
where \(\epsilon _i(0, \sigma ^2_Y)\) are independent and identically distributed with zero mean and variance \(\sigma ^2_Y\).
As before, consider estimating the peer-influence parameter \(\beta _0\) using a difference in means estimator. Partition the set of untreated individuals into sets \(M^{(0)}_0 := \{ i: Z_i=0, \sum _{j \in \mathcal {N}_i}Z_j = 0 \}\) (the set of untreated individuals with no treated neighbors) and \(M^{(1)}_0:= \{ i: Z_i=0, \sum _{j \in \mathcal {N}_i}Z_j > 0 \}\) (the set of untreated individuals with at least one treated neighbor). Then, the difference in means estimator for \(\beta _0\) is given by:
$$\begin{aligned} \hat{\beta _0} = \underset{i \in M^{(1)}_0}{avg} Y_i - \underset{i \in M^{(0)}_0}{avg} Y_i \end{aligned}$$
(12)
Unlike in the case with unnormalized homophily, the difference in means estimator for peer-influence remains unbiased in the presence of normalized homophily, as highlighted in Theorem 2 below. Furthermore, for most sparse and dense models of the underlying graph, Theorem 2 can be used to show that \(\hat{\beta }_0\) is a consistent estimator of peer-influence under normalized homophily.
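For concreteness, the estimator in (12) can be computed directly from an adjacency matrix. The following sketch is purely illustrative and not from the text: the Erdős–Rényi network, the treatment probability, and the parameter values \(\alpha , \beta _0, h_0\) are arbitrary choices made here to exercise model (11).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network: symmetric Erdos-Renyi adjacency matrix (illustrative choice).
N, p = 200, 0.02
A = np.triu(rng.random((N, N)) < p, 1)
A = (A | A.T).astype(int)

Z = rng.random(N) < 0.3            # independent Bernoulli treatments
treated_nbrs = A @ Z.astype(int)   # number of treated neighbours per node

# Partition the untreated individuals as in the text.
M0 = ~Z & (treated_nbrs == 0)      # no treated neighbours
M1 = ~Z & (treated_nbrs > 0)       # at least one treated neighbour

# Simulate responses under model (11) (normalized homophily).
alpha, beta0, h0 = 1.0, 2.0, 0.5
X = rng.normal(size=N)
deg = A.sum(axis=1)
norm_homophily = np.divide(A @ X, deg, out=np.zeros(N), where=deg > 0)
Y = alpha + beta0 * (treated_nbrs > 0) + h0 * norm_homophily \
    + rng.normal(scale=0.1, size=N)

# Difference-in-means estimator (12).
beta0_hat = Y[M1].mean() - Y[M0].mean()
```

Per Theorem 2, the estimator is conditionally unbiased here, so repeated draws of \(\mathbf X \) and the noise concentrate `beta0_hat` around `beta0`.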
Theorem 2
Consider the difference in means estimator \(\hat{\beta }_0\) for binary peer-influence effect \(\beta _0\). Under the presence of normalized homophily in our model (11), the mean squared error of \(\hat{\beta }_0\) (conditional on the treatment \(\mathbf Z \)) is:
$$\begin{aligned} \begin{aligned} \mathbb {E}&[ (\hat{\beta }_0 - \beta _0)^2 | \mathbf Z ] = \\&h_0^2 \sigma _X^2 \Bigg ( \underset{i,j \in M^{(0)}_0}{avg} \frac{| \mathcal {N}_i \cap \mathcal {N}_j|}{|\mathcal {N}_i||\mathcal {N}_j|} + \underset{i,j \in M^{(1)}_0}{avg} \frac{| \mathcal {N}_i \cap \mathcal {N}_j|}{|\mathcal {N}_i||\mathcal {N}_j|} - \ 2 \underset{i \in M^{(0)}_0, j \in M^{(1)}_0}{avg} \frac{| \mathcal {N}_i \cap \mathcal {N}_j|}{|\mathcal {N}_i||\mathcal {N}_j|} \Bigg ) + \sigma _Y^2 \Bigg ( \frac{1}{|M^{(0)}_0|} + \frac{1}{|M^{(1)}_0|} \Bigg ) \end{aligned} \end{aligned}$$
(13)
Linear peer-influence effect with unnormalized homophily: We now consider modeling peer-influence as a linear function of the number of treated neighbors \(\mathbf{peer }( (Z_j)_{ j \in {\mathcal {N}_i} } ) = \sum _{j \in {\mathcal {N}_i}} Z_j\). For the untreated individuals under unnormalized homophily, this gives:
$$\begin{aligned} Y_i(Z_i=0,(Z_j)_{j \in {\mathcal {N}_i}}) = \alpha + \beta _0 \sum _{j \in {\mathcal {N}_i}} Z_j + h_0 \sum _{j \in {\mathcal {N}_i}} X_j + \epsilon _i(0, \sigma ^2_Y) \end{aligned}$$
(14)
where \(\epsilon _i(0, \sigma ^2_Y)\) are independent and identically distributed with zero mean and variance \(\sigma ^2_Y\).
Consider estimating the peer-influence parameter \(\beta _0\). Generalizing our methodology from the binary peer-influence case, we now develop a stratified estimator for \(\beta _0\). Let
$$M^{(k)}_0 := \{ i: Z_i=0, \sum _{j \in \mathcal {N}_i}Z_j = k \}$$
be the set of untreated individuals which have k treated neighbors. Then, an average of difference in means estimator for peer-influence is:
$$\begin{aligned} \hat{\beta }_0 \,{=}\, \frac{\sum _k \hat{\beta }_0^{(k)}}{\sum _k 1} \ for \ \hat{\beta }_0^{(k)} \,{=}\, \frac{1}{k} \Bigg ( \frac{\sum _{i \in M^{(k)}_0} Y_i}{| M^{(k)}_0 | } - \frac{\sum _{i \in M^{(0)}_0} Y_i}{| M^{(0)}_0 | } \Bigg ) = \frac{1}{k} \Bigg ( \underset{i \in M^{(k)}_0}{avg} Y_i - \underset{i \in M^{(0)}_0}{avg} Y_i \Bigg ). \end{aligned}$$
(15)
where we average over all k such that \(| M^{(k)}_0 | > 0 \) (so that \(\hat{\beta }_0^{(k)}\) is well-defined). Note that here we are averaging over the class of estimators \(\hat{\beta }_0^{(k)}\) under the assumption of linear peer-influence. In the case of nonlinearity, we can also consider each \(\hat{\beta }_0^{(k)}\) separately to understand the kth-level peer-influence effect in the network.
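The stratified estimator (15) admits a similarly short computational sketch. Again the network, treatment probability, and parameters below are arbitrary illustrative choices, here under the unnormalized-homophily model (14):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative Erdos-Renyi network and independent Bernoulli treatments.
N, p = 300, 0.02
A = np.triu(rng.random((N, N)) < p, 1)
A = (A | A.T).astype(int)
Z = (rng.random(N) < 0.3).astype(int)
k_treated = A @ Z                        # treated-neighbour count per node

# Responses under model (14) (unnormalized homophily), arbitrary parameters.
alpha, beta0, h0 = 1.0, 2.0, 0.5
X = rng.normal(size=N)
Y = alpha + beta0 * k_treated + h0 * (A @ X) + rng.normal(scale=0.1, size=N)

# Strata M_0^(k): untreated individuals with exactly k treated neighbours.
untreated = Z == 0
M = {k: np.flatnonzero(untreated & (k_treated == k))
     for k in np.unique(k_treated[untreated])}

# Average of per-stratum difference-in-means estimators, eq. (15);
# assumes the baseline stratum k = 0 is non-empty.
base = Y[M[0]].mean()
beta0_hat = np.mean([(Y[M[k]].mean() - base) / k for k in M if k > 0])
```

By Theorem 3, this estimator is biased under unnormalized homophily, so `beta0_hat` will generally deviate from `beta0`; the individual terms `(Y[M[k]].mean() - base) / k` can also be inspected separately to probe kth-level effects, as noted above.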
The presence of latent unnormalized homophily interferes with, and introduces bias into, the estimation of linear peer-influence, as highlighted in Theorem 3 below.
Theorem 3
Consider the estimator \(\hat{\beta }_0\) for linear peer-influence effect \(\beta _0\). Under the presence of unnormalized homophily in our model (3), the mean squared error of \(\hat{\beta }_0\) (conditional on the treatment \(\mathbf Z \)) is:
$$\begin{aligned} \begin{aligned} \mathbb {E}&[ (\hat{\beta }_0 - \beta _0)^2 | \mathbf Z ] = \\&\Bigg ( \frac{h_0}{\sum _{k>0} 1} \sum _{k>0} \frac{1}{k} \Bigg ( \underset{i \in M^{(k)}_0}{avg} |\mathcal {N}_i| - \underset{i \in M^{(0)}_0}{avg} |\mathcal {N}_i| \Bigg ) \Bigg )^2 \\&+ \frac{1}{(\sum _{k>0} 1)^2} \sum _{k,l>0} \frac{1}{kl} \Bigg [ h_0^2 \sigma _X^2 \Bigg ( \underset{i \in M^{(k)}_0 ,j \in M^{(l)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| + \underset{i,j \in M^{(0)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| - \ 2 \underset{i \in M^{(0)}_0, j \in M^{(k)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| \Bigg ) \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad + \sigma _Y^2 \Bigg ( \frac{1}{|M^{(0)}_0|} + \frac{ \mathbf 1 _{k=l} }{|M^{(k)}_0|} \Bigg ) \Bigg ] \end{aligned} \end{aligned}$$
(16)
Equation (16) highlights that unbiased estimation via optimal treatment allocation may be computationally difficult, as we now need to ensure balance across all the strata \( (M^{(k)}_0)_{k \ge 0} \). This motivates an alternative approach to unbiased estimation.
Linear peer-influence effect with normalized homophily: For the peer-influence effect on untreated individuals under normalized homophily, we obtain:
$$\begin{aligned} Y_i(Z_i=0,(Z_j)_{j \in {\mathcal {N}_i}}) = \alpha + \beta _0 \sum _{j \in {\mathcal {N}_i}} Z_j + h_0 \sum _{j \in {\mathcal {N}_i}} \frac{X_j}{| \mathcal {N}_i |} + \epsilon _i(0, \sigma ^2_Y) \end{aligned}$$
(17)
where \(\epsilon _i(0, \sigma ^2_Y)\) are independent and identically distributed with zero mean and variance \(\sigma ^2_Y\).
To estimate the peer-influence parameter \(\beta _0\), the same stratified estimator as in the linear peer-influence with unnormalized homophily case can be applied:
$$\begin{aligned} \hat{\beta }_0 \,{=}\, \frac{\sum _k \hat{\beta }_0^{(k)}}{\sum _k 1} \ for \ \hat{\beta }_0^{(k)} \,{=}\, \frac{1}{k} \Bigg ( \frac{\sum _{i \in M^{(k)}_0} Y_i}{| M^{(k)}_0 | } \,{-}\, \frac{\sum _{i \in M^{(0)}_0} Y_i}{| M^{(0)}_0 | } \Bigg ) \,{=}\, \frac{1}{k} \Bigg ( \underset{i \in M^{(k)}_0}{avg} Y_i - \underset{i \in M^{(0)}_0}{avg} Y_i \Bigg ). \end{aligned}$$
(18)
where \(M^{(k)}_0 := \{ i: Z_i=0, \sum _{j \in \mathcal {N}_i}Z_j = k \}\) and we are averaging over all k such that \(| M^{(k)}_0 | > 0 \).
In the presence of normalized homophily, \(\hat{\beta }_0\) remains an unbiased estimator of peer-influence. This is highlighted in Theorem 4 below.
Theorem 4
Consider the estimator \(\hat{\beta }_0\) for linear peer-influence effect \(\beta _0\). Under the presence of normalized homophily in our model (11), \(\hat{\beta }_0\) is unbiased and the mean squared error of \(\hat{\beta }_0\) (conditional on the treatment \(\mathbf Z \)) is:
$$\begin{aligned} \begin{aligned} \mathbb {E}&[ (\hat{\beta }_0 - \beta _0)^2 | \mathbf Z ] = \\&\frac{1}{(\sum _{k>0} 1)^2} \sum _{k,l>0} \frac{1}{kl} \Bigg [ h_0^2 \sigma _X^2 \Bigg ( \underset{i \in M^{(k)}_0 ,j \in M^{(l)}_0}{avg} \frac{| \mathcal {N}_i \cap \mathcal {N}_j|}{|\mathcal {N}_i||\mathcal {N}_j|} + \underset{i,j \in M^{(0)}_0}{avg} \frac{| \mathcal {N}_i \cap \mathcal {N}_j|}{|\mathcal {N}_i||\mathcal {N}_j|} - \ 2 \underset{i \in M^{(0)}_0, j \in M^{(k)}_0}{avg} \frac{| \mathcal {N}_i \cap \mathcal {N}_j|}{|\mathcal {N}_i||\mathcal {N}_j|} \Bigg ) \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad + \sigma _Y^2 \Bigg ( \frac{1}{|M^{(0)}_0|} + \frac{ \mathbf 1 _{k=l} }{|M^{(k)}_0|} \Bigg ) \Bigg ] \end{aligned} \end{aligned}$$
(19)
The stratified estimator for linear peer-influence thus remains unbiased in the presence of normalized homophily. Furthermore, for most sparse and dense models of the underlying graph, Theorem 4 can be used to show that \(\hat{\beta }_0\) is a consistent estimator of linear peer-influence under normalized homophily.
A.2 Disentangling Homophily from Estimation of Peer-Influence: Randomized Treatment Strategies
An algorithm for inference of linear peer-influence. We now use our general framework to design randomized treatments for the inference of linear peer-influence effects under homophily. We proceed to find the optimal treatment probabilities \(\theta _s\) for \(s = 1,\ldots ,r\) under a stochastic block model with r communities as before.
Let \( M_0^{(k)} \) denote the set of untreated individuals which have k treated neighbors (note that we are abusing notation here: now \( M_0^{(1)} \) represents untreated individuals with exactly one treated neighbor, rather than at least one treated neighbor as before in the binary peer-influence case). First, we derive a proposition about \(M_0^{(k)}\) under our framework.
Proposition 2
Consider a stochastic block model (SBM) of N individuals in r communities. Denote the communities of the SBM by the sets \(B_1,\ldots ,B_r\), which are of respective sizes \(A_1,\ldots , A_r\) (where \(A_1+ \cdots +A_r = N\)). Let \(\mathbf P \) be the \(r \times r\) adjacency probability matrix between the r communities. We assign treatments independently to individuals such that individuals in \(B_s\) are treated with probability \(\theta _s\) for \(s=1,\ldots ,r\). Under such setup, let \( M_0^{(k)} \) denote the set of untreated individuals which have k treated neighbors. For ease of notation, let \(\{ s \in M_0^{(k)} \}\) denote the event that a fixed vertex in community s is in the set \(M_0^{(k)}\). Then,
$$\begin{aligned} \mathbb {P}( s \in M_0^{(k)} ) = (1 - \theta _s) \sum _{ \begin{array}{c} t_1,\ldots ,t_r: \\ \forall v =1,\ldots ,r \ 0 \le t_v \le A_v -\mathbf 1 _{ \{v=s \} }, \\ t_1+ \cdots +t_r = k \end{array}} \Bigg ( \prod _{v=1}^r \mathbf{Bin }( t_v; A_v -\mathbf 1 _{ \{v=s \} }, \theta _v P_{s,v} ) \Bigg ) \end{aligned}$$
(20)
where \(\mathbf{Bin }( t_v; A_v -\mathbf 1 _{ \{v=s \} }, \theta _v P_{s,v} ) = \left( {\begin{array}{c}A_v-\mathbf 1 _{ \{v=s \} }\\ t_v\end{array}}\right) \big ( \theta _v P_{s,v} \big )^{ t_v } \big ( 1 - \theta _v P_{s,v} \big )^{ A_v -\mathbf 1 _{ \{v=s \} } - t_v }\).
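Equation (20) is the probability that a sum of independent binomial counts, one per community, equals k, so it can be evaluated by convolving the community-wise binomial pmfs. A sketch using only the standard library (the function names are ours, not from the text):

```python
import math

def binom_pmf(t, n, q):
    """Binomial(n, q) probability mass at t."""
    return math.comb(n, t) * q ** t * (1 - q) ** (n - t)

def prob_in_M0k(s, k, sizes, P, theta):
    """P(s in M_0^(k)) via eq. (20): (1 - theta_s) times the probability that
    independent Bin(A_v - 1{v=s}, theta_v * P[s][v]) counts sum to k."""
    pmf = [1.0]                              # pmf of the running neighbour count
    for v, A_v in enumerate(sizes):
        n = A_v - (1 if v == s else 0)       # s cannot neighbour itself
        q = theta[v] * P[s][v]               # a vertex in B_v is a treated neighbour
        comp = [binom_pmf(t, n, q) for t in range(n + 1)]
        new = [0.0] * (len(pmf) + n)
        for a, pa in enumerate(pmf):         # convolve with community v's pmf
            for b, pb in enumerate(comp):
                new[a + b] += pa * pb
        pmf = new
    return (1 - theta[s]) * (pmf[k] if k < len(pmf) else 0.0)
```

Summing over k recovers \(1-\theta _s\), and \(k=0\) reduces to the closed form (7) of Proposition 1.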
The main idea behind the homophily disentangling strategy is to ensure that in every community \(B_s\) of our stochastic block model, there are equal (expected) numbers of individuals being affected by the different levels of peer-influence. In the case of linear peer-influence, this means choosing treatment values such that inside every community s, each individual has an equal probability of being in the sets \(M_0^{(k)}\) for the different peer-influence levels k. Under a stochastic block model, values of k range from 0 to \(N-1\) (as one individual can have at most \(N-1\) treated neighbors). However, in practice, we can choose to consider \(k=0,1, \ldots , K\) where K is the maximum degree of the actual observed network. Therefore, through an optimal assignment of treatments, we wish to satisfy
$$ \forall s \,{=}\, 1,\ldots , r, \quad \mathbb {P}( s \in M_0^{(0)} ) \,{=}\, \mathbb {P}( s \in M_0^{(1)} ) = \cdots = \mathbb {P}( s \in M_0^{(K-1)} ) = \mathbb {P}( s \in M_0^{(K)} ), $$
where expressions for each \(\mathbb {P}( s \in M_0^{(k)} )\) as functions of \(\theta _s\) for \(s=1,\ldots , r\) are obtained from Proposition 2 above. This gives Kr conditions to satisfy for r variables \(\theta _s \in [0,1]\) (for \(s=1,\ldots , r\)), so we can approach this as a constrained optimization problem as considered in the binary peer-influence case before.
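The resulting constrained problem can be attacked numerically. The sketch below is our own construction, not the paper's algorithm: it evaluates the Proposition 2 probabilities and minimizes, over a coarse grid standing in for a proper constrained optimizer, the within-community squared deviations of the stratum probabilities from their mean, for a small two-community example with hypothetical sizes and adjacency probabilities:

```python
import math
from itertools import product

def prob_M0k(s, k, sizes, P, theta):
    """P(s in M_0^(k)) per Proposition 2, via binomial convolution."""
    pmf = [1.0]
    for v, A_v in enumerate(sizes):
        n = A_v - (1 if v == s else 0)
        q = theta[v] * P[s][v]
        comp = [math.comb(n, t) * q ** t * (1 - q) ** (n - t) for t in range(n + 1)]
        new = [0.0] * (len(pmf) + n)
        for a, pa in enumerate(pmf):
            for b, pb in enumerate(comp):
                new[a + b] += pa * pb
        pmf = new
    return (1 - theta[s]) * (pmf[k] if k < len(pmf) else 0.0)

# Hypothetical two-community SBM; K plays the role of the maximum observed degree.
sizes, K = [6, 6], 3
P = [[0.5, 0.1], [0.1, 0.5]]

def loss(theta):
    # Equal probabilities across k = 0..K iff each community's deviations vanish.
    total = 0.0
    for s in range(len(sizes)):
        probs = [prob_M0k(s, k, sizes, P, theta) for k in range(K + 1)]
        mean = sum(probs) / len(probs)
        total += sum((q - mean) ** 2 for q in probs)
    return total

# Crude grid search over theta in [0,1]^2, a stand-in for a constrained solver.
grid = [i / 20 for i in range(21)]
best = min(product(grid, repeat=2), key=lambda th: loss(list(th)))
```

In practice one would replace the grid search with a constrained optimizer over \(\theta _s \in [0,1]\), exactly as in the binary peer-influence case discussed above.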
B Tables of Main Results
B.1 Analytical Results
B.2 Randomized Treatment Strategies to Disentangle Homophily
C Proofs
C.1 Proof of Theorem 1 (See p. xxx)
Theorem 1
Consider the difference in means estimator \(\hat{\beta }_0\) for binary peer-influence effect \(\beta _0\). Under the presence of unnormalized homophily in our model (3), the mean squared error of \(\hat{\beta }_0\) (conditional on the treatment \(\mathbf Z \)) is:
$$\begin{aligned} \begin{aligned} \mathbb {E}&[ (\hat{\beta }_0 - \beta _0)^2 | \mathbf Z ] = \Bigg ( h_0 \Bigg ( \underset{i \in M^{(1)}_0}{avg} |\mathcal {N}_i| - \underset{i \in M^{(0)}_0}{avg} |\mathcal {N}_i| \Bigg ) \Bigg )^2 \\&\quad + h_0^2 \sigma _X^2 \Bigg ( \underset{i,j \in M^{(0)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| + \underset{i,j \in M^{(1)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| - \ 2 \underset{i \in M^{(0)}_0, j \in M^{(1)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| \Bigg ) + \sigma _Y^2 \Bigg ( \frac{1}{|M^{(0)}_0|} + \frac{1}{|M^{(1)}_0|} \Bigg ) \end{aligned} \end{aligned}$$
(5)
Proof
Recall the definition of the difference in means estimator for binary peer-influence (4).
$$\begin{aligned} \hat{\beta _0} = \underset{i \in M^{(1)}_0}{avg} Y_i - \underset{i \in M^{(0)}_0}{avg} Y_i \end{aligned}$$
where \(M^{(0)}_0 := \{ i: Z_i=0, \sum _{j \in \mathcal {N}_i}Z_j = 0 \}\) (the set of untreated individuals with no treated neighbors) and \(M^{(1)}_0:= \{ i: Z_i=0, \sum _{j \in \mathcal {N}_i}Z_j > 0 \}\) (the set of untreated individuals with at least one treated neighbor). The response variables \((Y_i)_{i=1,\ldots , N}\) are defined by:
$$ Y_i(Z_i=0,(Z_j)_{j \in {\mathcal {N}_i}}) = \alpha + \beta _0 \mathbf 1 _{\sum _{j \in {\mathcal {N}_i}} Z_j > 0 } + h_0 \sum _{j \in {\mathcal {N}_i}} X_j + \epsilon _i(0, \sigma ^2_Y)$$
$$ Y_i(Z_i=1,(Z_j)_{j \in {\mathcal {N}_i}},(X_j)_{j \in {\mathcal {N}_i}}) = \tau + Y_i(Z_i=0,(Z_j)_{j \in {\mathcal {N}_i}},(X_j)_{j \in {\mathcal {N}_i}}) + \beta _1 \mathbf 1 _{\sum _{j \in {\mathcal {N}_i}} Z_j > 0 } + h_1 \sum _{j \in {\mathcal {N}_i}} X_j $$
\(\epsilon _i(0, \sigma ^2_Y)\) for \(i=1,\ldots , N\) are the noise terms in the network, independent and identically distributed with zero mean and variance \(\sigma ^2_Y\). Note that the sets \(M^{(0)}_0\) and \(M^{(1)}_0\) are \(\mathbf Z \)-measurable and that the latent homophily variables \(\mathbf X = (X_j)_{j=1,\ldots , N}\) are independent of \(\mathbf Z = (Z_j)_{j=1,\ldots , N}\). Therefore,
$$\begin{aligned} \mathbb {E}[ \hat{\beta _0} | \mathbf Z ]&= \frac{\sum _{i \in M^{(1)}_0} \mathbb {E} \Big [ Y_i | \mathbf Z \Big ] }{|M^{(1)}_0|} - \frac{\sum _{i \in M^{(0)}_0} \mathbb {E} \Big [ Y_i | \mathbf Z \Big ] }{|M^{(0)}_0|} \\&= \frac{\sum _{i \in M^{(1)}_0} \mathbb {E} \Big [ \beta _0 + h_0 \sum _{j \in \mathcal {N}_i}X_j \Big | \mathbf Z \Big ] }{|M^{(1)}_0|} - \frac{\sum _{i \in M^{(0)}_0} \mathbb {E} \Big [ h_0 \sum _{j \in \mathcal {N}_i}X_j \Big | \mathbf Z \Big ] }{|M^{(0)}_0|} \\&= \beta _0 + \frac{\sum _{i \in M^{(1)}_0} h_0 |\mathcal {N}_i| }{|M^{(1)}_0|} - \frac{\sum _{i \in M^{(0)}_0} h_0 |\mathcal {N}_i| }{|M^{(0)}_0|} \\&= \beta _0 + h_0 \Bigg ( \underset{i \in M^{(1)}_0}{avg} |\mathcal {N}_i| - \underset{i \in M^{(0)}_0}{avg} |\mathcal {N}_i| \Bigg ). \end{aligned}$$
This gives the bias of \(\hat{\beta _0}\): \(\mathbb {E} \Big [ \hat{\beta _0} - \beta _0 | \mathbf Z \Big ] = h_0 \Bigg ( \underset{i \in M^{(1)}_0}{avg} |\mathcal {N}_i| - \underset{i \in M^{(0)}_0}{avg} |\mathcal {N}_i| \Bigg )\). Similarly,
$$\begin{aligned} var[ \hat{\beta _0} | \mathbf Z ]&= var \Bigg ( \frac{\sum _{i \in M^{(1)}_0} Y_i }{|M^{(1)}_0|} - \frac{\sum _{j \in M^{(0)}_0} Y_j }{|M^{(0)}_0|} \ \Bigg | \ \mathbf Z \Bigg ) \\&= \frac{ var(\sum _{i \in M^{(1)}_0} Y_i \ \big | \ \mathbf Z )}{|M^{(1)}_0|^2} + \frac{ var(\sum _{j \in M^{(0)}_0} Y_j \ \big | \ \mathbf Z )}{|M^{(0)}_0|^2} - \frac{ 2 { cov}( \sum _{i \in M^{(1)}_0} Y_i , \sum _{j \in M^{(0)}_0} Y_j \ \big | \ \mathbf Z )}{|M^{(0)}_0| |M^{(1)}_0|} \\&= \frac{ \sum _{i \in M^{(1)}_0} \sum _{k \in M^{(1)}_0}{} { cov}( Y_i , Y_k \ \big | \ \mathbf Z )}{|M^{(1)}_0|^2} + \frac{ \sum _{j \in M^{(0)}_0} \sum _{l \in M^{(0)}_0} { cov}( Y_j, Y_l \ \big | \ \mathbf Z )}{|M^{(0)}_0|^2} \\&\qquad - \frac{ 2 \sum _{i \in M^{(1)}_0} \sum _{j \in M^{(0)}_0} { cov}( Y_i , Y_j \ \big | \ \mathbf Z )}{|M^{(0)}_0| |M^{(1)}_0|} \\&= \underset{i, k \in M^{(1)}_0}{avg} { cov}( Y_i , Y_k \ \big | \ \mathbf Z ) + \underset{j,l \in M^{(0)}_0}{avg} { cov}( Y_j , Y_l \ \big | \ \mathbf Z ) - 2 \underset{i \in M^{(0)}_0, j \in M^{(1)}_0}{avg} { cov}( Y_i , Y_j \ \big | \ \mathbf Z ). \end{aligned}$$
For \(i \in M^{(1)}_0\) and \(k \in M^{(1)}_0\), by the law of total covariance and as \(\mathbf X \) are i.i.d.,
$$\begin{aligned} { cov}( Y_i , Y_k \ \big | \ \mathbf Z )&= \mathbb {E}[ { cov}( Y_i , Y_k \ \big | \ \mathbf X , \mathbf Z ) \ \Big | \ \mathbf Z ] + cov \Big ( \mathbb {E}[ Y_i \big | \mathbf X , \mathbf Z ], \mathbb {E}[ Y_k \big | \mathbf X , \mathbf Z ] \ \Big | \ \mathbf Z \Big ) \\&= \sigma _Y^2 \mathbbm {1}_{ \{i=k \} } \ + \ cov \Big ( \alpha + \beta _0 + h_0 \sum _{a \in \mathcal {N}_i} X_a, \alpha + \beta _0 + h_0 \sum _{b \in \mathcal {N}_k} X_b \Big ) \\&= \sigma _Y^2 \mathbbm {1}_{ \{i=k \} } \ + h_0^2 \ cov \Big ( \sum _{a \in \mathcal {N}_i} X_a, \sum _{b \in \mathcal {N}_k} X_b \Big ) \\&= \sigma _Y^2 \mathbbm {1}_{ \{i=k \} } \ + h_0^2 \sigma _X^2 |\mathcal {N}_i \cap \mathcal {N}_k | \end{aligned}$$
Similarly for \(j \in M^{(0)}_0\) and \(l \in M^{(0)}_0\),
$$\begin{aligned} { cov}( Y_j , Y_l \ \big | \ \mathbf Z )&= \sigma _Y^2 \mathbbm {1}_{ \{j=l \} } \ + \ cov \Big ( \alpha + h_0 \sum _{a \in \mathcal {N}_j} X_a, \alpha + h_0 \sum _{b \in \mathcal {N}_l} X_b \ \Big | \ \mathbf Z \Big ) \\&= \sigma _Y^2 \mathbbm {1}_{ \{j=l \} } \ + h_0^2 \sigma _X^2 |\mathcal {N}_j \cap \mathcal {N}_l | \end{aligned}$$
For \(i \in M^{(1)}_0\) and \(j \in M^{(0)}_0\), by the law of total covariance and as \(\mathbf X \) are i.i.d.,
$$\begin{aligned} { cov}( Y_i , Y_j \ \big | \ \mathbf Z )&= \mathbb {E}[ { cov}( Y_i , Y_j \ \big | \ \mathbf X , \mathbf Z ) \ \Big | \ \mathbf Z ] + cov \Big ( \mathbb {E}[ Y_i \big | \mathbf X , \mathbf Z ], \mathbb {E}[ Y_j \big | \mathbf X , \mathbf Z ] \ \Big | \ \mathbf Z \Big ) \\&= 0 + \ cov \Big ( \alpha + \beta _0 + h_0 \sum _{a \in \mathcal {N}_i} X_a, \alpha + h_0 \sum _{b \in \mathcal {N}_j} X_b \Big ) \\&= h_0^2 \ cov \Big ( \sum _{a \in \mathcal {N}_i} X_a, \sum _{b \in \mathcal {N}_j} X_b \Big ) \\&= h_0^2 \sigma _X^2 |\mathcal {N}_i \cap \mathcal {N}_j |. \end{aligned}$$
Therefore,
$$\begin{aligned} var[ \hat{\beta _0} | \mathbf Z ]&= \underset{i, k \in M^{(1)}_0}{avg} { cov}( Y_i , Y_k \ \big | \ \mathbf Z ) + \underset{j,l \in M^{(0)}_0}{avg} { cov}( Y_j , Y_l \ \big | \ \mathbf Z ) - 2 \underset{i \in M^{(0)}_0, j \in M^{(1)}_0}{avg} { cov}( Y_i , Y_j \ \big | \ \mathbf Z ) \\&= \underset{i, k \in M^{(1)}_0}{avg} \bigg ( \sigma _Y^2 \mathbbm {1}_{ \{i=k \} } \ + h_0^2 \sigma _X^2 |\mathcal {N}_i \cap \mathcal {N}_k | \bigg ) + \underset{j,l \in M^{(0)}_0}{avg} \bigg ( \sigma _Y^2 \mathbbm {1}_{ \{j=l \} } \ + h_0^2 \sigma _X^2 |\mathcal {N}_j \cap \mathcal {N}_l | \bigg ) \\&\qquad - 2 \underset{i \in M^{(0)}_0, j \in M^{(1)}_0}{avg} \bigg (h_0^2 \sigma _X^2 |\mathcal {N}_i \cap \mathcal {N}_j | \bigg ) \\&= h_0^2 \sigma _X^2 \Bigg ( \underset{i,j \in M^{(0)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| + \underset{i,j \in M^{(1)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| - \ 2 \underset{i \in M^{(0)}_0, j \in M^{(1)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| \Bigg ) \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad + \sigma _Y^2 \Bigg ( \frac{1}{|M^{(0)}_0|} + \frac{1}{|M^{(1)}_0|} \Bigg ) \end{aligned}$$
Now we can recall the bias–variance decomposition of the MSE to obtain
$$\begin{aligned} \mathbb {E}&[ (\hat{\beta }_0 - \beta _0)^2 | \mathbf Z ] = \Big ( \mathbb {E}[ \hat{\beta _0} - \beta _0 | \mathbf Z ] \Big )^2 + var[ \hat{\beta _0} | \mathbf Z ] \\&= \Bigg ( h_0 \Bigg ( \underset{i \in M^{(1)}_0}{avg} |\mathcal {N}_i| - \underset{i \in M^{(0)}_0}{avg} |\mathcal {N}_i| \Bigg ) \Bigg )^2 \\&\quad + h_0^2 \sigma _X^2 \Bigg ( \underset{i,j \in M^{(0)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| + \underset{i,j \in M^{(1)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| - \ 2 \underset{i \in M^{(0)}_0, j \in M^{(1)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| \Bigg )\\&\quad + \sigma _Y^2 \Bigg ( \frac{1}{|M^{(0)}_0|} + \frac{1}{|M^{(1)}_0|} \Bigg ) \end{aligned}$$
as required. \(\square \)
C.2 Proof of Theorem 3 (See p. xxx)
Theorem 3
Consider the estimator \(\hat{\beta }_0\) for linear peer-influence effect \(\beta _0\). Under the presence of unnormalized homophily in our model (3), the mean squared error of \(\hat{\beta }_0\) (conditional on the treatment \(\mathbf Z \)) is:
$$\begin{aligned} \mathbb {E} [ (\hat{\beta }_0 - \beta _0)^2 | \mathbf Z ] =&\, \Bigg ( \frac{h_0}{\sum _{k>0} 1} \sum _{k>0} \frac{1}{k} \Bigg ( \underset{i \in M^{(k)}_0}{avg} |\mathcal {N}_i| - \underset{i \in M^{(0)}_0}{avg} |\mathcal {N}_i| \Bigg ) \Bigg )^2 \nonumber \\&+ \frac{1}{(\sum _{k>0} 1)^2} \sum _{k,l>0} \frac{1}{kl} \Bigg [ h_0^2 \sigma _X^2 \Bigg ( \underset{i \in M^{(k)}_0 ,j \in M^{(l)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| + \underset{i,j \in M^{(0)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| - \ 2 \underset{i \in M^{(0)}_0, j \in M^{(k)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| \Bigg ) \nonumber \\&+ \sigma _Y^2 \Bigg ( \frac{1}{|M^{(0)}_0|} + \frac{ \mathbf 1 _{k=l} }{|M^{(k)}_0|} \Bigg ) \Bigg ] \end{aligned}$$
(16)
Proof
We proceed similarly to the binary peer-influence estimator case. Recall the definition of the estimator for linear peer-influence (21):
$$\begin{aligned} \hat{\beta }_0 \,{=}\, \frac{\sum _k \hat{\beta }_0^{(k)}}{\sum _k 1} \ for \ \hat{\beta }_0^{(k)} \,{=}\, \frac{1}{k} \Bigg ( \frac{\sum _{i \in M^{(k)}_0} Y_i}{| M^{(k)}_0 | } - \frac{\sum _{i \in M^{(0)}_0} Y_i}{| M^{(0)}_0 | } \Bigg ) = \frac{1}{k} \Bigg ( \underset{i \in M^{(k)}_0}{avg} Y_i - \underset{i \in M^{(0)}_0}{avg} Y_i \Bigg ), \end{aligned}$$
(21)
where \(M^{(k)}_0 := \{ i: Z_i=0, \sum _{j \in \mathcal {N}_i}Z_j = k \}\) (the set of untreated individuals with k treated neighbors). The response variables \((Y_i)_{i=1,\ldots , N}\) are defined by:
$$ Y_i(Z_i\,{=}\,0,(Z_j)_{j \in {\mathcal {N}_i}}) \,{=}\, \alpha + \beta _0 \sum _{j \in {\mathcal {N}_i}} Z_j + h_0 \sum _{j \in {\mathcal {N}_i}} X_j \,{+}\, \epsilon _i(0, \sigma ^2_Y)$$
$$Y_i(Z_i=1,(Z_j)_{j \in {\mathcal {N}_i}},(X_j)_{j \in {\mathcal {N}_i}}) = \tau + Y_i(Z_i=0,(Z_j)_{j \in {\mathcal {N}_i}},(X_j)_{j \in {\mathcal {N}_i}}) + \beta _1 \sum _{j \in {\mathcal {N}_i}} Z_j + h_1 \sum _{j \in {\mathcal {N}_i}} X_j $$
\(\epsilon _i(0, \sigma ^2_Y)\) for \(i=1,\ldots , N\) are the noise terms in the network, independent and identically distributed with zero mean and variance \(\sigma ^2_Y\). Note that the sets \(M^{(k)}_0\) are \(\mathbf Z \)-measurable and that the latent homophily variables \(\mathbf X = (X_j)_{j=1,\ldots , N}\) are independent of \(\mathbf Z = (Z_j)_{j=1,\ldots , N}\). Therefore,
$$\begin{aligned} \mathbb {E}[ \hat{\beta }^{(k)}_0 | \mathbf Z ]&= \frac{1}{k} \Bigg ( \frac{\sum _{i \in M^{(k)}_0} \mathbb {E} \Big [ Y_i | \mathbf Z \Big ] }{|M^{(k)}_0|} - \frac{\sum _{i \in M^{(0)}_0} \mathbb {E} \Big [ Y_i | \mathbf Z \Big ] }{|M^{(0)}_0|} \Bigg ) \\&= \frac{1}{k} \Bigg ( \frac{\sum _{i \in M^{(k)}_0} \mathbb {E} \Big [ k \beta _0 + h_0 \sum _{j \in \mathcal {N}_i}X_j \Big | \mathbf Z \Big ] }{|M^{(k)}_0|} - \frac{\sum _{i \in M^{(0)}_0} \mathbb {E} \Big [ h_0 \sum _{j \in \mathcal {N}_i}X_j \Big | \mathbf Z \Big ] }{|M^{(0)}_0|} \Bigg ) \\&= \beta _0 + \frac{1}{k} \Bigg ( \frac{\sum _{i \in M^{(k)}_0} \mathbb {E} \Big [ h_0 \sum _{j \in \mathcal {N}_i}X_j \Big | \mathbf Z \Big ] }{|M^{(k)}_0|} - \frac{\sum _{i \in M^{(0)}_0} \mathbb {E} \Big [ h_0 \sum _{j \in \mathcal {N}_i}X_j \Big | \mathbf Z \Big ] }{|M^{(0)}_0|} \Bigg ) \\&= \beta _0 + \frac{1}{k} \Bigg ( \frac{\sum _{i \in M^{(k)}_0} h_0 |\mathcal {N}_i| }{|M^{(k)}_0|} - \frac{\sum _{i \in M^{(0)}_0} h_0 |\mathcal {N}_i| }{|M^{(0)}_0|} \Bigg ) \\&= \beta _0 + \frac{h_0}{k} \Bigg ( \underset{i \in M^{(k)}_0}{avg} |\mathcal {N}_i| - \underset{i \in M^{(0)}_0}{avg} |\mathcal {N}_i| \Bigg ), \end{aligned}$$
which gives the bias of the estimator \(\hat{\beta }_0 = \frac{\sum _k \hat{\beta }_0^{(k)}}{\sum _k 1}\) to be:
$$ \mathbb {E}[ \hat{\beta }_0 - \beta _0 | \mathbf Z ] = \frac{h_0}{\sum _{k>0} 1} \sum _{k>0} \frac{1}{k} \Bigg ( \underset{i \in M^{(k)}_0}{avg} |\mathcal {N}_i| - \underset{i \in M^{(0)}_0}{avg} |\mathcal {N}_i| \Bigg ).$$
Similarly, \( var[ \hat{\beta _0} | \mathbf Z ] = \frac{1}{(\sum _{k>0} 1)^2} \sum _{k> 0}\sum _{l > 0} { cov}( \hat{\beta }^{(k)}_0, \hat{\beta }^{(l)}_0 \big | \mathbf Z )\), where
$$\begin{aligned} { cov}( \hat{\beta }^{(k)}_0, \hat{\beta }^{(l)}_0 \Big | \mathbf Z )&= \frac{1}{kl} cov \Bigg ( \frac{\sum _{i \in M^{(k)}_0} Y_i }{|M^{(k)}_0|} - \frac{\sum _{j \in M^{(0)}_0} Y_j }{|M^{(0)}_0|} , \frac{\sum _{i \in M^{(l)}_0} Y_i }{|M^{(l)}_0|} - \frac{\sum _{j \in M^{(0)}_0} Y_j }{|M^{(0)}_0|} \ \Bigg | \ \mathbf Z \Bigg ) \\&= \frac{1}{kl} \Bigg ( \frac{\sum _{i \in M^{(k)}_0, j \in M^{(l)}_0}{} { cov}( Y_i, Y_j | \mathbf Z ) }{|M^{(k)}_0||M^{(l)}_0|} + \frac{\sum _{i \in M^{(0)}_0, j \in M^{(0)}_0}{} { cov}( Y_i, Y_j | \mathbf Z ) }{|M^{(0)}_0|^2} \\&\qquad \qquad - \frac{\sum _{i \in M^{(k)}_0, j \in M^{(0)}_0}{} { cov}( Y_i, Y_j | \mathbf Z )}{|M^{(k)}_0||M^{(0)}_0|} - \frac{\sum _{i \in M^{(0)}_0, j \in M^{(l)}_0}{} { cov}( Y_i, Y_j | \mathbf Z ) }{|M^{(0)}_0||M^{(l)}_0|} \Bigg ). \end{aligned}$$
For \(i \in M^{(k)}_0\) and \(j \in M^{(l)}_0\), by the law of total covariance and as \(\mathbf X \) are i.i.d.,
$$\begin{aligned} { cov}( Y_i , Y_j \ \big | \ \mathbf Z )&= \mathbb {E}[ { cov}( Y_i , Y_j \ \big | \ \mathbf X , \mathbf Z ) \ \Big | \ \mathbf Z ] + cov \Big ( \mathbb {E}[ Y_i \big | \mathbf X , \mathbf Z ], \mathbb {E}[ Y_j \big | \mathbf X , \mathbf Z ] \ \Big | \ \mathbf Z \Big ) \\&= \sigma _Y^2 \mathbbm {1}_{ \{i=j \} } \ + \ cov \Big ( \alpha + k \beta _0 + h_0 \sum _{a \in \mathcal {N}_i} X_a, \alpha + l\beta _0 + h_0 \sum _{b \in \mathcal {N}_j} X_b \Big ) \\&= \sigma _Y^2 \mathbbm {1}_{ \{i=j \} } \ + h_0^2 \ cov \Big ( \sum _{a \in \mathcal {N}_i} X_a, \sum _{b \in \mathcal {N}_j} X_b \Big ) \\&= \sigma _Y^2 \mathbbm {1}_{ \{i=j \} } \ + h_0^2 \sigma _X^2 |\mathcal {N}_i \cap \mathcal {N}_j |. \end{aligned}$$
This gives
$$\begin{aligned} { cov}( \hat{\beta }^{(k)}_0, \hat{\beta }^{(l)}_0 \Big | \mathbf Z )&= \frac{1}{kl} \Bigg ( \frac{\sum _{i \in M^{(k)}_0, j \in M^{(l)}_0}{} { cov}( Y_i, Y_j | \mathbf Z ) }{|M^{(k)}_0||M^{(l)}_0|} + \frac{\sum _{i \in M^{(0)}_0, j \in M^{(0)}_0}{} { cov}( Y_i, Y_j | \mathbf Z ) }{|M^{(0)}_0|^2} \\&\qquad \qquad - \frac{\sum _{i \in M^{(k)}_0, j \in M^{(0)}_0}{} { cov}( Y_i, Y_j | \mathbf Z )}{|M^{(k)}_0||M^{(0)}_0|} - \frac{\sum _{i \in M^{(0)}_0, j \in M^{(l)}_0}{} { cov}( Y_i, Y_j | \mathbf Z ) }{|M^{(0)}_0||M^{(l)}_0|} \Bigg ) \\&= \frac{1}{kl} \Bigg ( \frac{\sum _{i \in M^{(k)}_0, j \in M^{(l)}_0} \Big ( \sigma _Y^2 \mathbbm {1}_{ \{i=j \} } \ + h_0^2 \sigma _X^2 |\mathcal {N}_i \cap \mathcal {N}_j | \Big ) }{|M^{(k)}_0||M^{(l)}_0|} \\&\qquad \qquad + \frac{\sum _{i \in M^{(0)}_0, j \in M^{(0)}_0} \Big ( \sigma _Y^2 \mathbbm {1}_{ \{i=j \} } \ + h_0^2 \sigma _X^2 |\mathcal {N}_i \cap \mathcal {N}_j | \Big ) }{|M^{(0)}_0|^2} \\&\qquad \qquad - \frac{\sum _{i \in M^{(k)}_0, j \in M^{(0)}_0} h_0^2 \sigma _X^2 |\mathcal {N}_i \cap \mathcal {N}_j |}{|M^{(k)}_0||M^{(0)}_0|} - \frac{\sum _{i \in M^{(0)}_0, j \in M^{(l)}_0} h_0^2 \sigma _X^2 |\mathcal {N}_i \cap \mathcal {N}_j | }{|M^{(0)}_0||M^{(l)}_0|} \Bigg ) \\&= \frac{1}{kl} \Bigg ( \sigma _Y^2 \frac{ \mathbbm {1}_{ \{l = k \} }}{|M^{(k)}_0|} + h_0^2 \sigma _X^2 \frac{\sum _{i \in M^{(k)}_0, j \in M^{(l)}_0} |\mathcal {N}_i \cap \mathcal {N}_j | }{|M^{(k)}_0||M^{(l)}_0|} \\&\qquad \qquad + \sigma _Y^2 \frac{ 1 }{|M^{(0)}_0|} + h_0^2 \sigma _X^2 \frac{\sum _{i \in M^{(0)}_0, j \in M^{(0)}_0} |\mathcal {N}_i \cap \mathcal {N}_j | }{|M^{(0)}_0|^2} \\&\qquad \qquad - h_0^2 \sigma _X^2 \frac{\sum _{i \in M^{(k)}_0, j \in M^{(0)}_0} |\mathcal {N}_i \cap \mathcal {N}_j | }{|M^{(k)}_0||M^{(0)}_0|} - h_0^2 \sigma _X^2 \frac{\sum _{i \in M^{(0)}_0, j \in M^{(l)}_0} |\mathcal {N}_i \cap \mathcal {N}_j | }{|M^{(0)}_0||M^{(l)}_0|} \Bigg ) \\&= \frac{1}{kl} \Bigg [ h_0^2 \sigma _X^2 \Bigg ( \underset{i \in M^{(k)}_0 ,j \in M^{(l)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| + \underset{i,j \in M^{(0)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| - \underset{i \in M^{(k)}_0, j \in M^{(0)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad - \underset{i \in M^{(0)}_0, j \in M^{(l)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| \Bigg ) + \sigma _Y^2 \Bigg ( \frac{1}{|M^{(0)}_0|} + \frac{ \mathbf 1 _{k=l} }{|M^{(k)}_0|} \Bigg ) \Bigg ]. \end{aligned}$$
Now, we can recall the bias–variance decomposition of the MSE to obtain
$$\begin{aligned} \mathbb {E} [ (\hat{\beta }_0 - \beta _0)^2 | \mathbf Z ]&= \Big ( \mathbb {E}[ \hat{\beta _0} - \beta _0 | \mathbf Z ] \Big )^2 + var[ \hat{\beta _0} | \mathbf Z ] \\&= \Big ( \mathbb {E}[ \hat{\beta _0} - \beta _0 | \mathbf Z ] \Big )^2 + \frac{1}{(\sum _{k>0} 1)^2} \sum _{k> 0}\sum _{l> 0} { cov}( \hat{\beta }^{(k)}_0, \hat{\beta }^{(l)}_0 \big | \mathbf Z ) \\&= \Bigg ( \frac{h_0}{\sum _{k>0} 1} \sum _{k>0} \frac{1}{k} \Bigg ( \underset{i \in M^{(k)}_0}{avg} |\mathcal {N}_i| - \underset{i \in M^{(0)}_0}{avg} |\mathcal {N}_i| \Bigg ) \Bigg )^2 \\&\qquad + \frac{1}{(\sum _{k>0} 1)^2} \sum _{k,l>0} \frac{1}{kl} \Bigg [ h_0^2 \sigma _X^2 \Bigg ( \underset{i \in M^{(k)}_0 ,j \in M^{(l)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| + \underset{i,j \in M^{(0)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| \\&\qquad \qquad - \underset{i \in M^{(k)}_0, j \in M^{(0)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| - \underset{i \in M^{(0)}_0, j \in M^{(l)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| \Bigg ) + \sigma _Y^2 \Bigg ( \frac{1}{|M^{(0)}_0|} + \frac{ \mathbf 1 _{k=l} }{|M^{(k)}_0|} \Bigg ) \Bigg ] \\&= \Bigg ( \frac{h_0}{\sum _{k>0} 1} \sum _{k>0} \frac{1}{k} \Bigg ( \underset{i \in M^{(k)}_0}{avg} |\mathcal {N}_i| - \underset{i \in M^{(0)}_0}{avg} |\mathcal {N}_i| \Bigg ) \Bigg )^2 \\&\qquad + \frac{1}{(\sum _{k>0} 1)^2} \sum _{k,l>0} \frac{1}{kl} \Bigg [ h_0^2 \sigma _X^2 \Bigg ( \underset{i \in M^{(k)}_0 ,j \in M^{(l)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| + \underset{i,j \in M^{(0)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| \\&\qquad \qquad \qquad \qquad \qquad \qquad - 2 \underset{i \in M^{(0)}_0, j \in M^{(k)}_0}{avg} | \mathcal {N}_i \cap \mathcal {N}_j| \Bigg ) + \sigma _Y^2 \Bigg ( \frac{1}{|M^{(0)}_0|} + \frac{ \mathbf 1 _{k=l} }{|M^{(k)}_0|} \Bigg ) \Bigg ] \end{aligned}$$
where the final equality combines the two cross terms using the symmetry of the double sum in k and l,
as required. \(\square \)
C.3 Proof of Proposition 1 (See p. xxx)
Proposition 1
Consider a stochastic block model (SBM) of N individuals in r communities. Denote the communities of the SBM by the sets \(B_1,\ldots ,B_r\), which are of respective sizes \(A_1,\ldots ,A_r\) (where \(A_1+\cdots +A_r = N\)). Let \(\mathbf P \) be the \(r \times r\) adjacency probability matrix between the r communities. We assign treatments independently to individuals such that individuals in \(B_s\) are treated with probability \(\theta _s\) for \(s=1,\ldots ,r\). Under such setup, let \( M_0^{(0)} \) denote the set of untreated individuals which have no treated neighbors and let \( M_0^{(1)} \) denote the set of untreated individuals which have at least one treated neighbor. For ease of notation, let \(\{ s \in M_0^{(0)} \}\), \(\{ s \in M_0^{(1)} \}\) denote the event that a fixed vertex in community s is in the sets \(M_0^{(0)}\), \(M_0^{(1)}\) respectively. Then,
$$\begin{aligned} \mathbb {P}( s \in M_0^{(0)} ) = (1-\theta _s) \prod _{v=1}^{r}(1-P_{s,v}\theta _v)^{A_v - \mathbf 1 _{v=s}} \text {, and} \end{aligned}$$
(7)
$$\begin{aligned} \mathbb {P}( s \in M_0^{(1)} ) = (1-\theta _s) \bigg ( 1- \prod _{v=1}^{r}(1-P_{s,v}\theta _v)^{A_v - \mathbf 1 _{v=s}} \bigg ). \end{aligned}$$
(8)
Proof
Note that each vertex in the graph is assigned treatment independently and that under the stochastic block model the events of any pair of vertices being adjacent are independent. Therefore,
$$\mathbb {P}(s \text { has 0 treated neighbors} \ | \ s \text { is untreated} ) =\mathbb {P}(s \text { has 0 treated neighbors} ) $$
for all \(s=1,\ldots ,r\). This gives
$$\begin{aligned} \mathbb {P}( s \in M_0^{(0)} )&= \mathbb {P}(s \text { is untreated} )\mathbb {P}(s \text { has 0 treated neighbors} ) \\&= (1 - \theta _s ) \mathbb {P}(s \text { has 0 treated neighbors}) \\&= (1 - \theta _s) \mathbb {P} \Big ( \bigcap _{v=1}^r \{ s \text { has 0 treated neighbors in}\; B_v \} \Big ) \\&= (1 - \theta _s) \prod _{v=1}^r \mathbb {P}( s \text { has 0 treated neighbors in}\; B_v)\\&= (1 - \theta _s) \prod _{v=1}^r ( 1 - P_{s,v} \theta _v )^{ A_v - \mathbf 1 _{v=s} }. \end{aligned}$$
where the \(A_v-\mathbf 1 _{v=s}\) arises from noting that s can have at most \(A_s - 1\) neighbors in \(B_s\) (it cannot connect to itself). Note that sets \(M_0^{(0)}\) and \(M_0^{(1)}\) partition the set of untreated individuals. Therefore,
$$\begin{aligned} \mathbb {P}( s \in M_0^{(1)} )&= \mathbb {P}( Z_s = 0 ) - \mathbb {P}( s \in M_0^{(0)} ) \\&= (1-\theta _s) - (1 - \theta _s) \prod _{v=1}^r ( 1 - P_{s,v} \theta _v )^{ A_v - \mathbf 1 _{v=s} } \\&= (1-\theta _s) \Big (1 - \prod _{v=1}^r ( 1 - P_{s,v} \theta _v )^{ A_v - \mathbf 1 _{v=s} } \Big ). \end{aligned}$$
\(\square \)
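Equations (7) and (8) are cheap to evaluate directly. The sketch below (community sizes, adjacency probabilities, and treatment probabilities are illustrative values, not from the text) computes both probabilities for a two-community SBM; a quick consistency check is that they must sum to \(\mathbb {P}(Z_s = 0) = 1-\theta _s\), since \(M_0^{(0)}\) and \(M_0^{(1)}\) partition the untreated individuals:

```python
import numpy as np

# Hypothetical 2-community SBM (all values illustrative)
A_sizes = np.array([50, 80])          # community sizes A_1, A_2
P = np.array([[0.10, 0.02],           # adjacency probability matrix
              [0.02, 0.08]])
theta = np.array([0.3, 0.5])          # per-community treatment probabilities

def prob_M0_0(s):
    """P(s in M0^(0)): untreated with no treated neighbors, Eq. (7)."""
    exps = A_sizes - (np.arange(len(A_sizes)) == s)   # A_v - 1_{v=s}
    return (1 - theta[s]) * np.prod((1 - P[s] * theta) ** exps)

def prob_M0_1(s):
    """P(s in M0^(1)): untreated with >=1 treated neighbor, Eq. (8)."""
    exps = A_sizes - (np.arange(len(A_sizes)) == s)
    return (1 - theta[s]) * (1 - np.prod((1 - P[s] * theta) ** exps))
```

For example, `prob_M0_0(0) + prob_M0_1(0)` recovers `1 - theta[0]` exactly.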
1.4 C.4 Proof of Proposition 2 (See p. xxx)
Proposition 2
Consider a stochastic block model (SBM) of N individuals in r communities. Denote the communities of the SBM by the sets \(B_1,\ldots , B_r\), which are of respective sizes \(A_1,\ldots , A_r\) (where \(A_1+\cdots +A_r = N\)). Let \(\mathbf P \) be the \(r \times r\) adjacency probability matrix between the r communities. We assign treatments independently to individuals such that individuals in \(B_s\) are treated with probability \(\theta _s\) for \(s=1,\ldots ,r\). Under such setup, let \( M_0^{(k)} \) denote the set of untreated individuals which have k treated neighbors. For ease of notation, let \(\{ s \in M_0^{(k)} \}\) denote the event that a fixed vertex in community s is in the set \(M_0^{(k)}\). Then,
$$\begin{aligned} \mathbb {P}( s \in M_0^{(k)} ) = (1 - \theta _s) \sum _{ \begin{array}{c} t_1,\ldots ,t_r: \\ \forall v =1,\ldots ,r \ 0 \le t_v \le A_v -\mathbf 1 _{ \{v=s \} }, \\ t_1+\cdots +t_r = k \end{array}} \Bigg ( \prod _{v=1}^r \mathbf{Bin }( t_v; A_v -\mathbf 1 _{ \{v=s \} }, \theta _v P_{s,v} ) \Bigg ) \end{aligned}$$
(20)
where \(\mathbf{Bin }( t_v; A_v -\mathbf 1 _{ \{v=s \} }, \theta _v P_{s,v} ) = \left( {\begin{array}{c}A_v-\mathbf 1 _{ \{v=s \} }\\ t_v\end{array}}\right) \big ( \theta _v P_{s,v} \big )^{ t_v } \big ( 1 - \theta _v P_{s,v} \big )^{ A_v -\mathbf 1 _{ \{v=s \} } - t_v }\).
Proof
Note that each vertex in the graph is assigned treatment independently and that, under the stochastic block model, the events of any pair of vertices being adjacent are independent. Therefore,
$$\mathbb {P}(s \text { has k treated neighbors} \ | \ s \text { is untreated} ) =\mathbb {P}(s \text { has k treated neighbors} ) $$
for all k and \(s=1,\ldots ,r\). This gives
$$\begin{aligned} \mathbb {P}( s \in M_0^{(k)} )&= \mathbb {P}(s \text { is untreated} )\mathbb {P}(s \text { has k treated neighbors} ) \\&= (1 - \theta _s ) \mathbb {P}(s \text { has k treated neighbors}) \\&= (1 - \theta _s) \sum _{ \begin{array}{c} t_1,\ldots ,t_r: \\ \forall v =1,\ldots ,r \ 0 \le t_v \le A_v -\mathbf 1 _{ \{v=s \} }, \\ t_1+\cdots +t_r = k \end{array}} \mathbb {P} \Big ( \bigcap _{v=1}^r \{ s \text { has}\; t_v\;{\text {treated neighbors in}}\; B_v \} \Big ) \\&= (1 - \theta _s) \sum _{ \begin{array}{c} t_1,\ldots ,t_r: \\ \forall v =1,\ldots ,r \ 0 \le t_v \le A_v -\mathbf 1 _{ \{v=s \} }, \\ t_1+\cdots +t_r = k \end{array}} \Big ( \prod _{v=1}^r \mathbb {P}( s \text { has}\; t_v\;\text {treated neighbors in}\; B_v) \Big ). \end{aligned}$$
We now wish to evaluate \(\mathbb {P}( s \;\text {has}\; t_v\;\text {treated neighbors in}\; B_v)\). Let \(n_v\) be the number of neighbors s (denoting a fixed individual in community \(B_s\)) has in \(B_v\). Under a stochastic block model setup,
$$ n_v \sim Bin(A_v-\mathbf 1 _{v=s}, P_{s,v}) $$
$$ t_v | n_v \ \sim Bin(n_v, \theta _v) $$
where the \(A_v-\mathbf 1 _{v=s}\) arises from noting that s can have at most \(A_s - 1\) neighbors in \(B_s\) (it cannot connect to itself). We want the unconditional distribution of \(t_v\). Recall that the probability generating function of \(X \sim Bin(N,p)\) is \(\mathbb {E}( z^X ) = ( (1-p) + pz )^N\). Therefore,
$$\begin{aligned} \mathbb {E} [ z^{t_v} ] = \mathbb {E}[ \mathbb {E}[ z^{t_v} | n_v]] = \mathbb {E} \Big [ \Big ( (1-\theta _v) + \theta _v z \Big )^{n_v} \Big ]&= \bigg ( (1-P_{s,v}) + P_{s,v}\Big ( (1-\theta _v) + \theta _v z \Big ) \bigg )^{A_v-\mathbf 1 _{v=s}} \\&= \bigg ( (1- \theta _v P_{s,v}) + \theta _v P_{s,v} z \bigg )^{A_v-\mathbf 1 _{v=s}}, \end{aligned}$$
giving \(t_v \sim Bin(A_v-\mathbf 1 _{v=s}, \theta _v P_{s,v})\). This gives
$$ \mathbb {P}( s \text { has}\; t_v \;\text {treated neighbors in}\; B_v) = Bin ( t_v; A_v -\mathbf 1 _{ \{v=s \} }, \theta _v P_{s,v} ) $$
from which (20) directly follows. \(\square \)
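Equation (20) can be evaluated by convolving the r independent binomial neighbor-count distributions rather than enumerating compositions of k directly. The sketch below does this for a two-community SBM (all numerical values are illustrative, not from the text); sanity checks are that the probabilities sum over k to \(1-\theta _s\), and that the \(k=0\) term reduces to the closed form of Proposition 1:

```python
import math
import numpy as np

A_sizes = [50, 80]                       # community sizes (illustrative)
P = [[0.10, 0.02], [0.02, 0.08]]         # adjacency probabilities (illustrative)
theta = [0.3, 0.5]                       # treatment probabilities (illustrative)

def binom_pmf(t, n, p):
    """Bin(t; n, p) as defined after Eq. (20)."""
    return math.comb(n, t) * p**t * (1 - p)**(n - t)

def prob_M0_k(s, k):
    """P(s in M0^(k)), Eq. (20): the sum over t_1+...+t_r = k is the
    convolution of the independent Bin(A_v - 1_{v=s}, theta_v * P_{s,v})
    treated-neighbor counts across communities."""
    pmf = np.array([1.0])                # pmf of the running sum of t_v's
    for v in range(len(A_sizes)):
        n = A_sizes[v] - (v == s)        # A_v - 1_{v=s}
        q = theta[v] * P[s][v]
        bv = np.array([binom_pmf(t, n, q) for t in range(n + 1)])
        pmf = np.convolve(pmf, bv)
    return (1 - theta[s]) * (pmf[k] if k < len(pmf) else 0.0)
```

The convolution view is just a restatement of the proof: the total treated-neighbor count is a sum of independent thinned binomials, one per community.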