1 Introduction

The missing data (or incomplete data) problem, characterized by the absence of values for one or more variables in a dataset, is a major impediment to both theoretical and empirical research and leaves no branch of experimental science untouched. The vast literature on missing data problems in fields as diverse as computer science, geology, archeology, biology, statistics and epidemiology attests to both its extent and its pervasiveness [8, 12, 15, 32]. Simply ignoring the problem by deleting all tuples with missing values will, in most cases, significantly distort the outcome of a study, regardless of the size of the dataset [1, 6].

Existing methods of dealing with missing data, such as the Expectation Maximization (EM) algorithm and Multiple Imputation, are based on the theoretical work of Rubin [27] and Little and Rubin [28], who formulated conditions under which the damage of missingness would be minimized. However, theoretical guarantees are provided only for the subset of problems falling into the Missing At Random (MAR) category, thereby leaving the vast space of Missing Not At Random (MNAR) problems relatively unexplored.

In this paper we view missingness from a causal perspective and take the following steps to answer questions pertaining to consistent estimation of queries of interest. Given an incomplete dataset, our first step is to postulate a model based on causal assumptions about the underlying data generation process. Our second step is to determine whether the data reject the postulated model, by testing the identifiable testable implications of that model. Our third and final step, which is also the primary focus of this paper, is to determine from the postulated model whether any method exists that produces consistent estimates of the queries of interest. A negative answer confirms the presence of a theoretical impediment to estimation; in other words, a bias is inevitable.

2 Missingness Graphs

Missingness graphs, as discussed below, were first defined in [17], and we adopt the same notation. Let \(G (\mathbb {V},E) \) be the causal DAG where \(\mathbb {V}=V\cup U \cup V^* \cup \mathbb {R}\). V is the set of observable nodes; nodes in the graph correspond to variables in the data set. U is the set of unobserved nodes (also called latent variables). E is the set of edges in the DAG. We use bi-directed edges as a shorthand notation to denote the existence of a U variable that is a common parent of two variables in \(V \cup \mathbb {R}\). V is partitioned into \({V}_{o}\) and \({V_m}\) such that \(V_o \subseteq V\) is the set of variables that are observed in all records in the population and \(V_m \subseteq V\) is the set of variables that are missing in at least one record. Variable X is termed fully observed if \(X \in V_o\), partially observed if \(X \in V_m\), and substantive if \(X \in V_o \cup V_m\). Associated with every partially observed variable \(V_i \in V_m\) are two other variables \(R_{v_i}\) and \(V_i^*\), where \(V_i^*\) is a proxy variable that is actually observed, and \(R_{v_i}\) represents the status of the causal mechanism responsible for the missingness of \(V_i^*\); formally,

$$\begin{aligned} v_i^* = f(r_{v_i}, v_i) = \begin{cases} v_i & \quad \text{if } r_{v_i}=0\\ m & \quad \text{if } r_{v_i}=1 \end{cases} \end{aligned}$$
(1)

\(V^*\) is the set of all proxy variables and \(\mathbb {R}\) is the set of all causal mechanisms that are responsible for missingness. R variables may not be parents of variables in \(V \cup U\). We call this graphical representation a missingness graph (or m-graph). An example of an m-graph is given in Fig. 1. We use the following shorthand. For any variable X, let \(X'\) be shorthand for \(X=0\). For any set \(W\subseteq V_m\cup V_o\cup R\), let \(W_r\), \(W_o\) and \(W_m\) be shorthand for \(W \cap R\), \(W \cap V_o\) and \(W \cap V_m\), respectively. Let \(R_w\) be shorthand for \(R_{V_m \cap W}\), i.e., \(R_w\) is the set containing the missingness mechanisms of all partially observed variables in W. Note that \(R_w\) and \(W_r\) are not the same. \(G_{\underline{X}}\) and \(G_{\overline{X}}\) represent the graphs formed by removing from G all edges leaving and entering X, respectively.

A manifest distribution \(P(V_o,V^*,R)\) is the distribution that governs the available dataset. An underlying distribution \(P(V_o,V_m,R)\) is said to be compatible with a given manifest distribution \(P(V_o,V^*,R)\) if the latter can be obtained from the former using Eq. 1. Formally, a manifest distribution \(P_m\) is compatible with a given underlying distribution \(P_u\) if for every \(X \subseteq V_m\) and \(Y= V_m {\setminus } X\) the following equality holds:

$$\begin{aligned} P_m(R'_x, R_y, X^*,Y^*,V_o)&= P_u(R'_x, R_y, X,V_o) \end{aligned}$$

where \(R'_x\) denotes \(R_x=0\) and \(R_y\) denotes \(R_y=1\).
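
To make Eq. 1 concrete, the following is a minimal sketch of how a manifest dataset arises from an underlying one. All variable names and parameters are illustrative, and the out-of-range value m is encoded as NaN; only the manifest columns would be available to the analyst.

```python
import numpy as np
import pandas as pd

# Sketch of Eq. 1: the proxy V_i* equals the underlying V_i when R_{v_i} = 0
# and takes the out-of-range value m (here NaN) when R_{v_i} = 1.
rng = np.random.default_rng(0)
n = 8
v = rng.integers(0, 2, n)      # underlying values of a partially observed V_i
r_v = rng.integers(0, 2, n)    # missingness mechanism: R_{v_i} = 1 masks V_i
v_star = np.where(r_v == 0, v.astype(float), np.nan)  # the observed proxy V_i*

underlying = pd.DataFrame({"V": v, "R_v": r_v})          # governs P(V_o, V_m, R)
manifest = pd.DataFrame({"V_star": v_star, "R_v": r_v})  # governs P(V_o, V*, R)
print(manifest)
```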

3 Recoverability

Given a manifest distribution \(P(V^*,V_o, R)\) and an m-graph G that depicts the missingness process, a query Q is recoverable if we can compute a consistent estimate of Q as if no data were missing. Formally,

Definition 1

(Recoverability). Given an m-graph G and a target relation Q defined on the variables in V, Q is said to be recoverable in G if there exists an algorithm that produces a consistent estimate of Q for every dataset D such that P(D) is (1) compatible with G and (2) strictly positive, i.e., \(P(V_o,V^*,\mathbb {R})>0\).

For an introduction to the notion of recoverability, see [17, 20].

Fig. 1. An m-graph depicting MAR-category missingness

3.1 Recovering from MCAR and MAR Data

Examine the m-graph in Fig. 1, in which X is the treatment and Y is the outcome. Let us assume that some patients who underwent treatment are unlikely to report the outcome; hence the arrow \(X \rightarrow R_y\). Under these circumstances, can we recover P(X,Y)?

From the manifest distribution, we can compute \(P(X,Y^*,R_y)\). From the m-graph G, we see that \(Y^*\) is a collider and X is a fork. Hence, by d-separation, \(Y\bot \!\!\!\bot R_y|X\). Thus

$$\begin{aligned} P(X,Y)&=P(Y|X)\, P(X) \\&= P(Y|X,R_y=0)\, P(X) \quad (\text{using } Y \bot \!\!\!\bot R_y|X) \\&= P(Y^*|X,R_y=0)\, P(X) \quad (\text{using Eq. 1}) \end{aligned}$$

Since both factors in the estimand are estimable from the manifest distribution, P(X,Y) is recoverable.

The scenario discussed above is a typical instance of the Missing At Random (MAR) category. When data are MAR, we have \(\mathbb {R}\bot \!\!\!\bot V_m | V_o\). Therefore \(P(V)=P(V_m|V_o)P(V_o)=P(V_m|V_o, R=0)P(V_o)\); in other words, the joint distribution P(V) is recoverable given MAR data. Estimation methods applicable to MAR data are applicable to MCAR data as well because, by the weak union axiom of graphoids, Missing Completely At Random (MCAR: \((V_m, V_o) \bot \!\!\!\bot R\)) implies Missing At Random (MAR: \(V_m \bot \!\!\!\bot R|V_o\)). It therefore follows that queries (such as the joint distribution and identifiable causal effects) that are recoverable given MAR datasets are recoverable given MCAR datasets as well.
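
The derivation above translates directly into an estimation procedure. The following sketch simulates data compatible with Fig. 1 (the structural parameters are our own illustrative choices) and contrasts the recovered estimate of P(X,Y) with the biased complete-case estimate:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200_000
x = rng.binomial(1, 0.4, n)                        # treatment
y = rng.binomial(1, np.where(x == 1, 0.7, 0.3))    # outcome, caused by X
r_y = rng.binomial(1, np.where(x == 1, 0.5, 0.1))  # treated patients under-report
y_star = np.where(r_y == 0, y, -1)                 # proxy Y*; -1 encodes 'm'
df = pd.DataFrame({"X": x, "Y_star": y_star, "R_y": r_y})

def recovered_p_xy(df, xv, yv):
    """P(x, y) = P(Y* = yv | X = xv, R_y = 0) * P(X = xv), per Sec. 3.1."""
    sub = df[(df.X == xv) & (df.R_y == 0)]
    return (sub.Y_star == yv).mean() * (df.X == xv).mean()

cc = df[df.R_y == 0]                               # complete cases only
for xv in (0, 1):
    for yv in (0, 1):
        naive = ((cc.X == xv) & (cc.Y_star == yv)).mean()  # listwise deletion
        truth = ((x == xv) & (y == yv)).mean()
        print(xv, yv, round(recovered_p_xy(df, xv, yv), 3),
              round(naive, 3), round(truth, 3))
```

The recovered column tracks the ground truth, while listwise deletion does not.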

4 Recoverability Procedures for MNAR Data

Data that are neither MAR nor MCAR fall into the Missing Not At Random (MNAR) category. In this section we detail, with examples, three distinct recovery procedures.

4.1 Sequential Factorization

Consider an observational study that measured the variables X, Y, W and Z, where we wish to estimate the effect of treatment (X) on outcome (Y). The interactions between the variables and the underlying missingness process are depicted in Fig. 2. We notice that all variables are corrupted by missing values. The least bothersome missingness is that of Y, which is caused by a random process such as accidental deletion of cases; the most troubling missingness is that of W, which is caused by its own underlying value: a typical example is very rich and very poor people being reluctant to reveal their income.

Fig. 2. MNAR model in which P(Y|do(x)) is recoverable by sequential factorization

Recovering the Causal Effect of X on Y: By the backdoor criterion [19], we have two admissible sets, \(\{Z\}\) and \(\{W\}\), which yield the following estimands, respectively:

$$\begin{aligned} P(y|do(x))&= \sum _z P(y|x,z) P(z) \\&= \sum _w P(y|x,w) P(w) \end{aligned}$$

We choose the first estimand over the second because the latter contains P(W), which we know to be non-recoverable [17]. Therefore, to recover the causal effect we have to recover both P(y|x,z) and P(z).

Recovering P(z): In order to d-separate Z from \(R_z\), one needs to condition on X, and to d-separate X from \(R_x\), one needs to condition on Y. Therefore, we can write:

$$\begin{aligned} P(z)&= \sum _{x,y}P(z,x,y)\nonumber \\&=\sum _{x,y}P(z|x,y)P(x|y) P(y)\\&= \sum _{x,y}P(z|x,y,R_x=0,R_y=0,R_z=0)P(x|y,R_x=0,R_y=0) P(y|R_y=0)\nonumber \\&\text{(Using } Z \bot \!\!\!\bot (R_{z},R_x,R_y)|(X,Y)\text{, } X \bot \!\!\!\bot (R_x,R_y)|Y \text{ and } Y \bot \!\!\!\bot R_y\text{, } \text{ respectively) }\nonumber \\&= \sum _{x,y}P(z^*|x^*,y^*,R_x=0,R_y=0,R_z=0)P(x^*|y^*,R_x=0,R_y=0) P(y^*|R_y=0)\nonumber \end{aligned}$$
(2)

In the process of recovering P(z) we have in fact recovered P(x,y,z); it follows that P(y|x,z) is recoverable. Finally, the causal effect may be recovered as:

$$\begin{aligned} P(y|do(x)) = \sum _z&\frac{P(z^*|x^*,y^*,R_x=0,R_y=0,R_z=0)\,P(x^*|y^*,R_x=0,R_y=0)\, P(y^*|R_y=0)}{\sum _y P(z^*|x^*,y^*,R_x=0,R_y=0,R_z=0)\,P(x^*|y^*,R_x=0,R_y=0)\, P(y^*|R_y=0)} \\&\times \sum _{x,y}P(z^*|x^*,y^*,R_x=0,R_y=0,R_z=0)\,P(x^*|y^*,R_x=0,R_y=0)\, P(y^*|R_y=0) \end{aligned}$$
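
As a sketch of how Eq. 2 could be evaluated on data, the function below estimates P(z) from a dataframe containing only proxy columns and missingness indicators. The column conventions (X_star, Y_star, Z_star, R_x, R_y, R_z) and the assumption of discrete X and Y are ours:

```python
import itertools
import pandas as pd

def recover_p_z(df: pd.DataFrame, zv, x_vals=(0, 1), y_vals=(0, 1)) -> float:
    """Estimate P(Z = zv) via Eq. 2 using only proxies and R indicators."""
    p = 0.0
    for xv, yv in itertools.product(x_vals, y_vals):
        d_y = df[df.R_y == 0]
        p_y = (d_y.Y_star == yv).mean()               # P(y* | R_y=0)
        d_xy = d_y[(d_y.R_x == 0) & (d_y.Y_star == yv)]
        p_x_y = (d_xy.X_star == xv).mean()            # P(x* | y*, R_x=0, R_y=0)
        d_zxy = d_xy[(d_xy.R_z == 0) & (d_xy.X_star == xv)]
        p_z_xy = (d_zxy.Z_star == zv).mean()          # P(z* | x*, y*, R=0)
        p += p_z_xy * p_x_y * p_y
    return p
```

The same three factors, multiplied without the outer sum, estimate P(x,y,z); normalizing over y yields P(y|x,z), and the displayed causal-effect estimand follows by summation over z.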

Recovery Procedure: Given an m-graph with no edges between R variables, a sufficient condition for the recoverability of a query Q is that it be decomposable into sub-queries of the form P(Y|X) such that \(Y \bot \!\!\!\bot (R_x,R_y)|X\). This recovery procedure, called sequential factorization (generalized in Theorem 1 below), is sensitive to the ordering of variables in the factorization, which in turn is dictated by the graph. For instance, in Eq. 2, had we factorized P(x,y,z) as P(y|x,z) P(x|z) P(z), we would not have been permitted to insert the R terms in any factor.

Recovering in the Presence of Edges Between R Variables: A quick inspection reveals that the factorization in Eq. 2 guarantees recoverability even when an edge \(R_x \rightarrow R_z\) is added. However, adding the (reversed) edge \(R_z \rightarrow R_x\) would require conditioning on \(R_z\) and Y to d-separate X from \(R_x\). The procedure for recovering the marginal distribution P(Z) is presented below:

$$\begin{aligned} P(z)&= \sum _{x,y,r_z}P(z,x,y,r_z)\nonumber \\&=\sum _{x,y,r_z}P(z|x,y,r_z)P(x|y,r_z) P(y|r_z)P(r_z)\\&= \sum _{x,y}P(z|x,y,R_x=0,R_y=0,r_z=0)\sum _{r_z}P(x|y,R_x=0,R_y=0,r_z) P(y|R_y=0,r_z)P(r_z)\nonumber \\&\text{(Using } Z \bot \!\!\!\bot (R_{z},R_x,R_y)|(X,Y)\text{, } X \bot \!\!\!\bot (R_x,R_y)|(Y,R_z) \text{ and } Y \bot \!\!\!\bot R_y|R_z\text{, } \text{ respectively) }\nonumber \end{aligned}$$
(3)

The following definition and theorem from [18] formalize the preceding recovery procedure.

Definition 2

(General Ordered Factorization). Given a graph G and a set O of ordered \(V \cup R\) variables \(Y_1<Y_2 < \ldots < Y_k\), a general ordered factorization relative to G, denoted by f(O), is a product of conditional probabilities \(f(O)= \prod _i P(Y_i|X_i)\) where \(X_i \subseteq \{Y_{i+1}, \ldots , Y_k\}\) is a minimal set such that \(Y_i\bot \!\!\!\bot (\{Y_{i+1}, \ldots , Y_k\}{\setminus } X_i)|X_i\) holds in G.

Theorem 1

(Sequential Factorization). A sufficient condition for recoverability of a relation Q defined over substantive variables is that Q be decomposable into a general ordered factorization, or a sum of such factorizations, such that every factor \(Q_i=P(Y_i | X_i)\) satisfies, (1) \(Y_i \bot \!\!\!\bot (R_{y_i}, R_{x_i}) | X_i{\setminus } \{R_{y_i}, R_{x_i}\}\), if \(Y_i \in (V_o \cup V_m)\) and (2) \(R_z \bot \!\!\!\bot R_{X_i}|X_i \) if \(Y_i=R_z\) for any \(Z \in V_m\), \(Z \notin X_i\) and \(X_r \cap R_{X_m}=\emptyset \).

Fig. 3. MNAR models in which P(X,Y) and P(X,Y,Z) are recoverable

4.2 R-Factorization

Consider the model in Fig. 3(a), in which missingness in X is caused by Y and vice versa. This type of missingness model is called entangled because, in order to d-separate either variable from its missingness mechanism, one needs to condition on the other. Factorizing P(x,y) as P(x|y)P(y) or P(y|x)P(x) does not satisfy the sequential factorization criterion, since neither \(X \bot \!\!\!\bot (R_x,R_y)|Y\) nor \(Y \bot \!\!\!\bot (R_x,R_y)|X\) holds in the graph. This deadlock can, however, be disentangled by the following method:

$$\begin{aligned} P(X,Y)&= P(X,Y) \frac{P(R_x=0,R_y=0|X,Y)}{P(R_x=0,R_y=0|X,Y)} \\&= \frac{P(R_x=0,R_y=0) P(X,Y|R_x=0,R_y=0)}{P(R_x=0,R_y=0|X,Y)}\\&= \frac{P(R_x=0,R_y=0) P(X,Y|R_x=0,R_y=0)}{P(R_x=0|Y,R_y=0)P(R_y=0|X,R_x=0)}\\&\text{(using } R_x \bot \!\!\!\bot (R_y,X) |Y \text{ and } R_y \bot \!\!\!\bot (R_x,Y)|X\text{) }\\&=\frac{P(R_x=0,R_y=0) P(X^*,Y^*|R_x=0,R_y=0)}{P(R_x=0|Y^*,R_y=0)P(R_y=0|X^*,R_x=0)}\\ \end{aligned}$$
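
A direct translation of this derivation into an estimator might look as follows; the column conventions are the same as in the earlier sketches, and every quantity used is computable from the manifest distribution:

```python
import pandas as pd

def r_factorization_p_xy(df: pd.DataFrame, xv, yv) -> float:
    """Sketch of the entangled-model recovery of P(X = xv, Y = yv), Fig. 3(a)."""
    complete = df[(df.R_x == 0) & (df.R_y == 0)]
    p_r = len(complete) / len(df)                     # P(R_x=0, R_y=0)
    p_xy_c = ((complete.X_star == xv) & (complete.Y_star == yv)).mean()
    d_y = df[(df.R_y == 0) & (df.Y_star == yv)]
    p_rx_y = (d_y.R_x == 0).mean()                    # P(R_x=0 | y*, R_y=0)
    d_x = df[(df.R_x == 0) & (df.X_star == xv)]
    p_ry_x = (d_x.R_y == 0).mean()                    # P(R_y=0 | x*, R_x=0)
    return p_r * p_xy_c / (p_rx_y * p_ry_x)
```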

The following theorem generalizes this recovery procedure:

Theorem 2

(R-factorization). Given an m-graph G with no edges between the R variables and no latent variables as parents of R variables, a necessary and sufficient condition for recovering the joint distribution P(V) is that no variable X be a parent of its own missingness mechanism \(R_{X}\). Moreover, when recoverable, P(V) is given by

$$\begin{aligned} P(v)=\frac{P(R=0,v)}{\prod _i P(R_i=0|pa^o_{r_i},pa^m_{r_i}, R_{Pa^m_{r_i}}=0)}, \end{aligned}$$
(4)

where \(Pa^o_{r_i}\subseteq V_o\) and \(Pa^m_{r_i}\subseteq V_m\) are the parents of \(R_i\).

Interestingly, given a model in which R variables are connected by an edge, we sometimes have to use a combination of sequential factorization and R-factorization. Examine the model in Fig. 3(b). The query of interest is the joint distribution P(x,y,z), and the recovery procedure, inspired by Theorem 2, follows:

$$\begin{aligned} P(x,y,z)&= \frac{P(x,y,z,r_x=0,r_y=0,r_z=0)}{P(r_x=0|y)P(r_z=0|x,r_x=0)P(r_y=0|z,r_x=0,r_z=0)} \end{aligned}$$

In order to recover \(P(r_x=0|y)\) we rely on sequential factorization as shown below:

$$\begin{aligned} P(y,r_x)&= \sum _{x,z} P(x,y,z,r_x)\\&=\sum _{x,z}\frac{P(x,y,z,r_x,r_z=0,r_y=0)}{P(r_z=0|x,r_x=0)P(r_y=0|z,r_x,r_z=0)}\\&=\sum _{x,z}\frac{P(x|y,z,r_x=0,r_z=0,r_y=0)P(y,z,r_x,r_z=0,r_y=0)}{P(r_z=0|x,r_x=0)P(r_y=0|z,r_x,r_z=0)}\\&\text{(using } X \bot \!\!\!\bot R_x|(Y,Z,R_y,R_z)\text{, i.e., sequential factorization)} \end{aligned}$$

Recoverability of \(P(y,r_x)\) implies that \(P(r_x=0|y)\) is recoverable. Hence the joint distribution P(x,y,z) is recoverable given Fig. 3(b).

Fig. 4. (a) MNAR model in which the joint distribution is recoverable; (b) mutilated model corresponding to (a), obtained by intervening on Z

4.3 Interventional Factorization

Consider the model in Fig. 4. Let the query of interest be P(w,x,y,z). We first factorize P(w,x,y,z) in a manner similar to R-factorization:

$$\begin{aligned} P(w,x,y,z)&= \frac{P(w,x,y,z,r_x=0,r_y=0)}{P(r_x=0|r_y=0,y,z)P(r_y=0|x,z)} \end{aligned}$$

The recovery of the joint distribution depends on the recovery of \(P(r_y=0|x,z)\). We notice that

$$\begin{aligned} P(R_y|do(Z=z),X)&= P(R_y |Z=z, X) \quad \text{(using rule 2 of do-calculus)} \end{aligned}$$

The interventional distribution can be computed as given below:

$$\begin{aligned} P(x^*,y^*,w,r_x,r_y|do(z))&= \frac{P(x^*,y^*,w,r_x,r_y,z)}{P(z|w)} \nonumber \\ P(r_y,x^*,r_x |do(z))&= \sum _{w,y^*}P(x^*,y^*,w,r_x,r_y|do(z)) \end{aligned}$$
(5)

In order to recover \(P(r_y=0|x,z)\), we will recover \(P(x,r_y|do(z))\) and express it in terms of proxy variables.

$$\begin{aligned} P(x,r_y|do(z))&= P(x|r_y,r_x=0, do(z)) P(r_y|do(z))\nonumber \\&= P(x^*|r_y,r_x=0, do(z)) P(r_y|do(z)) \end{aligned}$$
(6)

Each factor in Eq. 6 can be computed from the interventional distribution derived in Eq. 5.
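
The computation in Eq. 5 is a truncated factorization and can be sketched with inverse-probability weights, as below. We assume discrete variables, that W is the sole parent of Z (as the division by P(z|w) suggests), and that W and Z are fully observed; the column names are our own convention:

```python
import pandas as pd

def interventional_weights(df: pd.DataFrame, zv) -> pd.Series:
    """Per-record weights whose sum over any configuration of
    (X*, Y*, W, R_x, R_y) estimates its probability under do(Z = zv)."""
    p_z_given_w = df.groupby("W")["Z"].transform(lambda s: (s == zv).mean())
    w = (df["Z"] == zv) / p_z_given_w   # truncated factorization: drop P(z | w)
    return w / len(df)

# e.g., the factor P(R_y = 0, X* = 1, R_x = 0 | do(Z = zv)) needed for Eq. 6:
# wts = interventional_weights(df, zv)
# p = wts[(df.R_y == 0) & (df.X_star == 1) & (df.R_x == 0)].sum()
```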

A general algorithm incorporating all three recovery procedures in a slightly more relaxed setting is discussed in [26].

Fig. 5. MNAR models in which the joint distribution is not recoverable. Variables denoted by L serve as candidates for auxiliary variables.

5 Recourses to Non-recoverability

The joint distribution is not recoverable given the m-graphs in Fig. 5 [18]. In this section we show how auxiliary variables and external data can be utilized to aid recoverability.

Auxiliary variables are variables that are ancillary to the substantive research questions but are potential correlates of the missingness mechanisms or of the partially observed variables [6]. However, as noted in [29], not all variables satisfying this criterion may be used as auxiliary variables.

Selection Criteria for Auxiliary Variables: First, an auxiliary variable should not be a collider, or a descendant of a collider, on the path from a partially observed variable to its missingness mechanism. For example, in Fig. 5(b), neither Y nor its descendants may serve as auxiliary variables when recovering P(X). Second, in the presence of an inducing path between X and \(R_x\), as shown in Fig. 5(c), the ideal auxiliary variables are the latent variables \(L_1\) and \(L_2\); conditioning on either of these will d-separate X from \(R_x\) and facilitate the recovery of P(X).

Recovery Aided by External Data: It is often the case that incorporating data from external sources can aid recovery. For example, consider a manifest distribution in which age is a partially observed variable. The distribution of age for a given population may be readily available from an external agency such as the census bureau. The question we ask is how such data can be combined with the existing missing dataset to recover a query of interest.

Consider Fig. 5(a), and suppose the query of interest is P(X,Y). P(Y|X) is recoverable by sequential factorization. If we obtain P(X) from an external source, then P(x,y) may be recovered as \(P(y|x^*,r_x=0) P(x)\). In Fig. 5(b), however, both P(Y) and P(X) are recoverable; if we can obtain either P(y|x) or P(x|y) from an external source, then P(x,y) can be recovered.
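
A sketch of the Fig. 5(a) recovery, assuming (as the estimand above suggests) that Y is fully observed; `p_x_external` is a hypothetical externally supplied table, e.g. a census age distribution:

```python
import pandas as pd

def recover_p_xy_external(df: pd.DataFrame, p_x_external: dict, xv, yv) -> float:
    """Combine the recoverable P(y | x, R_x=0) with an external marginal P(X)."""
    d = df[(df.R_x == 0) & (df.X_star == xv)]
    p_y_given_x = (d.Y == yv).mean()    # P(y | x*, R_x = 0)
    return p_y_given_x * p_x_external[xv]
```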

6 Perils of Model Blind Recovery Procedures

Model-blind algorithms attempt to handle missing data problems on the basis of the data alone, without making any assumptions about the structure of the missingness process. We unveil a fundamental limitation of model-blind algorithms by presenting two statistically indistinguishable models such that a given query is recoverable in one and non-recoverable in the other.

Fig. 6. Statistically indistinguishable graphs: (a) P(X,Y) is recoverable, (b) P(X,Y) is not recoverable, (c) P(X) is recoverable

The two graphs in Fig. 6(a) and (b) cannot be distinguished by any statistical means, since Fig. 6(a) has no testable implications [16] and Fig. 6(b) is a complete graph. However, in Fig. 6(a), \(P(X,Y)=P(X^*|Y,R_x=0)P(Y)\) is recoverable, while in Fig. 6(b) P(X,Y) is not recoverable (by Theorem 2 in [17]).

An even stronger limitation is demonstrated below. We show that no model-blind algorithm exists even in those cases where recoverability is feasible. We exemplify this claim by constructing two statistically indistinguishable models, \(G_1\) and \(G_2\), dictating different estimation procedures \(E_1\) and \(E_2\), respectively; yet Q is not recoverable in \(G_1\) by \(E_2\) or in \(G_2\) by \(E_1\).

Consider the graphs in Fig. 6(a) and (c); they are statistically indistinguishable since neither has testable implications. Let the target relation of interest be \(Q=P (X) \). In Fig. 6(a), Q may be estimated as \(P (X) = \sum _y P (X|Y,R_x=0) P (Y) \) since \(X \bot \!\!\!\bot R_x|Y\), and in Fig. 6(c), Q may be derived as \(P (X) = P (X|R_x=0) \) since \(X \bot \!\!\!\bot R_x\).
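
The following sketch makes the danger concrete: data are generated from the model of Fig. 6(a) (with parameters of our own choosing), and the estimator that is correct for Fig. 6(c) is visibly biased, even though no statistical test can tell the two graphs apart:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 500_000
y = rng.binomial(1, 0.5, n)
x = rng.binomial(1, np.where(y == 1, 0.8, 0.2))    # Y -> X, per Fig. 6(a)
r_x = rng.binomial(1, np.where(y == 1, 0.6, 0.1))  # Y -> R_x, per Fig. 6(a)
df = pd.DataFrame({"Y": y, "X_star": np.where(r_x == 0, x, -1), "R_x": r_x})

# E_1, correct for Fig. 6(a): P(X=1) = sum_y P(X=1 | y, R_x=0) P(y)
e1 = sum((df[(df.Y == yv) & (df.R_x == 0)].X_star == 1).mean() * (df.Y == yv).mean()
         for yv in (0, 1))
# E_2, correct only for Fig. 6(c): P(X=1) = P(X=1 | R_x=0)
e2 = (df[df.R_x == 0].X_star == 1).mean()
print(e1, e2, (x == 1).mean())   # e1 tracks the truth; e2 is biased here
```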

7 Related Work

Deletion-based methods such as listwise deletion, which are easy to understand as well as to implement, guarantee consistent estimates only for certain categories of missingness, such as MCAR [24]. The Maximum Likelihood (ML) method is known to yield consistent estimates under the MAR assumption; the expectation maximization algorithm and gradient-based algorithms are widely used for searching for ML estimates under incomplete data [4, 5, 10, 11]. Most work in machine learning assumes MAR and proceeds with ML or Bayesian inference. However, there are exceptions, such as recent work on collaborative filtering and recommender systems, which develops probabilistic models that explicitly incorporate the missing data mechanism [13–15].

Other methods for handling missing data can be classified into two categories: (a) inverse probability weighted methods and (b) imputation-based methods [23]. Inverse probability weighting methods analyze and assign weights to complete records based on estimated probabilities of completeness [22, 32]. Imputation-based methods substitute a reasonable guess for each missing value [1]; Multiple Imputation [12] is an imputation method that is less sensitive to a bad starting point.

Missing data is a special case of coarsened data, and data are said to be coarsened at random (CAR) if the coarsening mechanism is only a function of the observed data [9]. [21] introduced a methodology for parameter estimation from data structures in which the full data has a non-zero probability of being fully observed; this methodology was later extended to deal with censored data in which complete data on subjects are never observed [31].

The use of graphical models for handling missing data is a relatively new development. [3] used graphical models for analyzing missing information in the form of missing cases (due to sample selection bias). Attrition, a common occurrence in longitudinal studies, arises when subjects drop out of the study; [7, 25, 30] analyzed the problem of attrition using causal graphs. [27, 28] cautioned the practitioner that, contrary to popular belief (as stated in [2, 6]), not all auxiliary variables reduce bias. Both [7] and [28] associate missingness with a single variable, leaving interactions among several missingness mechanisms unexplored.

[17] employed a formal representation called Missingness Graphs to depict the missingness process, defined the notion of recoverability and derived conditions under which queries would be recoverable when datasets are categorized as Missing Not At Random (MNAR). Tests to detect misspecifications in the m-graph are discussed in [16].

8 Conclusions

This chapter presented the missing data problem from a causal perspective and provided procedures for estimating queries of interest from datasets falling into the MNAR (Missing Not At Random) category. We demonstrated how auxiliary variables and data from external sources can be used to circumvent theoretical impediments to recoverability. Finally, we showed that model-blind recovery techniques such as Multiple Imputation are prone to error and insufficient to guarantee consistent estimates.