Overcoming the Computing Barriers in Statistical Causal Inference

Zhang, Kai; Chen, Ding-Geng

doi:10.1007/978-3-319-41259-7_7


Part of the book series: ICSA Book Series in Statistics (ICSABSS)


Abstract

Extending statistical causal inference to the big data commonly seen in public health applications is often hindered by computational barriers. In this chapter we discuss this practical concern, using optimal pair matching as an example, and offer a novel solution: a stratification tree constructed from exact matching and propensity scores. We demonstrate the method on a large observational study of obstetric unit closures in Philadelphia from 1995 to 2003, with 59 observed covariates for each of the 132,786 birth deliveries and 5,998,111 potential controls. Algorithms and R code are also provided for interested readers.
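As a rough illustration of the barrier, here is simple arithmetic on the counts above. It assumes that a single optimal pair match over the full data would use a dense distance matrix with one entry per (delivery, potential control) pair, stored as 8-byte doubles; the storage format is an assumption, not a detail stated in the chapter.

```latex
\[
132{,}786 \times 5{,}998{,}111 = 796{,}465{,}167{,}246 \approx 7.96 \times 10^{11} \text{ entries}
\]
\[
7.96 \times 10^{11} \times 8 \text{ bytes} \approx 6.4 \times 10^{12} \text{ bytes} \approx 6.4 \text{ TB}
\]
```

By comparison, the default sizemax of 9,000,000 entries used in the appendix corresponds to about 72 MB under the same assumption.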



Acknowledgements

Zhang’s research is partially supported by NSF DMS-1309619, DMS-1613112, and IIS-1633212. Chen’s research was supported in part by NIH grants from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD, R01HD075635, PIs: Xinguang Chen and Ding-Geng Chen). This material was also partially based upon work supported by the NSF under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Zhang thanks Dylan S. Small for very helpful suggestions.

Author information

Authors and Affiliations

Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA
Kai Zhang
School of Social Work & Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
Ding-Geng Chen


Corresponding author

Correspondence to Kai Zhang.

Editor information

Editors and Affiliations

Department of Epidemiology, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, USA
Hua He
Christiana Care Health System, Value Institute, Newark, Delaware, USA
Pan Wu
School of Social Work and Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, USA
Ding-Geng (Din) Chen

Appendix: R Code for Propensity Score Stratification

The following R function, opt_pstrat, implements the propensity score stratification (PSS) algorithm of Step 3 in the main text and forms the subclasses. The function takes three arguments:

1. indicator: a binary vector taking the value 1 for treated units and 0 for control units.
2. pscore: a vector of the propensity scores of all units.
3. sizemax: a preset tolerance on the size (number of entries) of the distance matrix. The default is 9,000,000.
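The pscore argument must be estimated before opt_pstrat is called. A minimal sketch follows, assuming a logistic regression of the treatment indicator on observed covariates; this model choice is an assumption for illustration, not necessarily the chapter's model. The simulated data and the names n, x1, x2, treat, and ps_model are hypothetical.

```r
# Minimal sketch: estimate propensity scores by logistic regression.
# The data are simulated; x1, x2, treat, and ps_model are hypothetical names.
set.seed(1)
n  <- 1000
x1 <- rnorm(n)               # a continuous covariate
x2 <- rbinom(n, 1, 0.4)      # a binary covariate
# Treatment assignment depends on the covariates
treat <- rbinom(n, 1, plogis(-1.5 + 0.8 * x1 + 0.5 * x2))

# Fit P(treat = 1 | x1, x2) with a logit link
ps_model <- glm(treat ~ x1 + x2, family = binomial())
pscore   <- fitted(ps_model)  # fitted probabilities in (0, 1)

# Binary 0/1 vector in the form opt_pstrat expects
indicator <- treat
```

In a real study the covariate set and model form would come from subject-matter considerations; the pair (indicator, pscore) is what opt_pstrat consumes.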

The function opt_pstrat returns a list with the following values:

1. flag: 1 if stratification succeeds and 2 otherwise (if no valid cutoff can be found, only flag is returned).
2. num_pstrata: the number of subclasses formed.
3. cutoffs: the cutoff points at which the subclasses are split.
4. t.pstrata: a vector of the number of treated units in each subclass formed.
5. c.pstrata: a vector of the number of control units in each subclass formed.
6. prodsize.pstrata: a vector of the size of the distance matrix (number of treated units times number of control units) in each subclass formed.

opt_pstrat <- function(indicator, pscore, sizemax = 9000000) {
  # The first (upper) cutoff is the largest propensity score
  cutoffs <- max(pscore)
  t.strata <- NULL      # number of treated units per subclass
  c.strata <- NULL      # number of control units per subclass
  size.strata <- NULL   # distance-matrix size per subclass
  indicator_iter <- indicator
  pscore_iter <- pscore
  num_strata_formed <- 0

  # Form subclasses from the top of the propensity score range downward,
  # while both treated and control units remain
  while (sum(indicator_iter) > 0 && sum(1 - indicator_iter) > 0) {
    n <- length(indicator_iter)
    t_ind <- which(indicator_iter == 1)
    n_treated <- sum(indicator_iter)
    treated_pscore_iter <- pscore_iter[t_ind]

    # For each treated unit: number of treated units with score >= its score
    t_geq_t <- n_treated + 1 - rank(treated_pscore_iter, ties.method = "min")
    # ... and number of control units with score >= its score
    c_geq_t <- n + 1 - rank(pscore_iter, ties.method = "min")[t_ind] - t_geq_t

    # A cutoff allows pair matching only if there are at least as many
    # controls as treated units above it
    matchable <- c_geq_t >= t_geq_t
    if (sum(matchable) == 0) {
      print("No way to stratify: c_geq_t < t_geq_t.")
      return(list(flag = 2))
    }

    # Size of the treated-by-control distance matrix above each candidate cutoff
    size.dist <- t_geq_t * c_geq_t
    if (min(size.dist[matchable]) > sizemax) {
      print("No way to stratify: min(size.dist) > sizemax.")
      return(list(flag = 2))
    }

    # Among matchable cutoffs within sizemax, pick the one giving the largest matrix
    feasible <- matchable & size.dist <= sizemax
    cutoff.size.ind <- which(feasible & size.dist == max(size.dist[feasible]))[1]
    cutoff <- pscore_iter[t_ind[cutoff.size.ind]]
    cutoffs <- c(cutoffs, cutoff)

    # Record the composition of the new subclass (units with score >= cutoff)
    n_t_new <- sum(treated_pscore_iter >= cutoff)
    n_c_new <- sum(pscore_iter >= cutoff) - n_t_new
    t.strata <- c(t.strata, n_t_new)
    c.strata <- c(c.strata, n_c_new)
    size.strata <- c(size.strata, n_t_new * n_c_new)
    num_strata_formed <- num_strata_formed + 1
    print(num_strata_formed)

    # Drop the new subclass and continue with the units below the cutoff
    keep <- pscore_iter < cutoff
    indicator_iter <- indicator_iter[keep]
    pscore_iter <- pscore_iter[keep]
  }

  # Results are reversed so subclasses are listed from low to high scores
  if (sum(indicator_iter) == 0) {
    print("Stratification Finished: Treated Units Used Up.")
    return(list(flag = 1, num_pstrata = num_strata_formed,
                cutoffs = rev(cutoffs), t.pstrata = rev(t.strata),
                c.pstrata = rev(c.strata), prodsize.pstrata = rev(size.strata)))
  }
  if (sum(indicator_iter == 0) == 0) {
    print("Stratification Finished: Control Units Used Up. Cannot Form New Strata.")
    return(list(flag = 2, num_pstrata = num_strata_formed,
                cutoffs = rev(cutoffs), t.pstrata = rev(t.strata),
                c.pstrata = rev(c.strata), prodsize.pstrata = rev(size.strata)))
  }
}
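The counting step is the least transparent part of opt_pstrat. With ties.method = "min", rank() gives each score one plus the number of strictly smaller scores, so n + 1 - rank(...) counts the scores greater than or equal to it. The sketch below isolates the two rank-based counts in a standalone function and checks them against a brute-force loop; the names count_geq_rank and count_geq_loop are hypothetical and not part of the chapter's code.

```r
# For each treated unit, count treated (t_geq_t) and control (c_geq_t) units
# whose propensity score is >= that unit's score.

# Mirrors the rank-based formulas used inside opt_pstrat
count_geq_rank <- function(indicator, pscore) {
  n     <- length(indicator)
  t_ind <- which(indicator == 1)
  n_t   <- length(t_ind)
  t_geq_t <- n_t + 1 - rank(pscore[t_ind], ties.method = "min")
  c_geq_t <- n + 1 - rank(pscore, ties.method = "min")[t_ind] - t_geq_t
  list(t_geq_t = t_geq_t, c_geq_t = c_geq_t)
}

# Brute-force reference: O(n) comparisons per treated unit
count_geq_loop <- function(indicator, pscore) {
  t_ind <- which(indicator == 1)
  t_geq_t <- sapply(t_ind, function(i) sum(indicator == 1 & pscore >= pscore[i]))
  c_geq_t <- sapply(t_ind, function(i) sum(indicator == 0 & pscore >= pscore[i]))
  list(t_geq_t = t_geq_t, c_geq_t = c_geq_t)
}

# Toy data: three treated units (scores 0.9, 0.5, 0.1), including a tie at 0.5
ind <- c(1, 0, 1, 0, 0, 1)
ps  <- c(0.9, 0.8, 0.5, 0.5, 0.2, 0.1)
counts <- count_geq_rank(ind, ps)
counts$t_geq_t                   # 1 2 3
counts$c_geq_t                   # 0 2 3
counts$t_geq_t * counts$c_geq_t  # 0 4 9
```

The products in the last line are the candidate subclass sizes that opt_pstrat stores in size.dist and compares with sizemax; the rank version avoids the quadratic cost of the loop on large data.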

As described in the main text, the function opt_pstrat is applied when each stratum goes through Step 3. The outputs of this function provide useful information on whether to further split or match within the subclasses.
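To use the returned cutoffs, each unit has to be mapped to a subclass. A minimal sketch follows, assuming (from reading the loop in opt_pstrat) that the reversed cutoffs are ascending, that subclass j covers scores in [cutoffs[j], cutoffs[j + 1]), and that the top subclass also includes the maximum score. The helper assign_subclass is hypothetical, not part of the chapter's code.

```r
# Hypothetical helper: map propensity scores to subclass indices using the
# ascending cutoffs returned by opt_pstrat. Index 0 marks units below the
# lowest cutoff, which belong to no subclass.
assign_subclass <- function(pscore, cutoffs) {
  # findInterval requires non-decreasing breakpoints
  stopifnot(!is.unsorted(cutoffs))
  # rightmost.closed = TRUE puts the maximum score in the top subclass
  findInterval(pscore, cutoffs, rightmost.closed = TRUE)
}

# Example with made-up cutoffs: subclasses [0.2, 0.4) and [0.4, 0.9]
assign_subclass(c(0.1, 0.3, 0.5, 0.9), c(0.2, 0.4, 0.9))
# [1] 0 1 2 2
```

Given a successful result res, table(assign_subclass(pscore, res$cutoffs), indicator) should reproduce res$t.pstrata and res$c.pstrata in its rows 1 to res$num_pstrata, with row 0 holding the leftover controls.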


About this chapter

Cite this chapter

Zhang, K., Chen, DG. (2016). Overcoming the Computing Barriers in Statistical Causal Inference. In: He, H., Wu, P., Chen, DG. (eds) Statistical Causal Inferences and Their Applications in Public Health Research. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-41259-7_7


DOI: https://doi.org/10.1007/978-3-319-41259-7_7
Published: 27 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41257-3
Online ISBN: 978-3-319-41259-7
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
