Overcoming the Computing Barriers in Statistical Causal Inference

Statistical Causal Inferences and Their Applications in Public Health Research

Part of the book series: ICSA Book Series in Statistics ((ICSABSS))

Abstract

The rapid development of statistical causal inference in the era of big data, as commonly seen in public health applications, is often hindered by computational barriers. In this chapter we discuss a practical computing barrier in statistical causal inference, using optimal pair matching as an example, and offer a novel solution that constructs a stratification tree based on exact matching and propensity scores. We demonstrate the implementation of this method on a large observational study of the Philadelphia obstetric unit closures from 1995 to 2003, with 59 observed covariates for each of the 132,786 birth deliveries and 5,998,111 potential controls. Algorithms and R program code are also provided for interested readers.

Acknowledgements

Zhang’s research is partially supported by NSF DMS-1309619, DMS-1613112, and IIS-1633212. Chen’s research was supported in part by NIH grants from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD, R01HD075635, PIs: Xinguang Chen and Ding-Geng Chen). This material was also partially based upon work supported by the NSF under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Zhang thanks Dylan S. Small for very helpful suggestions.

Author information

Corresponding author

Correspondence to Kai Zhang.

Appendix: R Code for Propensity Score Stratification

The following R function opt_pstrat implements the PSS algorithm in Step 3 described above and forms the subclasses. The function takes three arguments as inputs:

  1. indicator: a binary vector taking the value 1 for treated units and 0 for control units.

  2. pscore: a vector containing the propensity score of each unit.

  3. sizemax: a preset tolerance level on the size of the distance matrix. The default value is 9,000,000.
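
As a rough illustration of how the first two arguments might be prepared, the sketch below simulates a small data set and estimates propensity scores with a logistic regression. Everything in it is an assumption made for illustration: the data are simulated, the names n, x1, x2, treat, and fit are hypothetical, and logistic regression is only one common way to estimate propensity scores, not necessarily the model used in the chapter.

```r
# Simulated data; every name and coefficient here is hypothetical.
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rbinom(n, size = 1, prob = 0.4)

# Treatment assignment depends on the covariates (assumed model).
treat <- rbinom(n, size = 1, prob = plogis(-1.5 + 0.8 * x1 + 0.5 * x2))

# Estimate propensity scores with a logistic regression (one common choice).
fit <- glm(treat ~ x1 + x2, family = binomial())

indicator <- treat                # 1 = treated, 0 = control
pscore    <- unname(fitted(fit))  # estimated P(treated | covariates)

# Size of the full treated-by-control distance matrix before any splitting;
# this is the quantity that sizemax is meant to keep in check per subclass.
full_size <- sum(indicator) * sum(1 - indicator)
```

Comparing full_size with sizemax gives a quick sense of whether any stratification is needed before matching.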

The function opt_pstrat returns a list with the following values:

  1. flag: 1 for successful stratification and 2 otherwise.

  2. num_pstrata: the number of subclasses formed.

  3. cutoffs: the cutoff points at which the subclasses are split, in increasing order of propensity score.

  4. t.pstrata: a vector listing the number of treated units in each subclass formed.

  5. c.pstrata: a vector listing the number of control units in each subclass formed.

  6. prodsize.pstrata: a vector listing the size of the distance matrix (treated units times control units) in each subclass formed.
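
To make these outputs concrete, the sketch below recomputes the per-subclass counts by hand for a toy data set and a given set of cutoffs. The data and cutoffs are hypothetical, and the interval convention is an assumption inferred from the function's code: each subclass is taken to contain the units with scores from its cutoff up to, but not including, the next cutoff, with the top subclass also including the maximum score.

```r
# Toy data and cutoffs; all values are hypothetical.
indicator <- c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0)
pscore    <- c(0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80)
cutoffs   <- c(0.15, 0.40, 0.80)  # ascending; last entry is max(pscore)

# Subclass label for each unit; 0 means the unit lies below the lowest cutoff
# and belongs to no subclass.
subclass <- findInterval(pscore, cutoffs, rightmost.closed = TRUE)
k <- length(cutoffs) - 1

# Treated and control counts per subclass, and the implied distance matrix size.
t_count  <- tabulate(subclass[subclass > 0 & indicator == 1], nbins = k)
c_count  <- tabulate(subclass[subclass > 0 & indicator == 0], nbins = k)
prodsize <- t_count * c_count

data.frame(lower = cutoffs[-length(cutoffs)], t_count, c_count, prodsize)
```

In this toy case the two lowest-scoring controls fall below the first cutoff and are left out of every subclass.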

opt_pstrat <- function(indicator, pscore, sizemax = 9000000) {
    # cutoffs starts at the largest score; each new cutoff is appended after it
    cutoffs <- max(pscore)
    t.strata <- NULL
    c.strata <- NULL
    size.strata <- NULL
    indicator_iter <- indicator
    pscore_iter <- pscore
    num_strata_formed <- 0
    # Peel off one subclass per iteration, from the highest scores down,
    # while both treated and control units remain
    while (sum(indicator_iter) > 0 && sum(1 - indicator_iter) > 0) {
        n <- length(indicator_iter)
        t_ind <- which(indicator_iter == 1)
        n_treated <- sum(indicator_iter)
        treated_pscore_iter <- pscore_iter[t_ind]
        # For each treated unit: number of treated (t_geq_t) and control
        # (c_geq_t) units whose propensity score is at least as large
        t_geq_t <- n_treated + 1 - rank(treated_pscore_iter, ties.method = "min")
        c_geq_t <- n + 1 - rank(pscore_iter, ties.method = "min")[t_ind] - t_geq_t
        # A cutoff is usable only if it leaves at least as many controls
        # as treated units above it
        matchable <- c_geq_t >= t_geq_t
        if (!any(matchable)) {
            stop("No way to stratify: c_geq_t < t_geq_t.")
        }
        size.dist <- t_geq_t * c_geq_t
        if (min(size.dist[matchable]) > sizemax) {
            print("No way to stratify: min(size.dist) > sizemax.")
            return(list(flag = 2))
        }
        # Largest admissible distance matrix among the matchable cutoffs
        # (<= so that a size exactly equal to sizemax passes both checks)
        best.size <- max(size.dist[matchable][size.dist[matchable] <= sizemax])
        cutoff.size.ind <- which(matchable & size.dist == best.size)[1]
        cutoff <- pscore_iter[t_ind[cutoff.size.ind]]
        cutoffs <- c(cutoffs, cutoff)
        n_t <- sum(treated_pscore_iter >= cutoff)
        n_c <- sum(pscore_iter >= cutoff) - n_t
        t.strata <- c(t.strata, n_t)
        c.strata <- c(c.strata, n_c)
        size.strata <- c(size.strata, n_t * n_c)
        num_strata_formed <- num_strata_formed + 1
        print(num_strata_formed)
        # Drop the units just assigned and continue with the rest
        keep <- pscore_iter < cutoff
        indicator_iter <- indicator_iter[keep]
        pscore_iter <- pscore_iter[keep]
    }
    # Results are reversed so that subclasses run from low to high scores
    out <- list(num_pstrata = num_strata_formed, cutoffs = rev(cutoffs),
                t.pstrata = rev(t.strata), c.pstrata = rev(c.strata),
                prodsize.pstrata = rev(size.strata))
    if (sum(indicator_iter) == 0) {
        print("Stratification Finished: Treated Units Used Up.")
        return(c(list(flag = 1), out))
    }
    if (sum(!indicator_iter) == 0) {
        print("Stratification Finished: Control Units Used Up. Cannot Form New Strata.")
        return(c(list(flag = 2), out))
    }
}
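
The rank-based counting at the heart of the loop can be checked by hand on a small example. The sketch below works through what appears to be the first iteration for ten hypothetical units. The data and the value of sizemax are made up for illustration, and the cutoff selection is a compact restatement of the idea rather than the function's exact code.

```r
# Toy data; all values are hypothetical.
indicator <- c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0)
pscore    <- c(0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80)
sizemax   <- 10  # hypothetical tolerance, far smaller than the default

n         <- length(indicator)
t_ind     <- which(indicator == 1)
n_treated <- sum(indicator)

# For each treated unit, count the treated and control units whose
# propensity score is at least as large as its own.
t_geq_t <- n_treated + 1 - rank(pscore[t_ind], ties.method = "min")        # 3, 2, 1
c_geq_t <- n + 1 - rank(pscore, ties.method = "min")[t_ind] - t_geq_t      # 5, 3, 1

# Distance matrix size if the subclass were cut at each treated unit's score.
size <- t_geq_t * c_geq_t                                                  # 15, 6, 1

# Among cutoffs with enough controls and an admissible size, take the largest.
ok     <- c_geq_t >= t_geq_t & size <= sizemax
cutoff <- pscore[t_ind][which.max(ifelse(ok, size, -Inf))]
```

Here the cut at 0.15 would need a 3-by-5 distance matrix, which exceeds the toy tolerance, so the first subclass is cut at 0.40 and contains two treated and three control units.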

As described in the main text, the function opt_pstrat is applied when each stratum goes through Step 3. The outputs of this function provide useful information on whether to further split or match within the subclasses.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Zhang, K., Chen, DG. (2016). Overcoming the Computing Barriers in Statistical Causal Inference. In: He, H., Wu, P., Chen, DG. (eds) Statistical Causal Inferences and Their Applications in Public Health Research. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-41259-7_7
