Volume 45, Issue 2, pp 261–294

Multilevel structural equation modeling-based quasi-experimental synthetic cohort design

  • Qiu Wang
  • Richard T. Houang
  • Kimberly Maier
Original Paper


This paper provides a theoretical foundation for examining how effectively post-hoc adjustment approaches, such as propensity score matching, reduce selection bias in the synthetic cohort design (SCD) for causal inference and program evaluation. Compared with the Solomon four-group design, the SCD often encounters selection bias due to covariate imbalance between the two cohorts. The efficiency of the SCD is ensured by the historical equivalence of groups (HEoG) assumption, which states that the two cohorts are comparable. The multilevel structural equation modeling framework is used to define the HEoG assumption. A mathematical proof shows that, under HEoG, the SCD yields an unbiased estimator of the schooling effect. Practical considerations and suggestions for future research and use of the SCD are discussed.


Propensity score matching; Solomon four-group design; Multilevel analysis; Quasi-longitudinal design; Causal inference; Multilevel structural equation modeling; Matching; Synthetic cohort design

Mathematics Subject Classification

62-P25 62B15 62-07 


Copyright information

© The Behaviormetric Society 2018

Authors and Affiliations

  1. Measurement and Research Methodology, Department of Higher Education, School of Education, Syracuse University, Syracuse, USA
  2. Center for the Study of Curriculum Policy, College of Education, Michigan State University, East Lansing, USA
  3. Department of Counseling, Educational Psychology, and Special Education, College of Education, Michigan State University, East Lansing, USA
