Regression Discontinuity Design: Recent Developments and a Guide to Practice for Researchers in Higher Education

Part of the book series: Higher Education: Handbook of Theory and Research (HATR, volume 27)

Abstract

In an educational research climate where understanding causal relationships between predictors and outcome variables is of primary interest, but randomized experiments are often either unethical or infeasible, researchers have been turning to research designs and statistical techniques that produce causal results in a quasiexperimental manner. One such quasiexperimental design is regression discontinuity (RD), which has been shown to require fewer assumptions than most other designs; the assumptions it does require can be verified with nearly the same ease as those of a randomized experiment. We provide a user-friendly guide to RD designs for the educational researcher. We begin by discussing terminology associated with both experimental and RD designs. Next, we describe the process of checking the assumptions associated with an RD design and use an empirical example to illustrate this process. We then expand on the empirical example to show how to obtain estimates of treatment effects in both “sharp” and “fuzzy” RD designs, employing both parametric and nonparametric frameworks. Finally, we discuss new developments in RD research and some future applications in educational research. A technical appendix gives additional details on the selection of parameters in the estimation procedure, and we include a URL link to statistical code for conducting RD analysis.

Notes

  1. For a more complete discussion see Angrist and Pischke (2009).

  2. All that is really required is that E(Y|X) is continuous at X = c.

  3. Of course, if we truly knew that \( E(\varepsilon |X) \) was linear in X, we could reduce the variance of the estimate by using all the data. However, if \( E(\varepsilon |X) \) is nonlinear in X, it will still be approximately linear in a neighborhood of any point, such as the cut-off value. So, limiting the data to an interval around the cut-off point, while leading to a less precise estimate, reduces the potential bias of the estimate.

  4. For a proof of this result, let \( \varepsilon = \alpha_0 + \alpha_1 Z + \nu \). Now, suppose we assume that \( E(\nu |X) \) is a continuous function of X but that \( E(Z|X) \) jumps at X = c. Then the term \( E(\nu |c+d)-E(\nu |c-d) \) approaches zero as d approaches zero, while the term \( E(Z|c+d)-E(Z|c-d) \) approaches some nonzero number, call it \( \xi \). Hence, \( E(\varepsilon |X) \) is discontinuous at the cut-off value c, with a jump of \( \alpha_1 \xi \) (nonzero whenever \( \alpha_1 \ne 0 \)), and the basic RD assumption is violated.

  5. The eight noncognitive questions measure: positive self-concept; realistic self-appraisal; successfully handling the system (racism); preferences for long-term goals; availability of a strong support person; leadership experience; community involvement; and knowledge acquired in a particular field. The applicants are also rated on three cognitive dimensions: rigor of course work; number of math, science and language courses; and a scholarly essay.

  6. For more information on the GMS program see DesJardins et al. (2010).

  7. For the analyses of the baseline and second follow-up survey data, see DesJardins et al. (2010).

  8. This is not surprising given that the cut-off point for a particular ethnic group depends on the total number of applicants for that ethnic group, which is unknown to any applicant (and to those who score the exam) at the time the test is completed.

  9. The key difference is that in nonparametric techniques this bin-width, or more generally the bandwidth, decreases as the sample size increases.

  10. Given the discreteness of the total score, we eliminate those observations whose relative score is less than −10 or greater than 9. This results in dropping 1,152 observations.

  11. Only nine individuals who were offered a GMS declined the offer.

  12. For simplicity, we estimate linear probability models for dichotomous dependent variables and adjust the standard errors for heteroskedasticity. We do so because, in the fuzzy model estimations presented below, instrumental variable estimation with a linear probability model was more stable than instrumental variable probit estimation across the many different model specifications that we estimated. Probit estimates yield similar results.

  13. As with the linear term, all quadratic terms are interacted with ethnicity and cohort.

  14. For technical reasons this estimation technique is preferred to other nonparametric techniques when applied to RD models.

  15. The estimates with the GMS data are similar when other kernel densities are used.

  16. The mean squared error (MSE) equals bias squared plus the variance.

References

  • Angrist, J. D., & Lavy, V. (1999). Using Maimonides’ rule to estimate the effect of class size on scholastic achievement. The Quarterly Journal of Economics, 114(2), 533–575.

  • Angrist, J. D., & Pischke, J. S. (2009). Mostly harmless econometrics: An empiricist’s companion. Princeton: Princeton University Press.

  • Bayer, P., Ferreira, F., & McMillan, R. (2007). A unified framework for measuring preferences for schools and neighborhoods. Journal of Political Economy, 115(4), 588–638.

  • Berlinski, S., Galiani, S., & McEwan, P. J. (2011). Preschool and maternal labor market outcomes: Evidence from a regression discontinuity design. Economic Development and Cultural Change, 59(2), 313–344.

  • Black, S. E. (1999). Do better schools matter? Parental evaluation of elementary education. Quarterly Journal of Economics, 114(2), 577–599.

  • Browning, M., & Heinesen, E. (2007). Class size, teacher hours and educational attainment. The Scandinavian Journal of Economics, 109(2), 415–438.

  • Card, D., Lee, D. S., & Pei, Z. (2009). Quasi-experimental identification and estimation in the regression kink design. Industrial relations section working paper. Princeton: Princeton University.

  • Chay, K. Y., McEwan, P. J., & Urquiola, M. (2005). The central role of noise in evaluating interventions that use test scores to rank schools. The American Economic Review, 95(4), 1237–1258.

  • Cook, T. (2008). “Waiting for life to arrive”: A history of the regression-discontinuity design in psychology, statistics and economics. Journal of Econometrics, 142(2), 636–654.

  • Clark, D. (2009). The performance and competitive effects of school autonomy. The Journal of Political Economy, 117(4), 745–783.

  • DesJardins, S. L., McCall, B. P., Ott, M., & Kim, J. (2010). A quasi-experimental investigation of how the Gates Millennium Scholars Program is related to college students’ time use and activities. Educational Evaluation and Policy Analysis, 32(4), 456–475.

  • DesJardins, S. L., & McCall, B. P. (2011). The impact of the Gates Millennium Scholars Program on college and post-college related choices of low-income minority students. Unpublished Manuscript.

  • Dobkin, C., Gil, R., & Marion, J. (2010). Skipping class in college and exam performance: Evidence from a regression discontinuity classroom experiment. Economics of Education Review, 29(4), 566–575.

  • Fan, J., & Gijbels, I. (1992). Variable bandwidth and local linear regression smoothers. Annals of Statistics, 20(4), 2008–2036.

  • Fan, J., & Gijbels, I. (1996). Local polynomial modeling and its applications. New York: Chapman and Hall.

  • Fitzpatrick, M. D. (2010). Preschoolers enrolled and mothers at work? The effects of universal prekindergarten. Journal of Labor Economics, 28(1), 51–85.

  • Goux, D., & Maurin, E. (2010). Public school availability for two-year olds and mothers’ labour supply. Labour Economics, 17(6), 951–962.

  • Hahn, J., Todd, P., & Van der Klaauw, W. (2001). Identification and estimation of treatment effects with regression-discontinuity design. Econometrica, 69(1), 201–209.

  • Henry, G. T., Fortner, C. K., & Thompson, C. L. (2010). Targeted funding for educationally disadvantaged students: A regression discontinuity estimate of the impact on high school student achievement. Educational Evaluation and Policy Analysis, 32(2), 183–204.

  • Jacob, B. A., & Lefgren, L. (2004a). The impact of teacher training on student achievement: Quasi-experimental evidence from school reform efforts in Chicago. The Journal of Human Resources, 39(1), 50–79.

  • Jacob, B. A., & Lefgren, L. (2004b). Remedial education and student achievement: A regression-discontinuity analysis. The Review of Economics and Statistics, 86(1), 226–244.

  • Lee, D., & Card, D. (2008). Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655–674.

  • Lee, D., & Lemieux, T. (2009). Regression discontinuity design in economics. National Bureau of Economic Research working paper, Cambridge.

  • Lesik, S. A. (2007). Do developmental mathematics programs have a causal impact on student retention: An application of discrete-time survival and regression discontinuity analysis. Research in Higher Education, 48(5), 583–608.

  • Ludwig, J., & Miller, D. L. (2007). Does Head Start improve children’s life chances? Evidence from a regression discontinuity approach. Quarterly Journal of Economics, 122(1), 159–208.

  • Martorell, P., & McFarlin, I. (2011). Help or hindrance? The effects of college remediation on academic and labor economic outcomes. Review of Economics and Statistics, 93(2), 436–454.

  • Matsudaira, J. (2008). Mandatory summer school and student achievement. Journal of Econometrics, 142(2), 829–850.

  • McCrary, J. (2008). Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics, 142(2), 698–714.

  • Moss, B. G., & Yeaton, W. H. (2006). Shaping policies related to developmental education: An evaluation using the regression-discontinuity design. Educational Evaluation and Policy Analysis, 28(3), 215–229.

  • Oreopoulos, P. (2006). Estimating average and local average treatment effects of education when compulsory schooling laws really matter. The American Economic Review, 96(1), 152–175.

  • Papay, J. P., Willett, J. B., & Murnane, R. J. (2011). Extending the regression-discontinuity approach to multiple assignment variables. Journal of Econometrics, 161(2), 203–207.

  • Reardon, S. F., Arshan, N., Atteberry, A., & Kurlaender, M. (2010). Effects of failing a high school exit exam on course taking, achievement, persistence, and graduation. Educational Evaluation and Policy Analysis, 32(4), 498–520.

  • Shadish, W. R., Cook, T. D., & Campbell, D. T. (2001). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton-Mifflin.

  • Trochim, W. M. K. (1984). Research design for program evaluation: The regression-discontinuity approach. Beverly Hills: Sage Publications.

  • Thistlethwaite, D. L., & Campbell, D. T. (1960). Regression-discontinuity analysis: An alternative to the ex post facto experiment. The Journal of Educational Psychology, 51(6), 309–317.

  • Van der Klaauw, W. (2002). Estimating the effect of financial aid offers on college enrollment: A regression-discontinuity approach. International Economic Review, 43(4), 1249–1287.

  • U.S. Department of Education. (2008). What works clearinghouse, procedures and standards handbook version 2. Washington: U.S. Department of Education.

Acknowledgment

The authors would like to thank Associate Editor Stephen L. DesJardins for assistance during the preparation of this chapter.

Author information

Corresponding author

Correspondence to Brian P. McCall.

Appendix: Local Polynomial Regression Estimates and Optimal Bandwidth Determination

In this appendix, we describe in more detail how the nonparametric estimates are computed. Recall that the treatment effect \( \tau\) for an RD design equals

$$ \tau =\frac{\lim_{d\to 0}E(Y|c+d>X>c)-\lim_{d\to 0}E(Y|c>X>c-d)}{\lim_{d\to 0}\Pr (T=1|c+d>X>c)-\lim_{d\to 0}\Pr (T=1|c>X>c-d)} $$
(5)

Now with the GMS program, no individuals below the cut-off score received a scholarship, so \(\lim_{d\to 0}\Pr (T=1|c>X>c-d)=0\) and

$$ \tau =\frac{{{\lim }_{d\to 0}}E(Y|c+d>X>c)-{{\lim }_{d\to 0}}E(Y|c>X>c-d)}{{{\lim }_{d\to 0}}\Pr (T=1|c+d>X>c)}. $$
(6)

To derive a consistent estimator of \( \tau\) (where consistency means that, as the sample size gets large, our estimate \( \hat{\tau } \) approaches the true population value \( \tau\) with probability one), we need to consistently estimate the three terms in Eq. 6, \(E(Y|c+d>X>c)\), \(E(Y|c>X>c-d)\), and \(\Pr (T=1|c+d>X>c)\), for some “small” value of d (i.e., using data close to the cut-off score). To do this, we apply a technique called local polynomial regression (Fan and Gijbels 1992).

Consider the nonparametric regression model:

$$ Y=m(X)+\varepsilon . $$
(7)

In Eq. 7, m(X) is any continuous function of X rather than, e.g., a linear function of X such as \(b_0 + b_1 X\). So, instead of estimating two unknown population parameters \(b_0\) and \(b_1\), we are trying to estimate some unknown population function m(X). Local polynomial regression estimates the function m(x) by obtaining estimates of its value at a large number of values of x. For a particular value of x, say \(x_0\), the value \(m(x_0)\) is estimated by running a weighted polynomial regression in which sample points closer to \(x_0\) receive larger weights. More formally, suppose that we wish to estimate the value \(m(x_0)\) using a local polynomial regression of order p. Define the data matrix \(\mathbf{X}\) by:

$$ \mathbf{X}=\left[\begin{matrix} 1 & X_1-x_0 & \cdots & (X_1-x_0)^p \\ \vdots & \vdots & \vdots & \vdots \\ 1 & X_n-x_0 & \cdots & (X_n-x_0)^p \end{matrix} \right] $$
(8)

where each value of X in the sample is measured as its deviation from \(x_0\). Let \(\mathbf{y}\) be the vector of values of the dependent variable:

$$ y=\left( \begin{matrix} {{Y}_{1}}\\ {{Y}_{2}}\\ \vdots \\ {{Y}_{n}}\end{matrix} \right). $$
(9)

In local polynomial regression, we want to run a weighted regression in which points closer to \(x_0\) receive larger weights. To this end, define the diagonal weighting matrix \(\mathbf{W}\) by:

$$ \mathbf{W}=\text{diag}\left\{ {{K}_{h}}({{X}_{i}}-{{x}_{0}}) \right\}{.} $$
(10)

In Eq. 10, \(K_h\) is a kernel density weighting function with bandwidth h, defined by \(K_h(\cdot) = K(\cdot/h)/h\) for some kernel density function K. These kernel density functions reach their maximum height at 0 and decline monotonically as you move away from 0. While there are several possible choices of K, we use the Gaussian kernel function, which is defined by \(K(u) = (1/\sqrt{2\pi})\exp(-u^2/2)\). Once a bandwidth is determined, we estimate the local polynomial coefficients at \(x_0\),

$$ \hat{\beta }=\left( \begin{matrix} {{{\hat{\beta }}}_{0}}\\ {{{\hat{\beta }}}_{1}}\\ \vdots \\ {{{\hat{\beta }}}_{p}}\\\end{matrix} \right) $$

by minimizing the weighted sum of squared errors, where the error vector is \( \mathbf{y}-\mathbf{X}\beta \):

$$ \underset{\beta }{\mathop{\min }}\,(\mathbf{y}-\mathbf{X}\beta )'\mathbf{W}(\mathbf{y}-\mathbf{X}\beta ). $$
(11)
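
The minimization in Eq. 11 is an ordinary weighted least squares problem, so its solution satisfies the normal equations \( (\mathbf{X}'\mathbf{W}\mathbf{X})\hat{\beta } = \mathbf{X}'\mathbf{W}\mathbf{y} \). The following is a minimal Python sketch of that calculation using only the standard library. It is an illustration, not the Stata code referred to at the end of this appendix; the function names (`gaussian_kernel`, `solve`, `local_poly_fit`) and the toy data are assumptions made for the example.

```python
import math


def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger bandwidth")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def local_poly_fit(xs, ys, x0, h, p=1):
    """Return [beta_0, ..., beta_p] minimizing the weighted SSE of Eq. 11 at x0."""
    k = p + 1
    XtWX = [[0.0] * k for _ in range(k)]
    XtWy = [0.0] * k
    for x, y in zip(xs, ys):
        d = x - x0                           # deviation from x0 (rows of Eq. 8)
        w = gaussian_kernel(d / h) / h       # K_h(X_i - x0), the weights of Eq. 10
        powers = [d ** j for j in range(k)]
        for a in range(k):
            XtWy[a] += w * powers[a] * y
            for c in range(k):
                XtWX[a][c] += w * powers[a] * powers[c]
    return solve(XtWX, XtWy)


# Toy data (hypothetical): an exactly linear relationship y = 1 + 2x.
xs = [i / 10.0 for i in range(-20, 21)]
ys = [1.0 + 2.0 * x for x in xs]
beta = local_poly_fit(xs, ys, x0=0.5, h=0.5, p=1)
# beta[0] is the estimate of m(x0); beta[1] is the estimated slope at x0.
```

Because the regressors are centered at \(x_0\), the intercept \(\hat{\beta }_0\) is the fitted value at \(x_0\) itself, which is why only the first coefficient is needed when evaluating the regression function at a single point.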

For theoretical reasons (Fan and Gijbels 1996), it is preferable to estimate odd-ordered polynomial models. In our estimates of the treatment effect \( \tau\) using the GMS data, we simply estimate a local linear regression (p = 1). However, we need to estimate separate local linear regressions for \(E(Y|c+d>X>c)\), \(E(Y|c>X>c-d)\), and \(\Pr (T=1|c+d>X>c)\). Here, we use only the data to the right of the cut-off value when estimating \(E(Y|c+d>X>c)\) and \(\Pr (T=1|c+d>X>c)\), and only the data to the left of the cut-off value when estimating \(E(Y|c>X>c-d)\). Letting \(x_-\) be the closest point on our grid of x values to the left of c (which in our GMS application is c − 0.1) and \(x_+\) be the closest point on our grid of x values to the right of c (c + 0.1), the estimated value of \( \tau\) equals

$$ \hat{\tau }=\frac{\hat{E}(Y|X={{x}_{+}})-\hat{E}(Y|X={{x}_{-}})}{\hat{\text{P}}\text{r}(T=1|X={{x}_{+}})}. $$
(12)
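
As an illustration of Eq. 12, the sketch below computes the three one-sided local linear estimates and forms their ratio. It is a simplified Python example under stated assumptions, not the authors' implementation: the function names, the `step` argument (standing in for the grid spacing of 0.1), the single bandwidth `h` shared by all three regressions, and the toy data are all hypothetical. Observations exactly at the cut-off are dropped here, following the strict inequalities in Eq. 6.

```python
import math


def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_linear_at(xs, ys, x0, h):
    """Local linear (p = 1) estimate of E(Y | X = x0): the weighted-line intercept."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = gaussian_kernel(d / h) / h
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1
    if abs(det) < 1e-12:
        raise ValueError("not enough variation in x near x0")
    return (s2 * t0 - s1 * t1) / det  # closed-form solution for beta_0


def fuzzy_rd_estimate(x, y, t, c, h, step=0.1):
    """Ratio of Eq. 12, using data right of c for x_+ and left of c for x_-."""
    right = [(xi, yi, ti) for xi, yi, ti in zip(x, y, t) if xi > c]
    left = [(xi, yi) for xi, yi in zip(x, y) if xi < c]
    x_plus, x_minus = c + step, c - step
    ey_plus = local_linear_at([r[0] for r in right], [r[1] for r in right], x_plus, h)
    pr_plus = local_linear_at([r[0] for r in right], [float(r[2]) for r in right], x_plus, h)
    ey_minus = local_linear_at([l[0] for l in left], [l[1] for l in left], x_minus, h)
    if abs(pr_plus) < 1e-12:
        raise ValueError("estimated treatment probability is zero")
    return (ey_plus - ey_minus) / pr_plus


# Toy data (hypothetical): everyone above the cut-off c = 0 is treated.
x = [i / 10.0 for i in range(-30, 31) if i != 0]
y = [(3.0 if xi > 0 else 1.0) + 0.5 * xi for xi in x]
t = [1 if xi > 0 else 0 for xi in x]
tau_hat = fuzzy_rd_estimate(x, y, t, c=0.0, h=1.0)
```

In an actual application each of the three regressions would use its own bandwidth, chosen as described in the remainder of this appendix; a single shared `h` is used here only to keep the example short.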

To implement this technique it is necessary to choose a bandwidth. Recall that the bandwidth determines the amount of smoothing. The larger the bandwidth, the bigger the potential for bias, while a smaller bandwidth leads to an estimate with higher variance. We would like to choose a bandwidth that balances the potential for bias against variance. So, we choose the bandwidth that minimizes the mean squared error, which equals bias squared plus variance.

Let \({{s}_{r,0}}=\int_{0}^{\infty }{K(u){{u}^{r}}du.}\)

It can be shown (see Fan and Gijbels 1992, 1996) that for a local linear regression model the optimal bandwidth to the left of the cut-off point equals

$$ h_{opt}(x_0)=C(K)\left[ \frac{\sigma^2(x_0)}{\left\{ m''(x_0) \right\}^2 f(x_0)\,n} \right]^{1/5} $$
(13)

where

$$ C(K)=\left[ \frac{\int_{0}^{\infty }\left[ s_{2,0}^{2}-t\,s_{1,0} \right]K^{2}(t)\,dt}{\left\{ s_{2,0}^{2}-s_{1,0}s_{3,0} \right\}^{2}} \right]^{1/5}, $$

\(\sigma^2(x_0)\) is the variance of \( \varepsilon\) at \(x_0\), \(m''(x_0)\) is the second derivative of m at \(x_0\), and \(f(x_0)\) is the density of x at \(x_0\).

The idea here is that the larger the variance of the error at \(x_0\), all else equal, the larger the optimal bandwidth must be in order to minimize the mean squared error, since a larger bandwidth smooths the data more and reduces the variance of the estimate. On the other hand, the larger \(m''(x_0)\) is in magnitude, the smaller the optimal bandwidth, all else equal, because the slope of the function m changes more quickly near \(x_0\) and so a smaller bandwidth reduces the amount of bias.
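
The comparative statics just described can be checked numerically. The short Python sketch below evaluates Eq. 13 directly; the function name `optimal_bandwidth`, its argument names, and the input values are hypothetical and chosen only for illustration, and the default constant is the value of C(K) quoted for the Gaussian kernel in this appendix.

```python
def optimal_bandwidth(sigma2, m2, f, n, c_k=0.794):
    """Evaluate Eq. 13: h = C(K) * [sigma^2 / ({m''}^2 * f * n)]^(1/5).

    sigma2: error variance at x0; m2: second derivative m''(x0);
    f: density of x at x0; n: sample size; c_k: kernel constant C(K).
    """
    if sigma2 < 0 or m2 == 0 or f <= 0 or n <= 0:
        raise ValueError("need sigma2 >= 0, m2 != 0, f > 0 and n > 0")
    return c_k * (sigma2 / (m2 ** 2 * f * n)) ** 0.2


# Hypothetical inputs, used only to show the direction of each effect.
h_base = optimal_bandwidth(sigma2=1.0, m2=1.0, f=0.5, n=1000)
h_noisier = optimal_bandwidth(sigma2=2.0, m2=1.0, f=0.5, n=1000)  # larger variance
h_curvier = optimal_bandwidth(sigma2=1.0, m2=2.0, f=0.5, n=1000)  # more curvature
h_bigger_n = optimal_bandwidth(sigma2=1.0, m2=1.0, f=0.5, n=32000)  # 32x the data
```

Because of the one-fifth power, the bandwidth responds slowly to each input: in this sketch a 32-fold increase in the sample size only halves the bandwidth.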

For the Gaussian kernel density function that we employ in our estimates C(K) = 0.794. Several of these quantities are unknown; e.g., \({m}''({{x}_{0}})\) depends on m, which is what we are trying to estimate in the first place. So, we must employ a two-step method to obtain the optimal bandwidth.

In the first step, we compute what is termed the “Rule of Thumb” (ROT) bandwidth, which we denote by \(h_{\text{ROT}}\). To compute \(h_{\text{ROT}}\) we first obtain a rough estimate of m(x) using a fourth-order (quartic) polynomial and weighting all the data equally. From these estimates we compute

$$ \tilde{m}(x)={{\tilde{\beta }}_{0}}+{{\tilde{\beta }}_{1}}x+\cdots +{{\tilde{\beta }}_{4}}{{x}^{4}} $$

which results in an estimate of \(m''(x)\):

$$ \tilde{m}''(x)=2\tilde{\beta }_{2}+6\tilde{\beta }_{3}x+12\tilde{\beta }_{4}x^{2} $$

and an estimate of \(\sigma^2\):

$$ \tilde{\sigma }^{2}=\sum\limits_{i=1}^{n}\frac{\left( y_i-\tilde{m}(x_i) \right)^{2}}{n-5}. $$

Then,

$$ h_{\text{ROT}}=0.794\times \left[ \frac{\tilde{\sigma }^{2}}{\sum\nolimits_{i=1}^{n}\left\{ \tilde{m}''(x_i)-K_{h_{\text{ROT}}}(x_i-x_0) \right\}^{2}} \right]^{1/5} $$
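
The pilot quantities used in this first step can be sketched as follows. This Python example, which uses only the standard library, fits the global quartic by ordinary least squares and returns the implied second-derivative function and residual variance. It deliberately stops before the final \(h_{\text{ROT}}\) expression and is not the authors' code; the names `solve` and `pilot_quartic` and the toy data are assumptions made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def pilot_quartic(xs, ys):
    """Equal-weight quartic fit; returns (beta, m2_tilde, sigma2_tilde)."""
    n = len(xs)
    if n <= 5:
        raise ValueError("need more than 5 observations for a quartic fit")
    k = 5  # coefficients beta_0, ..., beta_4
    XtX = [[sum(x ** (a + c) for x in xs) for c in range(k)] for a in range(k)]
    Xty = [sum((x ** a) * y for x, y in zip(xs, ys)) for a in range(k)]
    beta = solve(XtX, Xty)

    def m_tilde(x):
        # Fitted quartic: beta_0 + beta_1 x + ... + beta_4 x^4
        return sum(bj * x ** j for j, bj in enumerate(beta))

    def m2_tilde(x):
        # Second derivative: 2 beta_2 + 6 beta_3 x + 12 beta_4 x^2
        return 2.0 * beta[2] + 6.0 * beta[3] * x + 12.0 * beta[4] * x * x

    # Residual variance with n - 5 degrees of freedom
    sigma2 = sum((y - m_tilde(x)) ** 2 for x, y in zip(xs, ys)) / (n - 5)
    return beta, m2_tilde, sigma2


# Toy data (hypothetical): y = x^2, so the true second derivative is 2 everywhere.
xs = [i / 10.0 for i in range(-20, 21)]
ys = [x * x for x in xs]
beta, m2_tilde, sigma2_tilde = pilot_quartic(xs, ys)
```

In an RD application this fit would be run separately on each side of the cut-off, consistent with the one-sided estimation described earlier in the appendix.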

In the second step, we estimate a third-order local polynomial regression using the rule-of-thumb bandwidth \(h_{\text{ROT}}\) and the Gaussian kernel density function, and obtain a refined estimate of m(x) at \(x_0\):

$$ \hat{m}(x_0)=\hat{\beta }_{0}+\hat{\beta }_{1}x_0+\hat{\beta }_{2}x_0^{2}+\hat{\beta }_{3}x_0^{3}. $$

From this we get:

$$ {\hat{m}}{''}({{x}_{0}})=2{{\hat{\beta}}_{2}}+6{{\hat{\beta}}_{3}}{{x}_{0}} $$

and

$$ \hat{\sigma }^{2}(x_0)=\frac{\sum\limits_{i=1}^{n}\left( y_i-\hat{m}(x_i) \right)^{2}K_{h_{\text{ROT}}}(x_i-x_0)}{\text{tr}\left\{ \mathbf{W}-\mathbf{WX}\left( \mathbf{X}'\mathbf{WX} \right)^{-1}\mathbf{X}'\mathbf{W} \right\}}. $$
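
A rough Python sketch of this variance estimate is given below, using only the standard library. It rests on two assumptions that go beyond the text: the fitted values \(\hat{m}(x_i)\) are taken from the local cubic fitted at \(x_0\) with regressors centered as in Eq. 8, and the trace in the denominator is computed through the identity \(\text{tr}\{\mathbf{WX}(\mathbf{X}'\mathbf{WX})^{-1}\mathbf{X}'\mathbf{W}\}=\text{tr}\{(\mathbf{X}'\mathbf{WX})^{-1}\mathbf{X}'\mathbf{W}^{2}\mathbf{X}\}\), which avoids forming n × n matrices. The function names and toy data are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger bandwidth")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def local_variance(xs, ys, x0, h, p=3):
    """Kernel-weighted residual variance at x0 from an order-p local fit."""
    k = p + 1
    w = [gaussian_kernel((x - x0) / h) / h for x in xs]     # diagonal of W
    rows = [[(x - x0) ** j for j in range(k)] for x in xs]  # rows of X (Eq. 8)
    XtWX = [[sum(wi * r[a] * r[c] for wi, r in zip(w, rows)) for c in range(k)]
            for a in range(k)]
    XtW2X = [[sum(wi * wi * r[a] * r[c] for wi, r in zip(w, rows)) for c in range(k)]
             for a in range(k)]
    XtWy = [sum(wi * r[a] * y for wi, r, y in zip(w, rows, ys)) for a in range(k)]
    beta = solve(XtWX, XtWy)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    numerator = sum(wi * (y - fi) ** 2 for wi, y, fi in zip(w, ys, fitted))
    # tr{W - WX (X'WX)^{-1} X'W} = sum(w) - tr{(X'WX)^{-1} X'W^2 X}
    trace_term = 0.0
    for j in range(k):
        column = [XtW2X[a][j] for a in range(k)]
        trace_term += solve(XtWX, column)[j]
    return numerator / (sum(w) - trace_term)


# Toy data (hypothetical): an exact cubic, then the same cubic plus +/-1 "noise".
xs = [i / 10.0 for i in range(-20, 21)]
ys_exact = [1.0 + x - 0.5 * x ** 3 for x in xs]
ys_noisy = [y + (1.0 if i % 2 == 0 else -1.0) for i, y in enumerate(ys_exact)]
var_exact = local_variance(xs, ys_exact, x0=0.0, h=0.5)
var_noisy = local_variance(xs, ys_noisy, x0=0.0, h=0.5)
```

With exactly cubic data the local cubic fits perfectly and the estimate is essentially zero, whereas the alternating perturbation cannot be absorbed by a smooth local polynomial and shows up as residual variance.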

From Eq. 13 the optimal bandwidth is then

$$ \hat{h}_{\text{opt}}(x_0)=0.794\left[ \frac{\hat{\sigma }^{2}(x_0)}{\left\{ \hat{m}''(x_0) \right\}^{2}\hat{f}(x_0)\,n} \right]^{1/5} $$

where \( \hat{f}({{x}_{0}}) \) is estimated using a Gaussian kernel density estimator:

$$ \hat{f}({{x}_{0}})=\frac{1}{nh}\sum\limits_{i=1}^{n}{K\left( \frac{{{x}_{0}}-{{x}_{i}}}{h} \right)} $$

where h is chosen to minimize the mean squared error.
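
The density estimate above is a direct sum over the sample and can be sketched in a few lines of Python. The function names, the bandwidth value, and the toy data are assumptions for illustration only; in particular, the bandwidth is simply supplied by the caller rather than chosen by minimizing the mean squared error.

```python
import math


def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_gaussian(xs, x0, h):
    """Kernel density estimate f_hat(x0) = (1 / (n h)) * sum_i K((x0 - x_i) / h)."""
    n = len(xs)
    if n == 0 or h <= 0:
        raise ValueError("need at least one observation and h > 0")
    return sum(gaussian_kernel((x0 - xi) / h) for xi in xs) / (n * h)


# Toy data (hypothetical): scores on an evenly spaced grid from -2 to 2.
scores = [i / 10.0 for i in range(-20, 21)]
f_hat_center = kde_gaussian(scores, x0=0.0, h=0.5)
f_hat_edge = kde_gaussian(scores, x0=2.0, h=0.5)
```

The estimate at the edge of the support is noticeably lower than at the center even though the toy data are evenly spaced, because roughly half of the kernel mass at the boundary falls where there are no observations.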

When computing the optimal bandwidth for \(\hat{E}(y|x_+)\) and \( \hat{\text{P}}\text{r}(T=1|x_+) \), we use only the data to the right of the cut-off point c and set \(x_0 = x_+\). When computing the optimal bandwidth for \(\hat{E}(y|x_-)\), we use only the data to the left of the cut-off point and set \(x_0 = x_-\). The Stata programs necessary to compute these estimates can be found at http://www-personal.umich.edu/~bpmccall/Programs/Handbook_Chapter/.

© 2012 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

McCall, B.P., Bielby, R.M. (2012). Regression Discontinuity Design: Recent Developments and a Guide to Practice for Researchers in Higher Education. In: Smart, J., Paulsen, M. (eds) Higher Education: Handbook of Theory and Research. Higher Education: Handbook of Theory and Research, vol 27. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2950-6_5
