

Ethan Steinberg sends in these delightful regression discontinuity graphs:

He reports that the graphs come from a study from the University of Pennsylvania trying to estimate the effect of a care management intervention for high-risk patients. They also make the classic error of comparing the statistical significance of different comparisons. Nothing special here, just another day at the sausage factory. People are taught this is what to do, it spits out publishable results, everybody’s happy.

Just to explain some more: Sometimes people frame this as a problem of trying to figure out the correct specification for the running variable: should it be linear, or polynomial, or locally linear, whatever? But I don’t think this is the right way to think of things. I’d say that the original sin of the “regression discontinuity” framing is the idea that there’s some sort of purity of the natural experiment so that the analysis should be performed only conditioning on the running variable. Actually, these are observational studies, and there can be all sorts of differences between exposed and unexposed cases. It’s poor statistical practice to take the existence of a discontinuity and use this to not adjust for other pre-treatment predictors. Once you frame the problem as an observational study, it should be clear that the running variable is just one of many potential adjustment factors. Yes, it can be an important factor because of the lack of overlap between exposed and control groups in that variable. But it’s not the only factor, and it’s a weird circumstance of the way that certain statistical methods have been developed and taught that researchers so often seem to act as if it is.

A student asked for more detail regarding my concerns with certain regression discontinuity analyses. If you want more of my thinking on the topic, you can google *statmodeling “regression discontinuity”* to see some posts, for example here and here, and you can look at my articles with Imbens and Zelizer. We also discuss the topic from a less critical and more constructive, how-to-do-it perspective in section 21.3 of Regression and Other Stories.

Commenter Sam looked at the above-linked article more carefully and reports that their main analysis adjusts for other pre-treatment predictors and also includes alternative specifications. I still think it’s nuts for them to use this quadratic model, and even more nuts to include different curves on the two sides of the boundary (this just seems like noise mining to me), but they did also do analyses that adjusted for other pre-treatment variables. I don’t find those analyses convincing either, but that’s another story. Maybe the general point here is that it takes a lot for this sort of statistical analysis to be convincing, especially when no pattern is apparent in the raw data.

This entry was posted in Causal Inference, Economics, Public Health, Zombies by Andrew.

There’s a literature on fitting these with local linear fits (of which loess is one approach, not necessarily the best), but as stated in the P.S., I think the big problem is in not trying to adjust for other pre-treatment variables.
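
To make the point about adjusting for pre-treatment variables a bit more concrete, here is a minimal simulation sketch in Python. Everything in it is made up for illustration: the variable names (`age`, `x`, `z`), the cutoff at zero, the coefficients, and the true effect of 1.0 are assumptions, not anything taken from the study discussed here. It fits the discontinuity regression twice, once conditioning only on the running variable and once also including a pre-treatment predictor, and compares the estimated effect and its standard error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical simulated data (not from any real study).
age = rng.normal(0.0, 1.0, n)              # a pre-treatment predictor
x = 0.6 * age + rng.normal(0.0, 1.0, n)    # running variable, related to age
z = (x >= 0).astype(float)                 # exposure assigned at cutoff x = 0
true_effect = 1.0                          # assumed effect, for illustration
y = true_effect * z + 0.5 * x + 2.0 * age + rng.normal(0.0, 1.0, n)


def ols(X, y):
    """Least-squares coefficients and classical standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


ones = np.ones(n)

# Analysis 1: condition only on the running variable.
beta_raw, se_raw_all = ols(np.column_stack([ones, z, x]), y)
est_raw, se_raw = beta_raw[1], se_raw_all[1]

# Analysis 2: also adjust for the pre-treatment predictor.
beta_adj, se_adj_all = ols(np.column_stack([ones, z, x, age]), y)
est_adj, se_adj = beta_adj[1], se_adj_all[1]

print(f"running variable only: {est_raw:.2f} (se {se_raw:.2f})")
print(f"plus pre-treatment x:  {est_adj:.2f} (se {se_adj:.2f})")
```

In a setup like this one, both regressions target the same effect, but the adjusted fit should come with a noticeably smaller standard error, because the pre-treatment predictor accounts for outcome variation that the running variable cannot. With real data the adjustment can also matter for bias if exposed and unexposed cases differ on that predictor near the boundary.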
