New Perspectives on Regression Adjustment in Causal Inference, with Applications to Educational Program Evaluations
UNIVERSITY OF MICHIGAN
Statistics Department
Dissertation Defense
for
Adam Sales
Title: New Perspectives on Regression Adjustment in Causal Inference,
with Applications to Educational Program Evaluation
Chair: Associate Professor Ben Hansen
Cognate Member: Professor Susan Dynarski
Members: Professor Walter Mebane
Professor Kerby Shedden
Date/Time: Thursday, August 15, 2013 at 1:00 p.m.
Location: 438 West Hall
Causal inference from observational data (that is, data that did not come from an experiment) is notoriously difficult: because the probability distribution of the treatment variable Z is unknown, measured or unmeasured variables that correlate with both Z and the outcome Y may confound causal estimates. This report will suggest methods for designing and modeling causal observational studies that combine design-based techniques with regression to account for measured covariates X.
Regression-discontinuity designs (RDDs) occur when treatment assignment is a function of a variable T: when T exceeds a threshold c, treatment is assigned. Conventionally, researchers analyze RDDs by regressing Y on both T and Z. We argue for modeling RDDs as naturally randomized experiments. Doing so involves two steps: modeling the relationship between Y and T, and using that design to infer and estimate effects of Z on Y. We illustrate this approach by reanalyzing a dataset used to estimate the effects of academic probation on students' grade point averages.
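For readers unfamiliar with the conventional analysis mentioned above (regressing Y on both T and Z), the following is a minimal sketch in plain Python. It is not the method proposed in the dissertation. The function names (`simple_ols`, `residuals`, `rdd_effect`), the linear form of the regression, and the toy data are all assumptions made for illustration. The coefficient on Z is obtained by first removing the linear trend in T from both Y and Z, which gives the same coefficient as a joint least-squares fit of Y on an intercept, T, and Z.

```python
def simple_ols(x, y):
    """Return (intercept, slope) from a least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Residuals of y after removing its linear trend in x."""
    intercept, slope = simple_ols(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def rdd_effect(t, y, c):
    """Coefficient on Z in the regression of Y on (1, T, Z), where Z = 1 if T > c.

    Illustrative only: assumes a linear relationship between Y and T
    with the same slope on both sides of the threshold.
    """
    z = [1.0 if ti > c else 0.0 for ti in t]
    y_res = residuals(t, y)  # Y with the linear trend in T removed
    z_res = residuals(t, z)  # Z with the linear trend in T removed
    return sum(zr * yr for zr, yr in zip(z_res, y_res)) / sum(zr ** 2 for zr in z_res)


# Hypothetical toy data: Y rises linearly in T and jumps by 3 above c = 2.
t_toy = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y_toy = [1.0 + 2.0 * ti + (3.0 if ti > 2.0 else 0.0) for ti in t_toy]
estimated_jump = rdd_effect(t_toy, y_toy, c=2.0)
```

Because the toy outcome is exactly linear in T apart from the jump, the estimate recovers the jump up to floating-point error. With real data the result depends heavily on whether the assumed relationship between Y and T is right.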
The rest of the report focuses on propensity-score stratification with high-dimensional data (p >> n). If treatment assignment is a random unknown function of X, researchers can adjust causal estimates for X by estimating propensity scores: the probability of treatment assignment conditional on X. Researchers then stratify subjects based on their propensity scores and model the data as if treatment were randomized within strata. However, when the dimension of X is large, propensity-score estimation is impossible. We propose a method in which a subset of X is used to estimate propensity scores. Next, the entire matrix X can be used to model Y, using a high-dimensional regression technique; the model is trained on subjects excluded from the stratification. The model's predictions of Y can then be used to test balance on, and adjust for, the entire set of covariates in X. We illustrate this method by evaluating two high-school educational programs.
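The stratification step described above can be sketched in plain Python as follows. This covers only the generic step of stratifying on propensity scores and comparing treated and control subjects within strata. It does not cover the high-dimensional outcome model or the balance tests proposed in the dissertation. The propensity scores are assumed to have been estimated already, and the function name `stratified_effect`, the equal-size strata, the size-based weighting, and the toy data are all illustrative assumptions.

```python
def stratified_effect(scores, z, y, n_strata=2):
    """Stratum-size-weighted average of within-stratum differences in mean outcome.

    scores: propensity scores (assumed already estimated)
    z:      treatment indicators (1 = treated, 0 = control)
    y:      outcomes
    Strata are formed by sorting on the score and cutting into
    n_strata groups of (nearly) equal size.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    weighted_sum = 0.0
    total_weight = 0
    for s in range(n_strata):
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        treated = [y[i] for i in idx if z[i] == 1]
        control = [y[i] for i in idx if z[i] == 0]
        if not treated or not control:
            continue  # no within-stratum comparison is possible; skip
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        weighted_sum += len(idx) * diff
        total_weight += len(idx)
    if total_weight == 0:
        raise ValueError("no stratum contains both treated and control subjects")
    return weighted_sum / total_weight


# Hypothetical toy data: subjects with high scores are treated more often
# and also have higher outcomes, so a naive comparison is confounded.
scores_toy = [0.2, 0.2, 0.3, 0.3, 0.7, 0.7, 0.8, 0.8]
z_toy = [1, 0, 0, 0, 1, 1, 1, 0]
y_toy = [5.0, 3.0, 3.0, 3.0, 12.0, 12.0, 12.0, 10.0]
adjusted = stratified_effect(scores_toy, z_toy, y_toy, n_strata=2)
```

In this toy data the treated-minus-control difference is 2 within each stratum, while the unstratified difference in means is larger because treated subjects are concentrated in the high-score stratum.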
Speaker: Adam Sales

