# A simple exposition of local average treatment effects

This post presents a simple explanation of the concept of “local average treatment effects” in the context of instrumental variables estimation.  I borrow shamelessly from the somewhat more advanced presentation in Imbens and Wooldridge’s lecture notes, which is a good place to look for further reading.

The basic idea underlying LATE is to acknowledge that different people (or, more generally, different units) typically have different causal effects for any given “treatment,” broadly defined.  It is common to talk about “the” causal effect of, say, education on earnings, or interest rates on growth, or pharmaceuticals on health, but if different people respond differently to education or to medical treatments and different countries respond differently to macroeconomic interventions, it’s not clear what we mean by “the” causal effect.  We can still talk coherently about distributions of causal effects, though, and we may be interested in estimating various averages of those causal effects.  Local average treatment effects (LATEs) are one such average.

For concreteness, let’s suppose the government decides to lend a hand to empirical researchers by implementing the following goofy policy: each student in a randomly selected group of high school kids is randomly offered either \$0 or \$5,000 to acquire a college degree.  We wish to use this natural experiment to estimate “the” effect of getting a college degree on, say, wages.  We collect data on all these folks consisting of: a dummy variable $Z_i$ which equals one if person $i$ was offered \$5,000 and zero if they were offered nothing, a dummy variable $D_i$ which indicates whether the student actually received a college degree, and wages, $y_i$. Whether someone actually goes to college depends, for some people, on whether they are offered nothing or \$5,000.  So let $D_i(Z_i)$ denote person $i$'s college choice as a function of $Z_i$.  In this simple case, this divides the population into four mutually exclusive groups:

| $D_i(Z_i=0)$ | $D_i(Z_i=1)$ | type |
|---|---|---|
| 0 | 0 | never-taker (N) |
| 0 | 1 | complier (C) |
| 1 | 0 | hipster (H) |
| 1 | 1 | always-taker (A) |

That is, some people will not go to college regardless of their offer (the never-takers), some will always go to college regardless of their offer (the always-takers), some will go to college if they are offered \$5,000 but not if they are offered nothing (the compliers), and some may do the opposite of what they’re “assigned” to do and go to college if offered nothing and not go if they’re offered \$5,000 (conventionally the “defiers,” but I prefer the “hipsters,” as one of my students helpfully suggested). Let N, A, C, and H denote membership in these groups, and let $\pi_N$ denote the proportion of the population who are never-takers, and similarly define $\pi_A, \pi_C$, and $\pi_H$.
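As a small illustration, the table can be written as a lookup from the pair of potential college choices to a type label. The names `compliance_type` and `observed_choice` are my own inventions for this sketch. Keep in mind that in real data we only ever see one of $D_i(0)$ and $D_i(1)$ for each person, so types cannot be read off the data like this.

```python
# Hypothetical helper: maps the pair of potential choices (D_i(0), D_i(1))
# to the type label used in the table above.
TYPES = {
    (0, 0): "never-taker",
    (0, 1): "complier",
    (1, 0): "hipster",
    (1, 1): "always-taker",
}


def compliance_type(d0: int, d1: int) -> str:
    """Type of a person who chooses d0 if offered nothing and d1 if offered the money."""
    return TYPES[(d0, d1)]


def observed_choice(d0: int, d1: int, z: int) -> int:
    """College choice D_i that we actually observe when the offer is z (0 or 1)."""
    return d1 if z == 1 else d0


print(compliance_type(0, 1))       # complier
print(observed_choice(1, 0, z=1))  # a hipster who gets the money offer stays home: 0
```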

We can always write the observed outcome $y_i$ as

$$y_i = \beta_0 + \beta_i D_i + u_i$$

where $\beta_0$ is a constant which can be interpreted as the average wage in the population if no one goes to college, $\beta_i$ is the causal effect of college for person $i$, and $u_i$ is a mean-zero variable representing all causes of wages other than college.

How much do we expect the wages of people who were offered nothing to differ from those offered \$5,000? The expected wage of the folks offered nothing is

$$E[ y_i | Z_i=0 ] = \beta_0 + E [ \beta_i D_i | Z_i=0 ]$$

because $E[u_i | Z_i = 0] = E[u_i] = 0$, as $Z_i$ is randomized and hence independent of $u_i$. We can decompose the second term by remembering that every person is one of the four types defined above. Conditional on being offered nothing, $D_i=0$ for the people in the complier and never-taker groups, so $\beta_i D_i$ is zero for these people. The mean of $\beta_i D_i$ conditional on $Z_i=0$ and on being a hipster is $E(\beta_i | H)$, the average causal effect among the hipster subpopulation, since $D_i=1$ for these folks when they’re offered nothing, and we do not need to condition on $Z_i$ because $Z_i$ is randomized and hence independent of $\beta_i$. Following the same reasoning for the always-takers and substituting into the equation above, we find

$$E[ y_i | Z_i = 0 ] = \beta_0 + \pi_H [ E(\beta_i | H) ] + \pi_A [ E(\beta_i | A) ],$$

which says that the mean wage among people offered nothing depends on the mean wage of all people absent college ($\beta_0$); the average change in wages induced by college among the hipsters, who go to college only when offered nothing, $E(\beta_i|H)$; the average change in wages induced by college among the folks who go to college whether offered nothing or \$5,000, $E(\beta_i|A)$; and the proportions of the population in these groups.

Similarly, the average wage of people offered \$5,000 is

$$E[ y_i | Z_i = 1 ] = \beta_0 + \pi_C [ E(\beta_i | C) ] + \pi_A [ E(\beta_i | A) ],$$

and the difference across the two groups is then

$$E[ y_i | Z_i = 1 ]-E[ y_i | Z_i = 0 ] = \pi_C [ E(\beta_i | C) ] - \pi_H [ E(\beta_i | H) ],$$

which depends only on average causal effects among the compliers and the hipsters and the proportions of people in these two groups. The never-takers and always-takers don’t change their behavior in response to the offer, so their outcomes don’t affect the change in average wages across those offered nothing or \$5,000.
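To check the algebra numerically, here is a minimal sketch with a made-up population of ten people in which we pretend to know everyone's type, $\beta_i$, and $u_i$ (which of course we never do in practice). All numbers, and the names `mean_wage` and `share_and_mean_effect`, are hypothetical. Because $Z_i$ is randomized, the expected wage given $Z_i = z$ is just the population average of wages with everyone's college choice set to $D_i(z)$.

```python
BETA_0 = 20.0  # made-up average wage if no one goes to college

# Made-up population: (D_i(0), D_i(1), beta_i, u_i). The u_i sum to zero.
population = [
    (0, 0, 1.0, -2.0), (0, 0, 2.0, -1.0), (0, 0, 3.0, 0.0),  # never-takers
    (0, 1, 4.0, 1.0), (0, 1, 6.0, 2.0),                      # compliers
    (0, 1, 8.0, -3.0), (0, 1, 10.0, 3.0),                    # compliers
    (1, 0, 5.0, 0.0),                                        # hipster
    (1, 1, 2.0, -1.0), (1, 1, 4.0, 1.0),                     # always-takers
]


def mean_wage(z):
    """E[y_i | Z_i = z], averaging y_i = beta_0 + beta_i * D_i(z) + u_i over everyone."""
    wages = [BETA_0 + beta * (d1 if z == 1 else d0) + u
             for d0, d1, beta, u in population]
    return sum(wages) / len(wages)


def share_and_mean_effect(d0, d1):
    """Population share and average beta_i of the type with choices (d0, d1)."""
    betas = [beta for a, b, beta, _ in population if (a, b) == (d0, d1)]
    return len(betas) / len(population), sum(betas) / len(betas)


pi_c, effect_c = share_and_mean_effect(0, 1)
pi_h, effect_h = share_and_mean_effect(1, 0)

difference = mean_wage(1) - mean_wage(0)
formula = pi_c * effect_c - pi_h * effect_h
print(difference, formula)  # the two agree up to floating-point rounding
```

The never-takers and always-takers contribute the same amount to both conditional means, so they drop out of the difference, exactly as in the derivation.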

Now consider the difference in the proportion attending college across the people offered nothing or \$5,000. The probability that someone offered nothing goes to college is $\pi_A + \pi_H$, the proportion who always go plus the proportion who go only if they’re offered nothing. Similarly, the probability that someone offered \$5,000 goes is $\pi_C + \pi_A$, the proportion who always go plus the proportion who go only if they’re offered \$5,000. The difference in proportions across the two groups is then $\pi_C - \pi_H$. The ratio of changes in wages to changes in participation across those offered nothing or \$5,000 is

$$\beta_{IV} = \frac{ E[ y_i | Z_i = 1 ] - E[ y_i | Z_i = 0 ] }{ \Pr(D_i=1|Z_i =1 ) - \Pr(D_i=1|Z_i=0)}$$

which, when we plug in sample averages to estimate population moments, is called the Wald estimator, and is what we get by regressing $y$ on a constant and $D$ using $Z$ as an instrument. Substituting our calculations above, we find

$$\beta_{IV} = \frac{\pi_C E(\beta_i | C) - \pi_H E(\beta_i | H)}{\pi_C - \pi_H}.$$
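The sample version of this ratio is easy to compute by hand. Below is a minimal sketch of the Wald estimator in plain Python; the function name `wald_estimate` and the eight-person toy data set are made up for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def wald_estimate(y, d, z):
    """Difference in mean outcomes across offer groups divided by the
    difference in college attendance rates across offer groups."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    d1 = [di for di, zi in zip(d, z) if zi == 1]
    d0 = [di for di, zi in zip(d, z) if zi == 0]
    first_stage = mean(d1) - mean(d0)
    if first_stage == 0:
        raise ValueError("the offer does not change college attendance in this sample")
    return (mean(y1) - mean(y0)) / first_stage


# Toy data: z is the offer, d is college attendance, y is the wage.
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [0, 0, 0, 1, 0, 1, 1, 1]
y = [10.0, 10.0, 10.0, 14.0, 10.0, 14.0, 14.0, 14.0]

print(wald_estimate(y, d, z))  # 4.0
```

In this toy data everyone who attends college earns exactly 4 more, so the ratio recovers 4. With heterogeneous effects, the population expression above is what the ratio approaches as the sample grows.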

In general, the IV estimator doesn’t converge to any economically meaningful quantity. Suppose for example that every person in the population experiences a positive causal effect of treatment ($\beta_i > 0 \; \forall i$).  Then we would hope that our estimator at least converges to something positive, but we could instead get a negative estimate, even asymptotically, in either of two ways. There may be more compliers than hipsters in the population ($\pi_C - \pi_H>0$), but the hipsters have big enough average causal effects that the numerator is negative. Or the numerator may be positive, but there are more hipsters than compliers in the population, so that the denominator is negative. Notice this is so despite the fact that by assumption the instrument is uncorrelated with the error term $u$ and affects the treatment $D$; that is, conventional textbook assumptions are satisfied.
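To make the sign problem concrete, here is a small sketch that plugs made-up proportions and average effects into the expression for $\beta_{IV}$. The function name `wald_limit` and all numbers are hypothetical; every average effect is positive in both scenarios.

```python
def wald_limit(pi_c, effect_c, pi_h, effect_h):
    """Population value of the IV ratio, given the shares and average
    causal effects of compliers (c) and hipsters (h)."""
    return (pi_c * effect_c - pi_h * effect_h) / (pi_c - pi_h)


# Scenario 1: more compliers than hipsters, but hipsters have large effects,
# so the numerator is negative while the denominator is positive.
big_hipster_effects = wald_limit(pi_c=0.3, effect_c=1.0, pi_h=0.1, effect_h=5.0)

# Scenario 2: positive numerator, but more hipsters than compliers,
# so the denominator is negative.
more_hipsters = wald_limit(pi_c=0.1, effect_c=5.0, pi_h=0.3, effect_h=1.0)

print(big_hipster_effects, more_hipsters)  # both are negative
```

Both scenarios give a limit of about $-1$ even though college raises everyone's wages.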

But, if there are no hipsters ($\pi_H=0$), then the estimator converges to

$$\beta_{IV} = \frac{ \pi_C E( \beta_i | C ) }{ \pi_C } = E(\beta_i | C),$$

the average causal effect of treatment among compliers. This is the “local average treatment effect” for this particular instrument for this population: it’s not the average effect among the entire population and it’s not the average effect of college among those who actually go to college; it’s an average “local” to people whose behavior changes when the instrument changes, here, among people who only go to college if they’re offered five grand to do so.  People whose college decisions aren’t affected by whether they are offered nothing or \$5,000 don’t appear at all in our estimate, because this experiment reveals no information about the causal effect of college for those people, and we need to assume away the presence of irritating hipsters, whose presence would render our estimate essentially meaningless.

A different experiment that also satisfies all the textbook assumptions for instrumental variables will generally recover a different LATE. Suppose the government had randomly offered nothing or \$10,000 rather than \$5,000 as an inducement to go to college. Even assuming we are not plagued by the presence of hipsters, the set of people who are compliers now differs from that in the original experiment, because people who will only go to college for an offer somewhere between \$5,000 and \$10,000 are now compliers but were formerly never-takers.  Since those folks will generally have different effects of going to college than the former set of compliers, we get, even asymptotically, different estimates of “the” effect of going to college on wages, even though both instruments are by assumption valid.
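Here is a minimal sketch of the two experiments side by side. As a simplifying assumption of mine (not something the argument requires), each person has a "reservation offer" and attends college whenever the offer is at least that large, which rules out hipsters by construction. The population and the names `attends` and `wald_limit` are made up.

```python
# Made-up population: (reservation offer in dollars, beta_i).
# A person attends college when the offer is at least their reservation offer.
population = [
    (0, 3.0),        # always-taker: attends even when offered nothing
    (2_000, 8.0),    # complier in both experiments
    (4_000, 6.0),    # complier in both experiments
    (7_000, 2.0),    # complier only when the offer is 10,000
    (9_000, 4.0),    # complier only when the offer is 10,000
    (50_000, 1.0),   # never-taker in both experiments
]


def attends(reservation, offer):
    return 1 if offer >= reservation else 0


def wald_limit(offer):
    """Population IV ratio when people are randomized between 0 and `offer`.

    beta_0 and u_i cancel in the difference of mean wages, and the population
    size cancels in the ratio, so sums over changes in attendance suffice.
    """
    changes = [attends(r, offer) - attends(r, 0) for r, _ in population]
    wage_change = sum(beta * c for (_, beta), c in zip(population, changes))
    return wage_change / sum(changes)


print(wald_limit(5_000), wald_limit(10_000))  # 7.0 5.0
```

The \$5,000 experiment recovers the average effect among the two people with reservation offers of 2,000 and 4,000, while the \$10,000 experiment averages over four people, so the two valid instruments identify different LATEs.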