I make three points. First, that the article is much more consistent with the literature if one reads “poverty” every time the article uses the word “inequality.” Second, that the fact that income and health are correlated across people or across regions does not tell us that income causes health. And finally, that the research does suggest that decreasing poverty will increase health, but that we should not expect substantial reductions in health care expenditures as a result. I close with some notes on policy implications.

The article conflates two issues which are conceptually distinct and have different policy implications: the effect of poverty on health, and the effect of income inequality on health. The evidence suggests that poverty reduces health, but there is little solid evidence, and much skepticism in the literature, over the idea that inequality *per se* leads to lower health, and even more skepticism over the notion that any effect of inequality on health occurs through “stress.” In a previous, fairly technical, blog post I presented an overview of some of the empirical literature on inequality and health. The very short version is: there is pretty good evidence that an individual’s income causes her health, but this effect is much stronger at low levels of income (as in the figure below). There is no good evidence that, holding an individual’s income constant, increases in income inequality decrease that individual’s health. And finally, a variety of factors, notably education and health in childhood, cause both adult income and adult health.

Pointing out that correlation does not imply causation is tiresome, but here this issue is important as much of the article consists of listing correlations between income and health and interpreting those correlations as if they are evidence of causation from income to health:

Virtually every measure of population health – from child mortality to rates of cancer, cardiovascular disease and traumatic injury – is worse in poor areas than in wealthier ones.

Canadians with an income of $15,000 or less have three times the risk of developing diabetes than those who earn more than $80,000.

Similarly, the risk of dying of cancer within five years of diagnosis is 47 per cent higher in the low-income group than the high-income one.

People living in poor neighbourhoods have a 37 per cent greater risk of suffering a heart attack than those in wealthier areas. But those in middle-income neighbourhoods have a 21 per cent greater risk than residents of the richest areas. Most health problems follow a similar gradient.

Consider that, compared with the highest income group, in the lowest income group the rate of stillbirths is 24 per cent higher, the infant mortality rate is 58 per cent higher and cases of sudden infant death syndrome are 83 per cent higher.

Children born to low-income parents are twice as likely to end up in special education classes and three times as likely to suffer mental health problems than those in the highest income group.

Statistics Canada found, for example, that at age 25, life expectancy varies between the highest and lowest income groups by 7.1 years for men and 4.9 years for women.

Some of the best-known research on inequality was done by Sir Michael Marmot, a professor at University College in London. He tracked civil servants in the U.K. and found that mortality rates correlated perfectly with social status and income – in other words, the lowest paid died much younger.

These correlations don’t tell us anything about the effect of income on health, as we would observe such correlations even if there is no causal effect of income on health.

To see why, suppose that income does not cause health for anyone, but also, along with the literature, that education does cause health. If we were to arrange people in a line from lowest income to highest income, as we move up the line we will tend to find healthier people, but we’d also find that people farther along in the line tend to have more education. We might find all the correlations listed in the article, including correlations between mother’s income and infant’s health, even though we started by assuming that income does not cause health. In the real world, it is likely that there is some effect of adult income on adult health, but we also know that education and a wide variety of other characteristics affect both health and income. That fact prevents us from interpreting the finding that higher income and better health are linked as evidence that income causes health.
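The thought experiment can be checked with a small simulation. This is only a sketch under made-up assumptions: the variable names, the unit coefficients, and the normal distributions are all illustrative and are not estimates from any study. Income is given no causal effect on health by construction, yet the two are clearly correlated until education is held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (an assumption for illustration):
# education raises both income and health; income has NO effect on health.
education = rng.normal(size=n)
income = 1.0 * education + rng.normal(size=n)
health = 1.0 * education + rng.normal(size=n)   # income does not appear here

# The raw correlation between income and health is clearly positive.
raw_corr = np.corrcoef(income, health)[0, 1]

def residualize(v, control):
    """Remove the part of v that is linearly predicted by control."""
    slope = np.cov(v, control)[0, 1] / np.var(control, ddof=1)
    return v - v.mean() - slope * (control - control.mean())

# Holding education fixed, the correlation disappears.
partial_corr = np.corrcoef(residualize(income, education),
                           residualize(health, education))[0, 1]

print(f"raw correlation:     {raw_corr:.3f}")
print(f"partial correlation: {partial_corr:.3f}")
```

In real data the common causes are rarely all observed, which is why controlling for them is much harder than it is in this toy example.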

Some research attempts to statistically control for as many of these influences as possible, but researchers face an additional difficulty: health causes income. Again suppose for the sake of argument that there is no effect of income on health, but also that healthier people tend to fare better in the labor market. If we look across people or across regions, we’ll again find that people with higher incomes tend to be healthier, but we again started with the assumption that income does not cause health.

Some earlier findings purporting to find evidence of effects of income on health are now thought to reflect either “reverse” causation from health to income, or difficult to measure “third variables” which cause both health and income. In addition to education, another particularly important such variable is childhood health, which has been shown to largely explain the correlation between income and adult health in the work by Michael Marmot cited by Mr. Picard (Case and Paxson 2011).

Much of the literature suggests that changes in income have little effect on health, at least for employed adults. Longitudinal studies tracking people over time show that good health predicts which workers will receive promotions, but getting a promotion does not seem to have any effect on a worker’s health, implying that health causes income but not the other way around. Further, comparing two workers with the same initial health, the worker whose income goes up faster is not less likely to die than his lower-paid counterpart, and there is little evidence that “stress” due to relatively low income is the primary mechanism through which any effect of adult income on health operates (Gardner and Oswald 2004, Adams, Hurd, McFadden, Merrill, and Ribeiro 2003, Jones and Wildman 2008).

Finally, putting aside the complex relationship between income and health, consider the claim that income inequality causes higher health care expenditures,

One study estimates that if those in the bottom 20 per cent of income earned as much as those one step higher on the income ladder, the savings to the health system would be $7.6-billion a year.

The source for this figure appears to be this report from the Ontario Association of Food Banks. The author takes cross-sectional data on health expenditures by income quintile and estimates that these expenditures are $7.6B higher in the lowest quintile than in the second-lowest quintile. But there are two serious problems with the interpretation in the Globe article. First, the article interprets this correlation as entirely causal, and as we have seen, that interpretation is consistent with neither theory nor evidence. Second, making people healthier does not necessarily decrease long-run demand on the health care system—in fact, increasing a person’s health may wind up increasing that person’s demand on the system, as they are more likely to live longer and consume health care resources through old age. This isn’t necessarily the case–possibly policies which decrease poverty would have the net effect of decreasing long-run demand on the health care system–but we’d need much more evidence to draw that conclusion.

The point that better health may have little or a counter-intuitively positive effect on health care spending should not be interpreted as an argument against policies which reduce poverty: we value health in and of itself, not solely or even largely because of its impact on health care spending. But we ought not offer the enticement of fiscal savings through changes in demand on the publicly-funded health care system as a major reason to enact such anti-poverty measures.

We have seen that the issues surrounding income, income inequality, and health are tricky and the subject of much ongoing research. If we should not conclude that income inequality *per se* is the main culprit nor that policies which affect incomes are likely to have large effects on health care expenditures, what policy implications should we draw from this research?

We know that public policies affecting income can improve the long-run health of children. Simply increasing the incomes of families in poverty can improve children’s health. For example, Hoynes, Miller, and Simon (2012) show that increased generosity of the U.S. EITC (essentially a negative income tax) improved infant health outcomes, which are likely to play out as better outcomes in many dimensions over those infants’ entire lives.

More generally, there is evidence that poverty causes low health. There is evidence that childhood deprivation harms children, and also that harm done to children’s well-being continues to affect those children throughout their lives. And there is evidence that sound public policy interventions can be effective in reducing such problems, including well-targeted income redistribution but also, and perhaps more importantly, improved quality and quantity of education.

One explanation for this puzzle is that Americans who vote are less likely to support legalization than those who do not vote. Voters tend to be older, and possibly have other characteristics which are associated with opposition to drug policy reform.

The Gallup poll result is consistent with, but shows somewhat more support than, recent results from the General Social Survey (GSS), which suggest that roughly half of Americans supported legalization in 2011 and 2012. Unlike the Gallup polls, the GSS data are available to researchers, and include a wide variety of information on respondents, including whether or not they voted in the most recent Presidential election.

I set out to show using all GSS waves between 1975 and 2012 which include the legalization support question (n=26,870) that people who vote are less likely to support legalization than people who do not vote, which would help explain why politicians seem out of sync with public sentiment.

But I found the opposite.

The graph shows support for legalization for all respondents, for respondents who voted in the last Presidential election, and for respondents who did not vote in the last election. Up until about a decade ago, voters tended to be moderately less likely to support legalization than non-voters. But since 2004 voters have been modestly more likely to support legalization than non-voters. Restricting attention to post-2004 sample waves, voters are 3.8 percentage points more likely to support legalization than non-voters (z=2.61).

So the puzzle is not resolved by differences in attitudes towards legalization across voters and non-voters.

I also ran a few regressions to check if what we see in the graph goes away with a few basic controls (notably age), and how support varies with demographic characteristics. The table below shows results from regression models in which the dependent variable is a dummy indicating support for legalization. Each cell contains the estimated parameter with the associated t-ratio below. The covariate of interest is “voted last elec.”, a dummy indicating the respondent voted in the preceding Presidential election. (All estimates are marginal effects from probit regressions with robust standard errors.)

The first column shows that, over the entire sample from 1975 through 2012, voting in the last election is associated with about three percentage points lower support for legalization (z=4.59). The second column adds complete sets of age and year dummies, which are enough to flip the sign on the voter dummy. Holding age constant and removing any common trend over time in voting propensities and support for legalization, U.S. voters are slightly more likely to support legalization than non-voters (by 1.4 percentage points, z=2.49). The results on the age effects (unreported) suggest that all else equal older cohorts are less likely to support than younger cohorts, providing some sketchy evidence in favor of the notion that support will increase over time as older cohorts, er, shrug this mortal coil.

The model presented in the last column adds controls for religion, education, political leanings, and region, which does essentially nothing to the estimated association between voting and support for legalization.

The results also show that religious belief and behavior are strong predictors of support: non-religious people are substantially more likely to support legalization than otherwise identical religious people (by about 17 percentage points, z=13.71), and observant religious people are much less likely to support than religious but non-observant people (by about 14 percentage points, z=14.15).

The omitted education category is high school dropouts. More education strongly predicts higher support for legalization, other things equal.

Finally, holding all else equal, people with moderate political leanings (the omitted category) are more likely to support than conservatives (by about 6 percentage points, z=7.85) and less likely to support than liberals (by about 11 percentage points, z=13.68).

| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| voted last elec. | -0.027 | 0.014 | 0.014 |
| | (-4.59) | (2.49) | (2.36) |
| Catholic | | | 0.016 |
| | | | (2.45) |
| No religion | | | 0.168 |
| | | | (13.71) |
| Other religion | | | 0.096 |
| | | | (7.19) |
| Observant | | | -0.140 |
| | | | (-14.15) |
| high school | | | 0.042 |
| | | | (5.33) |
| junior college | | | 0.070 |
| | | | (4.84) |
| bachelor | | | 0.075 |
| | | | (6.57) |
| graduate | | | 0.109 |
| | | | (7.33) |
| conservative | | | -0.058 |
| | | | (-7.85) |
| liberal | | | 0.114 |
| | | | (13.68) |
| middle atlantic | | | -0.035 |
| | | | (-2.85) |
| e. nor. central | | | -0.033 |
| | | | (-2.75) |
| w. nor. central | | | -0.058 |
| | | | (-4.48) |
| south atlantic | | | -0.046 |
| | | | (-3.86) |
| e. sou. central | | | -0.046 |
| | | | (-3.32) |
| w. sou. central | | | -0.055 |
| | | | (-4.38) |
| mountain | | | -0.018 |
| | | | (-1.25) |
| pacific | | | 0.017 |
| | | | (1.25) |
| age dummies | no | yes | yes |
| year dummies | no | yes | yes |
| n | 26,870 | 26,870 | 26,870 |

Each estimate is shown with its t-ratio in parentheses on the row below.

There are some good discussions of Hansen’s most influential contribution, the Generalized Method of Moments (GMM), in the economics blogosphere; examples include Guan Yang, John Cochrane, and Jeff Leek. This post presents another, which differs mostly in that the discussion does not focus on applications in asset pricing. The basic ideas in Hansen (1982) are elaborations and generalizations of ideas presented in Sargan (1958), which develops overidentified instrumental variables estimators in a modern context, a method mostly used to infer causal effects from observational data.

Suppose we have a sample of size $n$ on some variable $y$ and we would like to estimate the mean of $y$, denoted $\mu$. In this simple case, the method of moments tells us to estimate $\mu$ by replacing the population condition

$$ E[ y - \mu ] = 0 $$

with its sample analog,

$$ \frac{1}{n}\sum_i [ y_i - \hat\mu ] = 0, $$

where our estimator $\hat\mu$ is the value of the parameter which makes the equation above true. The method of moments estimator of $\mu$ is simply the sample mean of $y$, denoted $\bar y$.

We draw another sample of observations on a different random variable $w$. Suppose theory tells us that the mean of $w$ is the same as the mean of $y$. Following the same reasoning as above, we could estimate $\mu$ using the sample mean of $w$, $\bar w$. But using either of these estimates alone cannot be efficient, as we are wasting the information in the sample we don’t use. Theory tells us that both of these conditions are true

\begin{align}
E(y) - \mu &= 0 \\
E(w) - \mu &= 0
\end{align}

but we cannot generally choose $\hat\mu$ to make both of the sample analogs of these conditions true,

\begin{align}
m_1 &= \bar y - \hat\mu = 0 \\
m_2 &= \bar w - \hat\mu = 0,
\end{align}

so the method of moments can’t be directly applied. But we can generalize (hence, GMM) and make these two moment conditions $m_1$ and $m_2$ as close to being true as we can, in the sense that we make the squared deviations as small as possible. We could choose $\mu$ to

$$ \textrm{min}_{\mu} (\bar y - \mu)^2 + (\bar w - \mu)^2. $$

Minimizing this objective yields consistent (since $\bar y$ and $\bar w$ are each consistent) but usually inefficient estimates, since we should take into account that $\bar y$ and $\bar w$ might have different variances and might be correlated. Intuitively, if $w$ is much noisier than $y$, we should place less weight on observations on $w$ because they contain less information about $\mu$ than observations on $y$. Suppose for simplicity that these are independent samples and thus uncorrelated, but that the variance of $\bar w$ is higher than the variance of $\bar y$. We can get rid of these unequal variances by dividing by the standard deviations (here and throughout the post we’ll assume for simplicity that we know all the variance parameters, abstracting from much of the complication of GMM estimation) and choose $\mu$ to

$$ \textrm{min}_{\mu} \label{eq:gmm} \left( \frac{\bar y - \mu}{\sigma_{\bar y}}\right)^2 + \left( \frac{\bar w - \mu}{\sigma_{\bar w}}\right)^2. $$

Note this is equivalent to weighting each moment condition by the reciprocal of its standard deviation, so that we place more weight on the more precise condition. In general the moments will not be uncorrelated, and we should take that into account too.
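To make the weighting concrete, here is a minimal sketch in Python of the two-sample estimate of a common mean. The sample sizes, the true mean, and the standard deviations are made-up illustrative values, and the variances are treated as known, as in the text. Setting the derivative of the weighted objective to zero gives a precision-weighted average of the two sample means, which the code checks against a brute-force grid search.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true = 2.0                      # illustrative common mean (assumption)
sigma_y, sigma_w = 1.0, 3.0        # w is the noisier variable; treated as known
n_y, n_w = 500, 500                # illustrative sample sizes (assumption)

y = rng.normal(mu_true, sigma_y, size=n_y)
w = rng.normal(mu_true, sigma_w, size=n_w)

y_bar, w_bar = y.mean(), w.mean()
var_y_bar = sigma_y**2 / n_y       # variance of the sample mean of y
var_w_bar = sigma_w**2 / n_w       # variance of the sample mean of w

def objective(mu):
    """Sum of squared moment conditions, each scaled by its standard deviation."""
    return (y_bar - mu)**2 / var_y_bar + (w_bar - mu)**2 / var_w_bar

# First-order condition: a precision-weighted average of the sample means.
weight_y = (1 / var_y_bar) / (1 / var_y_bar + 1 / var_w_bar)
mu_hat = weight_y * y_bar + (1 - weight_y) * w_bar

# Brute-force check: the best point on a fine grid should sit next to mu_hat.
grid = np.linspace(min(y_bar, w_bar), max(y_bar, w_bar), 2001)
mu_grid = grid[np.argmin(objective(grid))]

print(f"weight on y_bar: {weight_y:.2f}")
print(f"closed form: {mu_hat:.4f}, grid search: {mu_grid:.4f}")
```

With these illustrative values the variance of $\bar w$ is nine times the variance of $\bar y$, so the estimate puts nine times as much weight on $\bar y$.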

We can form a test statistic against our theory that the means in these two samples are identical. Suppose we use just the observations on $y$, calculate $\bar y$, and then check to see how well that estimate explains the observations on the other variable $w$,

$$ \sum_i ( w_i - \bar y)^2. $$

If the two samples really have the same mean, then as the sample grows $\bar y$ converges to $\mu$, and this expression is the sum of squared zero-mean normals, and we could base a test statistic on that result. Intuitively, if our theory is false, then the $w_i - \bar y$ do not tend to zero-mean random variables, and when we square them the results tend to be larger than if they did have zero mean.

That’s not the best way to test our theory, however. If our theory is true then the objective tends to the sum of two squared zero-mean variables, but if our theory is incorrect the objective function tends to the sum of squared non-zero mean variables. A test can be based on this idea: if the realized value of the objective function when we minimize it would have to be very far out in the tail of the distribution obtained when the theory is correct (here, the $\chi^2$ distribution), we have evidence against our theory that $y$ and $w$ have the same mean. This is a simple example of Hansen’s test, or the J test. Note that if we only observe one of $y$ or $w$ but not both, we have zero degrees of freedom left over to test the assumptions of the model and we cannot conduct this test.
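A hedged sketch of this test for the two-sample mean example follows. The function name `two_sample_j_test` and the simulated data are illustrative assumptions, the standard deviations are treated as known, and the samples are assumed independent. With two moment conditions and one parameter, the minimized objective is compared with a chi-squared distribution with one degree of freedom, whose tail probability can be written with the complementary error function.

```python
import math
import numpy as np

def two_sample_j_test(y, w, sigma_y, sigma_w):
    """Return (mu_hat, J, p_value) for the theory that y and w share a mean.

    Assumes independent samples and known standard deviations.
    """
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    var_y_bar = sigma_y**2 / y.size
    var_w_bar = sigma_w**2 / w.size
    precision_y, precision_w = 1 / var_y_bar, 1 / var_w_bar

    # Minimizer of the weighted objective: precision-weighted average.
    mu_hat = (precision_y * y.mean() + precision_w * w.mean()) / (precision_y + precision_w)

    # J statistic: the objective function evaluated at its minimum.
    j_stat = (y.mean() - mu_hat)**2 / var_y_bar + (w.mean() - mu_hat)**2 / var_w_bar

    # Upper tail of a chi-squared(1) variable: P(Z^2 > J) = erfc(sqrt(J / 2)).
    p_value = math.erfc(math.sqrt(j_stat / 2))
    return mu_hat, j_stat, p_value

rng = np.random.default_rng(6)
y = rng.normal(2.0, 1.0, size=500)
w_same = rng.normal(2.0, 3.0, size=500)    # theory true: same mean as y
w_diff = rng.normal(3.0, 3.0, size=500)    # theory false: mean differs by one

_, j_same, p_same = two_sample_j_test(y, w_same, 1.0, 3.0)
_, j_diff, p_diff = two_sample_j_test(y, w_diff, 1.0, 3.0)
print(f"same mean:      J = {j_same:.2f}, p = {p_same:.3f}")
print(f"different mean: J = {j_diff:.2f}, p = {p_diff:.2g}")
```

In this special case the J statistic reduces to the squared difference in sample means divided by the sum of their variances, the familiar two-sample z-test squared.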

The reasoning above can be applied to a very wide variety of problems, yielding various GMM estimators depending on which variables have zero mean under some theory. Consider a univariate linear regression model of the form

$$ y = \beta x + u, $$

where we interpret this equation as causal: $\beta$ is the causal effect of a one-unit change in $x$ on $y$ holding other causes of $y$, $u$, constant (and we assume that all variables have zero mean for simplicity). Suppose the data came from a randomized experiment on $x$ and that we have a random sample. Then $x$ and $u$ are uncorrelated; in other words, the random variables $x_i u_i$ have mean zero,

$$ E [ x_i u_i ] = E[ x_i(y_i - \beta x_i) ] = 0, $$

the sample analog of which is

$$ \label{eq:sum} \frac{1}{n} \sum_i x_i(y_i - \beta x_i)=0.$$

Since we have one parameter and one equation, we can always make this condition true, and the solution is easily seen to be the OLS estimator, even if the errors are heteroskedastic or correlated. For the same reason, we cannot test our theory that $x$ and $u$ are uncorrelated: the sample analog of the moment condition can always be made to hold exactly.
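As a quick numerical check of the claim that solving the sample moment condition gives OLS, here is a small sketch. The simulated design, the true coefficient of 2, and the form of heteroskedasticity are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta_true = 50_000, 2.0                   # illustrative values (assumptions)

x = rng.normal(size=n)                       # "randomized": drawn independently of other causes
u = rng.normal(size=n) * (1.0 + np.abs(x))   # heteroskedastic, but mean zero given x
y = beta_true * x + u

# Solve (1/n) * sum_i x_i * (y_i - beta * x_i) = 0 for beta.
beta_mm = np.sum(x * y) / np.sum(x * x)

# The same number comes out of a least-squares fit without an intercept.
beta_ols = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

# The sample moment is zero at the solution, up to floating-point error.
moment_at_solution = np.mean(x * (y - beta_mm * x))

print(f"method of moments: {beta_mm:.4f}, least squares: {beta_ols:.4f}")
```

Because the sample moment is exactly zero at the estimate whatever the data look like, its value carries no information about whether the theory is right.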

Now suppose that $x$ did not come from a perfect randomized experiment; instead we have observational data and no reason to suppose that causes of $y$ we do observe ($x$) are uncorrelated with causes we don’t observe ($u$). The condition $E[x_i u_i] = 0$ no longer holds and an estimator based on that condition will have undesirable properties. But suppose we observe a variable $z_1$ which has the property that

$$E( z_{1i} u_i ) = E [ z_{1i} (y_i - \beta x_i) ] = 0,$$

that is, that $z_1$ should not covary with $y$ if $x$ is held fixed. The sample analog of this condition gives us the method of moments estimator of $\beta$, which turns out to be the simple linear instrumental variables estimator, the ratio of the covariance between $z_1$ and $y$ to the covariance between $z_1$ and $x$. We cannot test our theory that $z_1$ only affects $y$ because $z_1$ affects $x$ because, with one equation and one parameter, we can always find a value of $\beta$ to make the sample analog of this condition true.
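The following sketch simulates this situation. The confounder `c`, the coefficient values, and the sample size are hypothetical choices made only for illustration. OLS is pulled away from the true effect by the confounder, while the ratio of (uncentered, since all variables have zero mean) sample covariances recovers it.

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta_true = 50_000, 2.0             # illustrative values (assumptions)

z1 = rng.normal(size=n)                # instrument: affects y only through x
c = rng.normal(size=n)                 # unobserved confounder (hypothetical)
x = 0.8 * z1 + c + rng.normal(size=n)
u = c + rng.normal(size=n)             # correlated with x through c
y = beta_true * x + u

# OLS uses the (now false) condition E[x u] = 0 and is biased upward here.
beta_ols = np.sum(x * y) / np.sum(x * x)

# IV solves (1/n) * sum_i z_1i * (y_i - beta * x_i) = 0.
beta_iv = np.sum(z1 * y) / np.sum(z1 * x)

# As with OLS, the sample moment holds exactly at the estimate.
moment_at_iv = np.mean(z1 * (y - beta_iv * x))

print(f"true effect: {beta_true}, OLS: {beta_ols:.3f}, IV: {beta_iv:.3f}")
```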

Now suppose we have available a second instrument, $z_2$, which theory tells us also affects $y$ only because $z_2$ affects $x$.

The diagram illustrates the model. The variable $u$, colored red to denote that we can’t observe this variable, confounds the relationship between $x$ and $y$, implying that covariance between $x$ and $y$ does not reveal $\beta$, the causal effect of $x$ on $y$.

But we can estimate the covariances between $z_1$ and $y$ and between $z_2$ and $y$, and inspection of the diagram tells us what these should reflect. A one-unit increase in $z_1$ causes some change in $x$, and since a one-unit increase in $x$ causes a $\beta$ unit increase in $y$, a one-unit increase in $z_1$ changes $y$ by $\beta$ times its effect on $x$. And likewise for $z_2$: the diagram tells us that covariance between $z_2$ and $y$ can only occur if $\beta \neq 0$, and we can infer a value for $\beta$ from that covariance divided by the covariance between $z_2$ and $x$.

So we have two distinct causal paths, either of which allows us to estimate the causal effect of $x$ on $y$, just like in the introductory example we had two different ways of estimating the mean $\mu$. GMM tells us how to optimally combine these two insights to produce the most precise single estimate of $\beta$ under the theory that $z_1$ and $z_2$ only affect $y$ because they affect $x$, just like above GMM told us how to optimally combine two samples which have the same mean under some theory.

Theory tells us that both of these conditions are true

\begin{align}
E(z_{1i}u_i) &= E[ z_{1i}(y_i - \beta x_i) ] = 0 \\
E(z_{2i}u_i) &= E[ z_{2i}(y_i - \beta x_i) ] = 0
\end{align}

but we cannot in general choose the single parameter $\beta$ to make both of the sample analogs of these conditions true. Suppose for simplicity that $z_1$ and $z_2$ are uncorrelated and that the $u_i$ are heteroskedastic but uncorrelated. Then the theoretical moments above have sample counterparts

\begin{align}
m_1 &= \frac{1}{n} \sum_i [z_{1i}(y_i - \beta x_i)] \\
m_2 &= \frac{1}{n} \sum_i [z_{2i}(y_i - \beta x_i)]
\end{align}

and variances $V(m_j)$, for $j = 1, 2$. Then selecting $\beta$ to

$$ \textrm{min}_{\beta} \left( \frac{m_1}{\sqrt{V(m_1)}}\right)^2 + \left( \frac{m_2}{\sqrt{V(m_2)}}\right)^2 $$

yields the GMM estimator of the causal effect of $x$ on $y$. As in the introductory example, the moment conditions are weighted by the reciprocal of their standard deviations, since we want to put more weight on the more precise condition. More generally, we should also take into account that $z_1$ and $z_2$ (and sometimes the $u_i$) will generally be correlated.
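A minimal sketch of this two-instrument estimator follows. Unlike the text, which treats the variances as known, the code estimates them from the residuals of a preliminary one-instrument IV fit; that two-step shortcut, the simulated design, and every coefficient value are assumptions made for illustration. Because each sample moment is linear in $\beta$, the minimizer has a closed form.

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta_true = 50_000, 2.0            # illustrative values (assumptions)

z1 = rng.normal(size=n)               # stronger instrument
z2 = rng.normal(size=n)               # weaker instrument, independent of z1
c = rng.normal(size=n)                # unobserved confounder (hypothetical)
x = 1.0 * z1 + 0.3 * z2 + c + rng.normal(size=n)
u = c + rng.normal(size=n)
y = beta_true * x + u

z = np.column_stack([z1, z2])
a = z.T @ y / n                       # sample means of z_j * y
b = z.T @ x / n                       # sample means of z_j * x, so m_j = a_j - beta * b_j

# Each instrument on its own gives a simple IV estimate.
beta_each = a / b

# Step 1: residuals from a preliminary estimate that uses z1 only.
resid = y - beta_each[0] * x
# Step 2: estimated variance of each sample moment, Var(z_j * u) / n.
v = np.mean((z * resid[:, None])**2, axis=0) / n

# Minimizing sum_j (a_j - beta * b_j)^2 / v_j has a closed-form solution.
beta_gmm = np.sum(a * b / v) / np.sum(b**2 / v)

print(f"IV with z1: {beta_each[0]:.3f}, IV with z2: {beta_each[1]:.3f}")
print(f"GMM combination: {beta_gmm:.3f}")
```

The result is a weighted average of the two single-instrument estimates, with more weight on the instrument whose moment condition is more informative about $\beta$.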

Just like in the simple case above of estimating a mean from two samples, if our theory is true then the minimized value of the objective function is asymptotically distributed $\chi^2$, so estimation by GMM produces a test of the theory as a by-product of estimation (in this context, this test stat is also called the Sargan test). Intuitively, if $z_1$ and $z_2$ are both uncorrelated with $u$, then we could form the simple IV estimator using just $z_1$, and the residuals from that exercise should be uncorrelated (up to sampling noise) with $z_2$. If they are not, then we’re not sure what’s wrong (either $z_1$ or $z_2$ could be correlated with $u$), but we conclude that something is wrong with our theory. This is not the most efficient test, though. As in the introductory example, the value of the minimized objective function forms a test statistic against the null that our theory is correct.
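The overidentification test can be sketched the same way. The helper names `gmm_two_instruments` and `simulate`, the size of the direct effect, and the two-step variance estimate are all illustrative assumptions rather than anything taken from Hansen or Sargan. With two moment conditions and one parameter, the minimized objective is compared with a chi-squared distribution with one degree of freedom.

```python
import math
import numpy as np

def gmm_two_instruments(y, x, z):
    """Return (beta_hat, J, p_value) for y = beta * x + u with two instruments.

    Uses diagonal weighting, which assumes the two moments are uncorrelated.
    """
    n = len(y)
    a = z.T @ y / n
    b = z.T @ x / n
    resid = y - (a[0] / b[0]) * x                   # preliminary IV using the first instrument
    v = np.mean((z * resid[:, None])**2, axis=0) / n
    beta_hat = np.sum(a * b / v) / np.sum(b**2 / v)
    j_stat = float(np.sum((a - beta_hat * b)**2 / v))
    p_value = math.erfc(math.sqrt(j_stat / 2))      # upper tail of chi-squared(1)
    return beta_hat, j_stat, p_value

def simulate(direct_effect_of_z2, seed, n=50_000, beta=2.0):
    """Hypothetical data; a nonzero direct effect makes z2 an invalid instrument."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 2))
    c = rng.normal(size=n)                          # unobserved confounder
    x = z[:, 0] + z[:, 1] + c + rng.normal(size=n)
    u = c + rng.normal(size=n) + direct_effect_of_z2 * z[:, 1]
    return beta * x + u, x, z

_, j_valid, p_valid = gmm_two_instruments(*simulate(0.0, seed=5))
_, j_bad, p_bad = gmm_two_instruments(*simulate(0.5, seed=5))
print(f"both instruments valid: J = {j_valid:.2f}, p = {p_valid:.3f}")
print(f"z2 affects y directly:  J = {j_bad:.2f}, p = {p_bad:.2g}")
```

A large J says that the two instruments disagree about $\beta$ by more than sampling noise allows; it does not say which one is at fault.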

In the preceding example, GMM allows us to estimate the causal effect of $x$ on $y$ using a theory that says that two variables $z_1$ and $z_2$ only affect $y$ because they affect $x$. GMM tells us how to make use of our theory to make our estimates as precise as possible, and as a by-product of estimation provides a test statistic against the null hypothesis that our theory is correct (warning: our theory could also be incorrect not because the $z$’s affect $y$ for some other reason than through $x$, but rather because the causal effect $\beta$ varies across units in the population, see e.g. Heckman, Urzua, and Vytlacil 2006).

GMM can be applied to much more complicated problems to estimate causal effects in a wide variety of nonlinear regression models (e.g., Windmeijer 2006), and to estimate the deep parameters in estimable choice models which can be used to produce out-of-sample predictions which sidestep the Lucas critique, e.g., Hotz and Miller (1993) or Ferrall (2012).

The Chen and Pearl paper has been around for a while in working paper form and recently came out in the Real World Economics Review, also available here from the authors with much clearer typesetting.

The additional textbooks I discuss below are: Amemiya (1985), Kmenta (1986), Davidson and MacKinnon (1993), Gujarati (1999), Hayashi (2000), Wooldridge (2002), Davidson and MacKinnon (2004), Dielman (2005), and Cameron and Trivedi (2005).

**The Issue: Causality in regression models.**

A scientist is attempting to understand the relationship between, say, health and smoking. Let y denote some measure of health and let x denote a measure of smoking intensity, say, number of cigarettes smoked per day. A simple model for health supposes the two outcomes are related by

$$ y = \beta x + u. $$

In short, Chen and Pearl consider these issues: do econometrics textbooks clearly explain what the parameter $\beta$ means in this model, are they consistent in that interpretation, and generally how well are issues of causality addressed?

That simple-looking equation is much trickier than it appears, as first formally discussed in the econometrics literature by Trygve Haavelmo during the Second World War. For recent discussions, see for example Heckman (2005, 2008), Heckman and Pinto (2013), or blog discussions such as on Pearl’s blog or Andrew Gelman’s blog (note comments from Pearl and from Guido Imbens). First suppose we *define* the random variable u as the difference between y and its conditional expectation:

$$ u \equiv y - E[y \mid x], \tag{1} $$

then it is easy to show that the error term $u$ must be mean-independent of $x$. In econometric jargon, we obtain exogeneity by definition. In this interpretation, the parameter $\beta$ is implicitly defined through

$$ E[y \mid x] = \beta x, $$

that is, $\beta$ is by definition the gradient of $E[y \mid x]$. In the smoking and health example, $\beta$ is by definition how much health changes on average as we consider a person who smokes one more cigarette per day (specifically *without* the caveat, “other things being equal”).
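The "easy to show" step can be spelled out. Writing $E[y \mid x]$ for the conditional expectation of $y$ given $x$ and defining $u$ as the deviation of $y$ from it, the argument uses only linearity of conditional expectation and the fact that $E[y \mid x]$ is itself a function of $x$:

\begin{align}
E[u \mid x] &= E\big[\, y - E[y \mid x] \;\big|\; x \,\big] \\
&= E[y \mid x] - E[y \mid x] \\
&= 0.
\end{align}

Nothing about causation enters this derivation, which is why exogeneity obtained this way says nothing about what would happen to $y$ if $x$ were changed by intervention.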

This interpretation of this model is merely “agnostic” or “predictive.” An insurance agency, for example, might be interested in estimating $\beta$ under this interpretation: the answer might help them understand how their payouts will vary if they accept customers who smoke more. But econometricians and other scientists are only rarely interested in such a predictive relationship. Instead, we want to know the causal effect of smoking on health, and the predictive regression generally does not recover that causal effect. Suppose for example we lived in a universe in which a given person’s health is unaffected by their smoking, but also that behaviors and characteristics which lead to low health also tend to lead to more smoking. Then we would tend to estimate negative values for $\beta$ even though by assumption (in whatever universe we’re discussing) smoking does not affect any person’s health.

For this reason econometricians rarely interpret the error term as simply the deviation between the outcome and its conditional expectation. Rather, in a structural interpretation of the equation, $\beta$ takes a causal interpretation and u is interpreted as summarizing all causes of y other than x. It is well-known that any of: (1) “reverse” causation, (2) omitted variables correlated with the regressors, or (3) measurement error in the regressors, leads to correlation between u and x, which in turn means that the parameter $\beta$ is not defined as the derivative of $E[y \mid x]$ with respect to $x$. We would like to know how a randomly selected person’s health changes if we could intervene and exogenously flip smoking status; the problem is that the correlation between smoking and health calculated from observational data does not generally give us any answer to that question.

**Textbook discussion of the issue.**

The seemingly straightforward issue is not straightforward at all, and exactly what we mean by “causal,” even in the context of simple regression models such as above, is a subject of ongoing multidisciplinary research. Nonetheless, since inferring causal relations from observational data is the defining characteristic of econometric analysis, it seems very reasonable to require that econometrics textbooks should contain lucid discussions of causal relationships and, in so doing, define parameters clearly and unambiguously. Disturbingly, Chen and Pearl find that six popular econometrics textbooks fail, to a greater or lesser extent, to do so.

Chen and Pearl evaluate texts on 10 criteria, which amount to: does the textbook provide at least as much information about causal interpretation as this post does very briefly above, is the text consistent on those interpretations, and does the text provide the equivalent of Pearl’s “do(x)” operator to define causal effects? Other than the “do(x)” criterion, which I don’t think is fair because Pearl’s concept has not caught on in the econometrics literature and (even if it ought to catch on) should therefore not (yet?) appear in current econometrics textbooks, the criteria seem very fair to me. Pity the poor student who attempts to understand how to interpret a structural econometric model after reading this startling passage in Kennedy, for example:

Using the dictionary meaning of causality, it is impossible to test for causality. Granger developed a special definition of causality which econometricians use in place of the dictionary definition: strictly speaking, econometricians should say “Granger-cause” in place of “cause,” but usually they do not. A variable x is said to Granger-cause y if prediction of the current value of y is enhanced by using past values of x.

This is the only passage in the book in which the word “causality” is used, and the claims in that passage are not correct, in no small part because so-called Granger causality is not a causal concept. Although in my view that passage is by far the worst discussion in the six texts discussed, Chen and Pearl show persuasively that each of the discussed textbooks are at times at least vague in their discussion of causal relations. On the other hand, Chen and Pearl are perhaps somewhat uncharitable in some of their discussion. For example, they make much of this passage from Greene,

[ In the model ] does measure the value of a college education (assuming the rest of the regression model is correctly specified)? The answer is no if the typical individual who chooses to go to college would have relatively high earnings whether or not he or she went to college…

but in context this appears to be a typo: the passage is rescued if “the OLS estimate of” is inserted in front of , and the passage makes no sense if that or an equivalent edit is not made, and Greene in many, many other places clearly differentiates between mere correlations and causal parameters. Chen and Pearl, however, are not satisfied with an answer Greene gave them in a a personal communication as to the meaning of a structural parameter:

In a personal correspondence (2012), Greene wrote, “The precise definition of effect of what on what is subject to interpretation and some ambiguity depending on the setting. I find that model coefficients are usually not the answer I seek, but instead are part of the correct answer. I’m not sure how to answer your query about exactly, precisely carved in stone, what [the coefficient] should be.”

I tentatively side with Greene here, although Chen and Pearl do not specify exactly what question Greene was asked. In structural models, the structural parameters are not necessarily causal effects in and of themselves; rather, they are assumed to be invariant with respect to some well-specified class of disturbance. For example, the deep parameters characterizing Harold Zurcher’s replacement of bus engines are not themselves causal effects, but given estimates of those parameters, the model can answer meaningful causal questions. Exactly what a structural coefficient means is model-dependent.

**Some results from other textbooks.**

Without going into nearly as much detail as Chen and Pearl, I took a look through some other econometrics textbooks to check to see how they discuss, or do not discuss, causality. Specifically, I looked to see whether the regression parameters are anywhere incorrectly defined as gradients of the conditional expectation of the dependent variable, and I tried to find explicit discussions of causal interpretation of estimated models. The texts surveyed below vary widely in level and vintage, including everything from introductory undergraduate to advanced graduate texts, from 1985 through 2005.

**Amemiya (1985), Advanced Econometrics.**

This textbook is now old (well, ancient) by academic standards, and is relatively technically demanding. Opens, on page 1, by dubiously asserting that the goal of econometrics is to estimate parameters which define the joint distribution of a set of random variables. As far as I can tell, the word “causal” does not appear anywhere, nor are there examples of predictive vs causal interpretation of parameters. Any notions of causality are implicit and framed in purely statistical terms. However, does not incorrectly define $\beta$ as the gradient of $E[y \mid x]$.

**Kmenta (1986), Elements of Econometrics**

Does not incorrectly define $\beta$ as the gradient of $E[y \mid x]$.

There is a fairly long, yet confusing discussion of causality at the start of the chapter on simultaneous systems.

Although the concepts of causality and exogeneity are not identical, it is nevertheless possible to conclude that if a variable Y is–in some sense–caused by a variable X, Y cannot be considered exogenous in a system in which X also occurs. A widely discussed definition of causality has been proposed by Granger.

This is the textbook that I learned undergraduate econometrics from. I don’t remember how I thought of causality in econometric models at the time (possibly because I really didn’t like econometrics as an undergraduate). But it’s hard to see how a student could make much headway in understanding causality from that passage. Causality is first introduced “in some sense,” deliberately avoiding a definition. An incorrect claim follows, that if one variable causes another they cannot both be treated as exogenous in a system: that is simply not true, as nothing in regression models precludes causal relationships between exogenous variables (as a trivial example, the square of an exogenous covariate is routinely used to capture nonlinear relationships between variables, and the relationship between a covariate and its square is deterministic and monocausal). And then the notion of Granger-causality is introduced as the only formally defined causal concept in econometrics.

**Davidson and MacKinnon (1993), Estimation and Inference in Econometrics**

The parameters in the linear regression model are defined in Chapter 1 very abstractly as the set of real numbers defining the subspace spanned by the column vectors of the regressors. $\beta$ is never incorrectly defined as the gradient of $E[y \mid x]$. Simultaneity and omitted variable bias are discussed in purely statistical, as opposed to causal, terms in Chapter 7.

Discusses causality explicitly in section 18.2, “Exogeneity and causality.” The clearest passage is,

But we have not yet discussed the conditions under which one can validly treat a variable as explanatory. This includes the use of such variables as regressors in least squares estimation and as instruments in instrumental variables or GMM estimation. For conditional inference to be valid, the explanatory variables must be predetermined or exogenous in one or other of a variety of senses to be defined below.

which is not very clear at all: the authors intend, I think, the first sentence to mean, “But we have not yet discussed the conditions under which one can treat the coefficient on a variable as reflecting a causal effect.” The matter is then further muddied as later in this subsection the concept of Granger causality is introduced, without clearly differentiating between so-called Granger-causality and causality.
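To make concrete why so-called Granger causality is a statement about prediction and not about causation, here is a minimal simulated sketch. It is my own illustration, not taken from any of the texts discussed, and every name and parameter value in it is an assumption: a hypothetical unobserved series `z` reaches `x` after one period and `y` after two, and `x` never enters the equation for `y`. Lagged `x` nonetheless helps predict `y`, and that predictive power vanishes once `x` is set by intervention.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical unobserved common cause, with two extra periods for the lags.
z = rng.normal(size=n + 2)

# x_t = z_{t-1} + noise and y_t = z_{t-2} + noise: x has no effect on y.
x = z[1:n + 1] + 0.5 * rng.normal(size=n)
y = z[0:n] + 0.5 * rng.normal(size=n)

def lagged_x_coef(x, y):
    """Coefficient on x_{t-1} in an OLS regression of y_t on y_{t-1} and x_{t-1}."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1], x[:-1]])
    return np.linalg.lstsq(X, y[1:], rcond=None)[0][2]

# Observational data: past x "Granger-causes" y.
coef_observed = lagged_x_coef(x, y)

# Intervention: x is set independently of z, and y is generated exactly as before.
x_set = rng.normal(size=n)
coef_intervened = lagged_x_coef(x_set, y)

print(f"coefficient on lagged x, observational: {coef_observed:.2f}")
print(f"coefficient on lagged x, x set by intervention: {coef_intervened:.2f}")
```

Under these assumptions the first coefficient should be large and the second close to zero, even though the process generating `y` is identical in both cases.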

There is an implicit discussion of causality when estimation of supply and demand functions is introduced as an issue to motivate instrumental variable estimation: if we remember from theory that the slopes of these functions are indeed causal effects, then the discussion amounts to asserting that OLS does not recover causal effects in this context.
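The supply and demand point can be sketched with simulated data. The structural equations and all numbers below are illustrative assumptions of mine, not taken from the textbook: the demand slope is set to -1, price and quantity are determined jointly by market clearing, and a hypothetical cost shifter `z` moves supply only. OLS of quantity on price does not recover the demand slope, while using `z` as an instrument does.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Hypothetical structural model (all numbers are illustrative assumptions):
#   demand: q = 10 - 1.0 * p + u_d
#   supply: q =  2 + 1.0 * p - 1.0 * z + u_s   (z is an observed cost shifter)
u_d = rng.normal(size=n)
u_s = rng.normal(size=n)
z = rng.normal(size=n)

# Market clearing determines price and quantity jointly.
p = 4.0 + 0.5 * z + 0.5 * (u_d - u_s)
q = 10.0 - 1.0 * p + u_d

# OLS slope from regressing quantity on price.
ols_slope = np.cov(p, q)[0, 1] / np.var(p, ddof=1)

# Simple instrumental variables estimate using z as the instrument for price.
iv_slope = np.cov(z, q)[0, 1] / np.cov(z, p)[0, 1]

print(f"true demand slope: -1.00, OLS: {ols_slope:.2f}, IV: {iv_slope:.2f}")
```

The OLS slope mixes movements along the demand curve with shifts of it, so it should land well away from -1 here, whereas the IV estimate should be close to -1.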

**Gujarati (1999), Essentials of Econometrics, second edition.**

Does not incorrectly define $\beta$ as the gradient of $E[y \mid x]$.

Implicitly defines regression parameters as causal effects (without using the word “causal”) on page 7. On page 8, correctly defines the error term as unobserved causes of the dependent variable, and notes,

Before proceeding further, a warning regarding causation is in order…. Does regression imply causation? Not necessarily. As Kendall and Stuart note, “A statistical relationship, however strong and however suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other.”

A variant of this warning is repeated on page 124, although the text somewhat oddly then proceeds to give uses for regression analysis which do not include the estimation of causal effects.

Gives examples of omitted variables bias and simultaneity bias which implicitly define the structural parameters as causal effects, and refers again to these parameters when introducing instrumental variables, a topic not pursued in this introductory-level text.

**Hayashi (2000), Econometrics.**

Defines regression parameters as causal effects (without using the word “causal”) on page 4, but also claims on the same page that an econometric model is a “set of joint distributions satisfying a set of assumptions,” which leaves it unclear whether the author intends regression parameters to reflect causal effects or parameters defining statistical distributions.

Introduces the issue of endogeneity noting that, “The most important assumption made for the OLS [sic] is the orthogonality between the error term and the regressors. Without it, the OLS estimator is not even consistent.” Much like Davidson and MacKinnon (1993), differentiates between causation and mere correlation using estimation of the slopes of supply and demand curves as an example, albeit without using any variant of the word, “cause.”

**Wooldridge (2002), Econometric Analysis of Cross-Section and Panel Data.**

Chen and Pearl discuss “baby” Wooldridge, the undergrad text. Does Papa Wooldridge fare better?

The opening passage of the text, Section 1.1 of the Introduction, begins,

The goal of most empirical studies in economics and other social sciences is to determine whether a change in one variable, say w, causes a change in another variable, say y…. Because economic variables are properly interpreted as random variables, we should use ideas from probability to formalize the sense in which change in w causes a change in y. The notion of ceteris paribus… is the crux of establishing a causal relationship. Simply finding that two variables are correlated is rarely enough….”

Goes on to define regression parameters as partial derivatives of conditional expectations, although not of but of (in our notation) of .

Includes the first, to the best of my knowledge, lengthy discussion of the counterfactuals/treatment effects literature (Chapter 18), and links the preceding discussion of regression models to the treatment effects literature.

**Davidson and MacKinnon (2004), Econometric Theory and Methods.**

We can make a fixed-effects type observation here, as we have another text from James and Russell, about a decade later than the 1993 text discussed above. How do the 1993 and 2004 books differ? The introductory passage on page 1 introduces regression parameters and implies their definition depends on how the error term is defined, although at this point exactly what $\beta$ means is deliberately left vague; its interpretation is “quite arbitrary,” the authors correctly note. After introducing the equivalent of the model $y = \beta x + \epsilon$, the text states (in our notation),

At this stage we should note that, as long as we say nothing about the unobserved quantity $\epsilon$, [the equation] does not tell us anything. In fact, we can allow $\beta$ to be quite arbitrary, since for any given [value] the model… can always be made to be true by defining $\epsilon$ suitably.

A similar passage on page 313 notes that, when a regressor is measured with error, OLS estimation gives the desired result if the error term is defined as simply the difference between the observed outcome and its expectation with respect to the observed regressor, but “in most cases” in econometrics that definition does not allow us to estimate the parameters we wish to estimate.

More or less the same discussion of supply and demand as in the 1993 text can again be interpreted as an implicit discussion of causality.

**Dielman (2005), Applied Regression Analysis, 4th ed.**

Incorrectly defines $\beta$ as the slope of $E[y \mid x]$ on page 75, although in the context of a model explicitly described as a “descriptive regression.” Does not immediately clarify, however, when a regression model should be interpreted as merely descriptive.

Discusses “causal” versus “extrapolative” regression models in the narrow context of time series modeling on page 112, but does not make it clear what the intended difference between these concepts is, nor is it clear why this discussion is limited to time series models. Claims that the issue with causal models is, “causal models require the identification of variables that are related to the dependent variable in a causal manner. Then data must be gathered on these explanatory variables to use the model.” This makes it seem that simple correlations can be used to infer causal relations so long as we can observe both the variables. However, also notes on page 118 that “A common mistake made when using regression analysis is to assume that a strong fit (a high $R^2$) of a regression of y on x automatically means ‘x causes y.’” There is then a brief discussion of endogeneity through simultaneity and through omitted variables, which is quite clear, particularly for an introductory text.

**Cameron and Trivedi (2005), Microeconometrics: Methods and Applications.**

A few sentences into the introduction on page 1, notes that,

A key distinction in econometrics is between essentially descriptive models and data summaries at various levels of statistical sophistication and models that go beyond mere associations and attempt to estimate causal parameters. The classic definitions of causality in econometrics derive from the Cowles Commission simultaneous equations model that draw sharp distinctions between exogenous and endogenous variables, and between structural and reduced form parameters. Although reduced form models are useful for some purposes, knowledge of structural or causal parameters is essential for policy analysis.

This focus on causal parameters is maintained throughout. Chapter 2 is titled “Causal and noncausal models,” and provides a quite high-level formal discussion of causality in the context of classical simultaneous models, and introduces topics in causal modeling which will be covered through the remainder of the book, including the Rubin Causal Model and a variety of methods researchers use to identify causal parameters. Given this emphasis, it is unsurprising that regression parameters are not incorrectly defined as the gradient of $E[y \mid x]$. Discusses counterfactual modeling in Chapter 25, “Treatment Evaluation,” at length, linking the methods in this literature to previous discussions of single-equation regression, matching, instrumental variables, and regression discontinuity designs.

**Remarks.**

The additional textbooks briefly surveyed suffer, to a greater or lesser extent, from the same weak discussions of causality as the texts surveyed by Chen and Pearl, with the exceptions of Wooldridge (2002) and particularly Cameron and Trivedi (2005), which I think would only fail Chen and Pearl’s criterion that the equivalent of the “do(x)” concept should be included (and arguably, an equivalent is included).

There is something of a puzzle here in that the oral tradition in applied econometrics heavily emphasizes causation, but it would seem that relatively few textbooks explicitly discuss the matter. In journal articles, seminars, and economics classrooms, there is consensus that the goal of econometric analysis is almost always to estimate a model which can answer causal questions. Overcoming the various serious challenges that arise in making such attempts is the core of most papers in applied econometrics, and how successful a paper is in achieving that goal is the target of sharp-eyed readers and referees. What explains the discrepancy between how economists think about causation and what appears in most econometrics textbooks?

First, econometrics textbooks tend to be authored by theoretical econometricians, who tend to be situated much closer to the interface between statistics and econometrics than applied researchers. Since statisticians do not tend to think in terms of causality, perhaps some of that statistical tradition makes its way over to econometrics textbooks.

Second, statistical concepts which *in the context of applied econometrics* refer to causal concepts are nonetheless presented as statistical concepts in econometrics textbooks, but it is understood that the underlying objects of inference are still causal. A “biased estimate of $\beta$” is a purely statistical concept, but if a referee or seminar attendee were to use that phrase they almost certainly mean, “the estimate you present is not a good estimate of the causal effect in which we are interested.” Similarly, a remark like, “your data doesn’t credibly identify $\beta$” appears to be a claim about a purely statistical matter, but the person making that claim almost certainly means, “the causal parameter we would like to estimate is hopelessly confounded, given the data we have and the model you’ve developed.” Further to this point, I note that way back in the old-timey days of the 1990s, I took a sequence of econometrics courses from MacKinnon and Davidson based on their 1993 textbook. Even though this text does not include a good discussion of causality using that term, and it is notably lacking in applied examples, it was always very clear to me (and, I think, my classmates) that we are ultimately interested in estimating models which allow us to make causal inferences, as opposed to merely characterizing the joint distribution of some set of variables.

Third, the language of counterfactuals in which the literature on causation is currently being developed is a relatively recent development. As noted above, Wooldridge (2002) is, to the best of my knowledge, the first econometrics textbook to include an extended discussion written in this language. What amounts to the same concepts were previously, as in the examples in the previous point, discussed using language borrowed from statistics. The slightly more recent text by Cameron and Trivedi (2005) is substantially more oriented towards causal modeling than any of the other texts, and also includes lengthy discussion of the recent literature on modeling heterogeneous causal effects. My impression from reading Chen and Pearl and flipping through the texts above is that textbooks are getting better over time in terms of discussing causation, presumably in part because these ideas are permeating the applied econometrics literature. Notably, the oldest textbooks discussed above (Amemiya 1985 and Kmenta 1986) present the vaguest discussions of causal concepts.

The oral tradition in economics is not well reflected in current, or particularly in outdated, textbooks. Chen and Pearl do those of us who teach or study econometrics a service in highlighting this problem, and hopefully discussion in future textbooks will continue to improve.

`reg y x, robust`

Everyone knows that the usual OLS standard errors are generally “wrong,” that robust standard errors are “usually” bigger than OLS standard errors, and it often “doesn’t matter much” whether one uses robust standard errors. It is whispered that there may be mysterious circumstances in which robust standard errors are smaller than OLS standard errors. Textbook discussions typically present the nasty matrix expressions for the robust covariance matrix estimate, but do not discuss in detail when robust standard errors matter or in what circumstances robust standard errors will be smaller than OLS standard errors. This post attempts a simple explanation of robust standard errors and circumstances in which they will tend to be much bigger or smaller than OLS standard errors.

**Expressions for OLS and robust standard errors.**

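Consider the univariate linear model

$$y_i - \bar{y} = \beta (x_i - \bar{x}) + \epsilon_i,$$

where $y_i$ is the dependent variable, $x_i$ is a covariate, $\epsilon_i$ is the error term, and $\beta$ is the parameter over which we would like to make inferences. I’ve omitted a constant by expressing the model in deviations from sample means, denoted with overbars. Assume $\epsilon_i$ is mean independent of $x_i$ and serially uncorrelated, but allow heteroskedasticity, $\mathrm{var}(\epsilon_i) = \sigma_i^2$. Let $\hat{\beta}$ denote the OLS estimate of $\beta$.

If we erroneously assume the error is homoskedastic, we estimate the variance of $\hat{\beta}$ with

$$\hat{V}_{OLS} = \frac{s^2}{\sum_i (x_i - \bar{x})^2},$$

where $s^2 = \frac{1}{n-2} \sum_i \hat{\epsilon}_i^2$ and the $\hat{\epsilon}_i$ are the OLS residuals. I will refer to the square root of this estimate throughout as the “OLS standard error.” When the errors are heteroskedastic, $s^2$ converges to the mean of $\sigma_i^2$; denote that $\bar{\sigma}^2$. However, the true sampling variance of $\hat{\beta}$ can easily be shown to be

$$V(\hat{\beta}) = \frac{\sum_i (x_i - \bar{x})^2 \sigma_i^2}{\left[ \sum_i (x_i - \bar{x})^2 \right]^2}.$$

Robust standard errors are based on estimates of this expression in which the $\sigma_i^2$ are replaced with squared OLS residuals, or sometimes slightly more complicated expressions designed to perform better in small samples; see for example Imbens and Kolesár (2012).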
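For readers who want the step behind “can easily be shown,” here is a sketch of the derivation, conditioning on the observed covariates. The notation is assumed as follows: $x_i$ is the covariate and $\bar{x}$ its sample mean, $\epsilon_i$ is the error term with variance $\sigma_i^2$, and $\hat{\beta}$ is the OLS estimate of $\beta$.

```latex
\begin{align*}
\hat{\beta}
  &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
   = \beta + \frac{\sum_i (x_i - \bar{x})\,\epsilon_i}{\sum_i (x_i - \bar{x})^2}, \\
\mathrm{var}(\hat{\beta} \mid x)
  &= \frac{\sum_i (x_i - \bar{x})^2\,\mathrm{var}(\epsilon_i \mid x)}
          {\left[\sum_i (x_i - \bar{x})^2\right]^2}
   = \frac{\sum_i (x_i - \bar{x})^2\,\sigma_i^2}
          {\left[\sum_i (x_i - \bar{x})^2\right]^2}.
\end{align*}
```

The second line uses the assumption that the errors are uncorrelated across observations, so all cross terms drop out. Setting every $\sigma_i^2$ equal to a common $\sigma^2$ collapses the expression to the familiar homoskedastic formula $\sigma^2 / \sum_i (x_i - \bar{x})^2$.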

**When do robust standard errors differ from OLS standard errors?**

Compare the expressions above to see that OLS and robust standard errors are (asymptotically) identical in the special case in which $(x_i - \bar{x})^2$ and the error variances $\sigma_i^2$ are uncorrelated, in which case

$$V(\hat{\beta}) \approx \frac{\bar{\sigma}^2}{\sum_i (x_i - \bar{x})^2}.$$

If, on the other hand, $(x_i - \bar{x})^2$ and $\sigma_i^2$ are positively correlated, then OLS standard errors are too small and robust standard errors will tend to be larger than OLS standard errors. And if $(x_i - \bar{x})^2$ and $\sigma_i^2$ are negatively correlated, then OLS standard errors are too big and robust standard errors will tend to be smaller than OLS standard errors. These cases are illustrated in the graphs: in the left panel, the variance of the error terms increases with the distance between $x_i$ and its mean $\bar{x}$, whereas in the right panel observations are most dispersed around the regression line when $x_i$ is at its mean.

The graphs have been constructed such that the unconditional variance of the error terms and the variance of $x$ are the same in each graph. But by inspection we can guess that our estimate of the slope is much less precise if the data look like the left panel than the right panel: perform a thought experiment to see that lots of regression lines fit the data in the left panel quite well, but the data in the right panel do a better job pinning down the slope. There is more information about the relationship between $y$ and $x$ in the data in the right panel even though the variance of $x$ and the unconditional variance of the error term are identical.

We see that heteroskedasticity doesn’t matter *per se*; what matters is the relationship between the variance of the error term and the covariates. If the errors are heteroskedastic but their variance is uncorrelated with $(x_i - \bar{x})^2$, we can safely ignore the heteroskedasticity. To see why this is so, recall that in the homoskedastic case the variance of $\hat{\beta}$ is inversely proportional to $\sum_i (x_i - \bar{x})^2$. If we add one more observation for which $x_i$ happens to equal $\bar{x}$, the variance of our estimate doesn’t change: there is no information in that observation about the relationship between $y$ and $x$. As the draw of $x_i$ moves farther from its mean, the variance of $\hat{\beta}$ falls more and more, because such draws, in the homoskedastic case, are more and more informative.

Now consider the case in which the variance of $\epsilon_i$ increases with $(x_i - \bar{x})^2$, as in the left panel of the graph above. When we get one more observation, the amount of information it contains increases with $(x_i - \bar{x})^2$ for the same reasons as the homoskedastic case, but this effect is blunted by the higher variance of $\epsilon_i$. The amount of information contained in a draw in which $x_i$ is far from its mean is lower than the OLS variance estimate “thinks” there is, so to speak, because the OLS variance estimate ignores the fact that such draws are more highly dispersed around the regression line. The OLS standard errors in this case are too small.

If on the other hand the variance of $\epsilon_i$ decreases with $(x_i - \bar{x})^2$, then observations of $x_i$ far from its mean both contain more information for the usual reason in the homoskedastic case *and* are less dispersed around the regression line, as in the right panel of the graph above. These observations are even more highly informative than the OLS variance estimate “thinks” they are, and the OLS standard errors will tend to be too *large*. In this case, robust standard errors will tend to be *smaller* than OLS standard errors.
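The three cases can be checked with a small simulation. This is a sketch under assumptions of my own: the variance patterns, sample size, and coefficients are illustrative, and the robust standard error is the simplest variant, with squared OLS residuals plugged in for the error variances and no small-sample correction.

```python
import numpy as np

def ols_and_robust_se(x, y):
    """Return (slope, OLS standard error, robust standard error) for y on x."""
    xd = x - x.mean()
    yd = y - y.mean()
    sxx = np.sum(xd ** 2)
    beta_hat = np.sum(xd * yd) / sxx
    resid = yd - beta_hat * xd
    s2 = np.sum(resid ** 2) / (len(x) - 2)
    se_ols = np.sqrt(s2 / sxx)
    # Robust variance: squared residuals stand in for the error variances.
    se_robust = np.sqrt(np.sum(xd ** 2 * resid ** 2)) / sxx
    return beta_hat, se_ols, se_robust

rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)
d = (x - x.mean()) ** 2  # squared distance of x from its mean

# Three hypothetical error-variance patterns (illustrative assumptions).
variances = {
    "rises with distance": d,
    "falls with distance": 1.0 / (1.0 + d),
    "unrelated to x": rng.uniform(0.5, 1.5, size=n),
}

ratios = {}
for label, var_i in variances.items():
    eps = rng.normal(size=n) * np.sqrt(var_i)
    y = 1.0 + 2.0 * x + eps
    _, se_ols, se_robust = ols_and_robust_se(x, y)
    ratios[label] = se_robust / se_ols
    print(f"{label:>20}: robust / OLS = {ratios[label]:.2f}")
```

In this setup the ratio should come out well above one in the first case, below one in the second, and close to one in the third, matching the argument in the text.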

**Summarizing.**

The upshot is this: if you have heteroskedasticity but the variance of your errors is independent of the covariates, you can safely ignore it, and if you calculate robust standard errors anyway they will be very similar to OLS standard errors. However, if the variance of your error terms tends to be higher when $x_i$ is far from its mean, OLS standard errors will tend to be biased down, and robust standard errors will tend to be larger than OLS standard errors. In the opposite case in which the variance of the error terms tends to be lower when $x_i$ is far from its mean, OLS standard errors will tend to be too large, and robust standard errors will tend to be smaller than OLS standard errors. With real data it’s commonly but not always going to be the case that the variance of the error will be higher when $x_i$ is far from its mean, explaining the result that robust standard errors are typically larger than OLS standard errors in economic applications.

Following work such as Wilkinson and Pickett’s *The Spirit Level*, the notion that income inequality causes low health has become popular. For example, Paul Krugman recently noted in a blog post titled “Inequality Kills,”

We have lots of evidence that low socioeconomic status leads to higher mortality — even if you correct for things like availability of health insurance. Some of the effects may come through self-destructive behavior, some through simple increased stress; think about what it feels like in 21st-century America to be a worker without even a high school degree. In any case […] what we’re looking at is a clear demonstration of the fact that high inequality isn’t just unfair, it kills.

Income inequality and poor population health are correlated across countries, lending support to the idea that inequality does indeed kill. For example, the graph to the right, from *The Spirit Level*, shows a scatterplot of Gini coefficients against an index of health and social problems: more inequality is correlated with more problems. But such graphs, as we will see, are hard to interpret, and we cannot conclude from the type of correlation they display that inequality *per se* causes poor health.

Consider the ambiguity in Krugman’s argument above: is it *inequality*, as in the title, that leads to poor health, or is it *low socioeconomic status*, as in the body? These are clearly related mechanisms, but they are different mechanisms.

Suppose societies A and B have identical income distributions up to the 90th percentile, but A’s distribution in the top decile is more “stretched out,” that is, the relatively rich are richer still in society A. If low personal income causes low health, all else equal the bottom 90% of people in A and B will have the same health. If health is socially determined in the sense that relative deprivation matters in addition to absolute deprivation, then the bottom 90% in society A will experience worse health than in B because in society A the bottom 90% are relatively worse off compared to B. And if more income dispersion causes lower health for everyone, then the richest 10% in society A may *also* experience lower health than in B. For both policy and scientific reasons, it’s important that we discover whether a person’s health is determined by his income alone, or by both his income and the incomes of the other people in his society.

The literature formalizes these issues as three paths from the distribution of income to a person’s health. First, a person’s income may cause that person’s health (the absolute income hypothesis). Health is only socially determined through this mechanism in the sense that every person’s income is socially determined; there is no further social effect holding individual income constant.

Second, a person’s income relative to other people in her reference group may cause her health (the relative income hypothesis). Finally, the dispersion of income in the society in which the person lives may cause her health, holding her income constant (the income inequality hypothesis). These mechanisms can be expressed:

- Absolute income hypothesis: $h_i = f(y_i)$
- Relative income hypothesis: $h_i = g(y_i - y_r)$
- Income inequality hypothesis: $h_i = k(y_i, \sigma_y^2)$

where $i$ indexes people, $h_i$ is a measure of health, $y_i$ is income, $f$, $g$, and $k$ are unknown functions, $y_r$ is the income of a reference person (such as the median or mean person’s income), and $\sigma_y^2$ is the variance or other measure of dispersion of $y$ across people. All three mechanisms may occur at the same time; they are not exclusive.

The relative income and income inequality hypotheses are less plausible on their face than the absolute income hypothesis: it is easy to think of reasons why your income causes your health (even in the presence of “free” health care), but it is harder to think of reasons why my income causes your health, as in the relative income and income inequality hypotheses. Angus Deaton skeptically refers to the relative and inequality hypotheses as “action at a distance.”

Perhaps Deaton is overly skeptical, as animal studies and other evidence do lend support to the idea that low social position causes physiological changes which lead to poor health (e.g., the Whitehall studies, see Marmot et al 2001). More inequality may cause people low in the hierarchy to experience negative emotions such as stress and shame, which may directly cause low health and indirectly cause low health through behaviors such as substance abuse. However, we face a number of problems attempting to operationalize this notion, and in theory anything goes even if we assume this mechanism exists. Deaton, for example, asks us to consider these variants on the relative income hypothesis:

1. Your health depends on your rank in the social hierarchy.
2. Your health depends on the difference between your income and the richest person’s income.
3. Your health depends on the difference between your income and the poorest person’s income.

These all seem reasonable ways of modeling the notion that the social hierarchy affects health. Now consider the implications of a policy which reduces inequality without changing the ordering of income across people or changing mean income. Under the first variant, there is no effect at all on health, as we have not changed anyone’s rank in the hierarchy. Under the second, average health goes up because the distance between the richest person’s income and a person’s income falls. And under the third, average health goes down as the distance between the poorest person’s income and others’ incomes falls.
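The three implications can be verified with a toy example. The five incomes and the particular policy (halving each person’s distance from mean income) are hypothetical choices of mine, made only to illustrate the argument.

```python
import numpy as np

# Hypothetical incomes for five people (illustrative numbers only).
income = np.array([10.0, 20.0, 30.0, 60.0, 130.0])

# A policy that halves every person's distance from mean income:
# inequality falls, while mean income and the ordering of people are unchanged.
compressed = income.mean() + 0.5 * (income - income.mean())

def rank(y):
    """Position in the income ordering (0 = poorest)."""
    return np.argsort(np.argsort(y))

def mean_gap_to_richest(y):
    """Average distance between each person's income and the richest income."""
    return np.mean(y.max() - y)

def mean_gap_to_poorest(y):
    """Average distance between each person's income and the poorest income."""
    return np.mean(y - y.min())

print("ranks unchanged:", np.array_equal(rank(income), rank(compressed)))
print("mean gap to richest:", mean_gap_to_richest(income), "->", mean_gap_to_richest(compressed))
print("mean gap to poorest:", mean_gap_to_poorest(income), "->", mean_gap_to_poorest(compressed))
```

With these numbers the ranks are unchanged, the average gap to the richest person falls from 80 to 40, and the average gap to the poorest person falls from 40 to 20. The same policy therefore leaves health unchanged, raises it, or lowers it depending on which variant is assumed.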

Another pragmatic problem is determining appropriate reference groups. Do you compare yourself to other people in your town? Your country? Your occupation, or your age, or your ethnicity, or your friends, or some combination of all of these and many other characteristics? In theory, this is easy: models assume there are groups 1 through $J$ and each agent is assigned a group $j$. In practice, reference groups are nebulous, and we will generally get different statistical answers depending on how we define reference groups.

Many studies attempt to use aggregate data to get at the effect of inequality on health, yielding results such as those displayed in the scatterplot of health and Gini coefficients above. Discovering that countries with more inequality tend to have lower public health is often interpreted as evidence of social causation of health operating through stress, social cohesion, or other psychological consequences of position in the social hierarchy. However, that conclusion does not follow.

One reason we’ll observe inequality and low health move together even if only the absolute income hypothesis holds is called the “concavity effect.” Suppose that the effect of an extra dollar on health is positive but lower than the effect of the previous dollar, that is, that the function $f$ relating health to own income is concave, as in the graph to the right. Then, holding mean income constant, increasing the dispersion of income in a society mechanically decreases average health. Intuitively, if we take a dollar from a rich person and give it to a poor person, average health goes up if an additional dollar increases a poor person’s health more than a rich person’s health. The concavity effect implies that studies of aggregate data cannot help us disentangle the absolute, relative, and inequality hypotheses.
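A short simulation shows the concavity effect at work. Everything here is an illustrative assumption: thirty hypothetical countries share the same mean income, incomes are lognormal, and health is the log of own income, so only the absolute income hypothesis holds by construction. Average health is nonetheless strongly negatively correlated with income dispersion across countries.

```python
import numpy as np

rng = np.random.default_rng(3)
mean_income = 50.0                   # identical in every hypothetical country
n_people = 5000
log_sd = np.linspace(0.2, 1.0, 30)   # dispersion of log income, one per country

avg_health = []
for s in log_sd:
    # Lognormal incomes whose population mean is held fixed at mean_income.
    income = rng.lognormal(mean=np.log(mean_income) - s ** 2 / 2, sigma=s, size=n_people)
    # Assumed concave f: health depends on own income only.
    health = np.log(income)
    avg_health.append(health.mean())

avg_health = np.array(avg_health)
corr = np.corrcoef(log_sd, avg_health)[0, 1]
print(f"cross-country correlation of dispersion and average health: {corr:.2f}")
```

No one’s health in this sketch depends on anyone else’s income, yet a scatterplot of these country averages against dispersion would slope downward.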

The concavity effect is sometimes referred to as a statistical artifact because it generates correlation between population health and income inequality that only operates through the absolute income effect. However, it is important to note that this is the effect we have the most evidence on, the evidence mostly agrees, and the evidence tells us that redistribution, so long as it does not destroy too much average wealth, will increase average health. Put another way, *we do not have to believe that inequality per se causes stress or other mental or physical health issues to conclude that reducing poverty will increase population health.*

With data on individuals we can shed some light on the relationship between income inequality and health, holding personal income fixed. Many papers estimate models similar to, or special cases of, specifications such as

$$h_{ij} = X_{ij}\beta + f(y_{ij}) + \gamma \bar{y}_j + \delta \sigma_j^2 + \epsilon_{ij},$$

where $X_{ij}$ is a vector of individual and contextual characteristics for person $i$ in country, region, or other reference group $j$, $\bar{y}_j$ is mean income within the reference group, $\sigma_j^2$ is the variance or other measure of income dispersion in $i$’s reference group, $f(y_{ij})$ is some function of income, $\beta$, $\gamma$, and $\delta$ are parameters to be estimated, and $\epsilon_{ij}$ is an error term representing other causes of health. Sometimes, $f$ is assumed to be linear, which means that curvature in the individual-level relationship may appear as a social effect. Usually, it is a quadratic or step function, and rarely no structure is imposed and the model is estimated using semiparametric methods (as in Jones and Wildman 2008). These papers typically use large, individual level cross-sectional or repeated cross-sectional datasets with countries or regions within countries treated as reference groups; infrequently panels are used or reference groups are defined more narrowly, such as age-region cells.
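The warning about assuming a linear function of income can be illustrated with simulated individual-level data. This is a sketch under assumptions of my own: fifty hypothetical reference groups, lognormal incomes, and a true model in which health depends only on the log of own income, so there is no inequality effect by construction. Fitting the specification with income entered linearly produces a spurious negative coefficient on group dispersion, while entering log income does not.

```python
import numpy as np

rng = np.random.default_rng(4)
n_groups, n_per_group = 50, 400

group_mean = rng.uniform(3.0, 7.0, size=n_groups)  # mean income, in $10,000s
group_sd = rng.uniform(0.2, 1.0, size=n_groups)    # sd of log income

rows = []
for m, s in zip(group_mean, group_sd):
    income = rng.lognormal(np.log(m) - s ** 2 / 2, s, size=n_per_group)
    # Assumed truth: health depends only on own (log) income, plus noise.
    health = np.log(income) + rng.normal(scale=0.5, size=n_per_group)
    ybar = np.full(n_per_group, income.mean())          # group mean income
    disp = np.full(n_per_group, np.log(income).std())   # group dispersion measure
    rows.append(np.column_stack([health, income, ybar, disp]))

health, income, ybar, disp = np.vstack(rows).T

def inequality_coef(f_of_income):
    """OLS coefficient on group dispersion, given a chosen f(income)."""
    X = np.column_stack([np.ones_like(health), f_of_income, ybar, disp])
    return np.linalg.lstsq(X, health, rcond=None)[0][3]

coef_linear_f = inequality_coef(income)        # f misspecified as linear
coef_log_f = inequality_coef(np.log(income))   # f matches the assumed truth

print(f"dispersion coefficient, linear f: {coef_linear_f:.2f}")
print(f"dispersion coefficient, log f:    {coef_log_f:.2f}")
```

The dispersion measure used here, the within-group standard deviation of log income, is one arbitrary choice among many; the qualitative result does not rely on it being the variance of income.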

The evidence from estimating such models provides at best weak support for the relative and inequality hypotheses. In contrast to results from aggregate models, which robustly find that higher inequality is associated with lower population health *without* controlling for absolute individual income, the estimated coefficients on inequality measures are roughly evenly split between negative and positive signs, and they are commonly statistically and substantively insignificant. These results lead some authors to draw conclusions such as “evidence favouring a negative correlation between income inequality and life expectancy has disappeared” (Mackenbach 2002) and “there seems to be little support for the idea that income inequality is a major, generalizable determinant of population health differences within or between rich countries” (Lynch et al. 2004), whereas “the absolute income hypothesis… is still the most likely to explain the frequently observed strong association between population health and income inequality levels” (Wagstaff and van Doorslaer 2000).

I’ll close by noting some of the remaining difficulties with this literature, challenges to be overcome in future research.

As we’ve seen, the literature to date largely attempts to estimate partial associations between health, personal income, and aspects of the distribution of income. Even ignoring the ambiguities and problems discussed above, we cannot interpret the resulting estimates as plausibly reflecting causal effects.

At the individual level it is very likely that health causes income as well as income causing health. The income–health gradient in part reflects the disadvantages unhealthy people face in the labor market: health and income are simultaneously determined. Further, countless personal and contextual effects may cause both health and income, so models such as those estimated in the literature typically suffer from both simultaneity bias and omitted variables bias (for example, many studies fail to even condition on education, which is an important cause of both health and income). I expect to see more efforts to pin down the effect of individual income on individual health, and to tie such efforts to the burgeoning literature examining health over the life cycle, particularly the long-term effects of childhood development (e.g., Cunha and Heckman 2007). There is some evidence that part of the correlation between health and absolute income is attributable to what is, in this context, “reverse” causation from health to income (e.g., Boyce and Oswald 2011, Case and Paxson 2011). It’s difficult to see how we can credibly estimate the effect of unequal societies on health without making further progress on the effect of a person’s income on her health.
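A stylized simulation shows how simultaneity alone distorts the estimated gradient. The two causal effects below are assumptions chosen for illustration, not estimates from any study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
a = 0.2  # assumed causal effect of income on health
b = 0.5  # assumed causal effect of health on income

u = rng.normal(0, 1, n)  # other causes of health
v = rng.normal(0, 1, n)  # other causes of income

# Solve the two-equation system
#   health = a * income + u
#   income = b * health + v
# for the values of health and income that satisfy both at once.
health = (a * v + u) / (1 - a * b)
income = (b * u + v) / (1 - a * b)

# OLS slope from regressing health on income.
slope = np.cov(health, income)[0, 1] / np.var(income, ddof=1)
print(f"assumed effect of income on health: {a}")
print(f"OLS slope of health on income:      {slope:.3f}")
```

With unit-variance errors the OLS slope converges to (a + b) / (b² + 1) = 0.56 rather than 0.2, so the regression attributes to income part of what is really the effect of health on income.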

Omitted variables at the reference group (usually, regional) level are also a problem. In equation (*) above, the only reference group level variables are the mean and dispersion of income, implying that reference-group level causes of health which are correlated with the distribution of income may generate partial correlations between income distribution and health even if income distribution does not cause health. Deaton and Lubotsky (2003), for example, show that controlling for the proportion of black people at the regional level removes the association between inequality and mortality across U.S. cities. Which other demographic, policy, or institutional differences across regions cause both inequality and low health?
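The same logic at the reference-group level can be sketched with simulated regions. The regional characteristic `z` below is a hypothetical stand-in for any demographic, policy, or institutional factor. The data are invented and inequality is given no causal effect on mortality.

```python
import numpy as np

rng = np.random.default_rng(2)
n_regions = 5_000

# Hypothetical regional characteristic that raises both inequality and mortality.
z = rng.normal(0, 1, n_regions)
inequality = 0.7 * z + rng.normal(0, 1, n_regions)
mortality = 0.5 * z + rng.normal(0, 1, n_regions)  # inequality does not enter

def ols(y, *columns):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

naive = ols(mortality, inequality)[1]        # z omitted
adjusted = ols(mortality, inequality, z)[1]  # z controlled for
print(f"inequality coefficient, z omitted:    {naive:.3f}")
print(f"inequality coefficient, z controlled: {adjusted:.3f}")
```

The naive regression finds a positive association between inequality and mortality that shrinks toward zero once `z` is included, which is the pattern an omitted regional cause would produce.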

A related issue for future research is opening the black box and figuring out exactly how income inequality affects health. For example, Drabo (2010) argues that his results imply that more unequal incomes reduce demand for environmental quality, lower environmental quality causes lower health, and after netting out this mechanism there is no further effect of inequality on health. More unequal incomes may lead to changes in a variety of prices, access to various goods and services, the type and quality of various public programs, and changes in various notions of social capital. Which regional characteristics mediate the effect of income inequality on health? Is there an additional effect of inequality *per se* on health after holding constant personal income and all of the social causes of health which may themselves result from more inequality? At the moment, we simply don’t know.

We have much yet to learn about the effects of the distribution of income on health, and even the simpler issue of determining the effects of individual income on health.
