Here, then, are some symptoms of bad critiques of economics:

- Treats macroeconomic forecasting as the major or only goal of economic analysis.
- Frames critique in terms of politics, most commonly the claim that economists are market fundamentalists.
- Uses “neoclassical” as if it refers to a political philosophy, set of policy prescriptions, or actual economies. Bonus: spells it “neo-classical” or “Neo-classical.”
- Refers to “the” neoclassical model or otherwise suggests all of economic thought is contained in Walras (1874).
- Uses “neoclassical economics” and “mainstream economics” interchangeably. Bonus: uses “neoliberal economics” interchangeably with either.
- Uses the word “neoliberal” for any reason.
- Refers to “corporate masters” or otherwise implies economists are shills for the wealthy or corporations.
- Claims economists think people are always rational.
- Claims the financial crisis disproved mainstream economics.
- Explicitly claims that economics is not empirical, or does so implicitly by ignoring empirical economics.
- Treats all of economics as if it’s battling schools of macroeconomics.
- Misconstrues jargon: “rational.”
- Misconstrues jargon: “efficient” (financial sense) or “efficient” (Pareto sense).
- Misconstrues jargon: “externality.”
- Claims economists only care about money.
- Claims economists ignore the environment. Variant: claims economics falters on point that “infinite growth on a finite planet is impossible.”
- Goes out of its way to point out that the Economics Nobel is not a real Nobel.
- Cites Debunking Economics.

The Chen and Pearl paper has been around for a while in working paper form and recently came out in the Real World Economics Review, also available here from the authors with much clearer typesetting.

The additional textbooks I discuss below are: Amemiya (1985), Kmenta (1986), Davidson and MacKinnon (1993), Gujarati (1999), Hayashi (2000), Wooldridge (2002), Davidson and MacKinnon (2004), Dielman (2005), and Cameron and Trivedi (2005).

**The Issue: Causality in regression models.**

A scientist is attempting to understand the relationship between, say, health and smoking. Let y denote some measure of health and let x denote a measure of smoking intensity, say, the number of cigarettes smoked per day. A simple model for health supposes the two are related by

$$y = \beta x + u.$$

In short, Chen and Pearl consider these issues: do econometrics textbooks clearly explain what the parameter $\beta$ means in this model, are they consistent in that interpretation, and, more generally, how well do they address issues of causality?

That simple-looking equation is much trickier than it appears, as first formally discussed in the econometrics literature by Trygve Haavelmo during the Second World War. For recent discussions, see for example Heckman (2005, 2008), Heckman and Pinto (2013), or blog discussions such as on Pearl’s blog or Andrew Gelman’s blog (note comments from Pearl and from Guido Imbens). First suppose we *define* the random variable u as the difference between y and its conditional expectation:

$$u = y - E[y \mid x], \qquad (1)$$

then it is easy to show that the error term must be mean-independent of $x$. In econometric jargon, we obtain exogeneity by definition. In this interpretation, the parameter $\beta$ is implicitly defined through

$$E[y \mid x] = \beta x,$$

that is, $\beta$ is by definition the gradient of $E[y \mid x]$. In the smoking and health example, $\beta$ is by definition how much health changes on average as we consider a person who smokes one more cigarette per day (specifically *without* the caveat, “other things being equal”).
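The “easy to show” step is a one-line application of the definition of u as the deviation of y from its conditional expectation, sketched here for completeness:

$$
\begin{aligned}
E[u \mid x] &= E\big[\, y - E[y \mid x] \;\big|\; x \,\big] \\
            &= E[y \mid x] - E[y \mid x] \\
            &= 0 .
\end{aligned}
$$

Mean independence therefore holds no matter how the data were generated, which is why exogeneity obtained this way says nothing by itself about causes.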

This interpretation of this model is merely “agnostic” or “predictive.” An insurance agency, for example, might be interested in estimating $\beta$ under this interpretation: the answer might help them understand how their payouts will vary if they accept customers who smoke more. But econometricians and other scientists are only rarely interested in such a predictive relationship. Instead, we want to know the causal effect of smoking on health, and the predictive regression generally does not recover that causal effect. Suppose for example we lived in a universe in which a given person’s health is unaffected by their smoking, but also that behaviors and characteristics which lead to low health also tend to lead to more smoking. Then we would tend to estimate negative values for $\beta$ even though by assumption (in whatever universe we’re discussing) smoking does not affect any person’s health.

For this reason econometricians rarely interpret the error term as simply the deviation between the outcome and its conditional expectation. Rather, in a structural interpretation of the equation, $\beta$ takes a causal interpretation and u is interpreted as summarizing all causes of y other than x. It is well known that any of (1) “reverse” causation, (2) omitted variables correlated with the regressors, or (3) measurement error in the regressors leads to correlation between u and x, which in turn means that the parameter $\beta$ is not defined as the derivative of $E[y \mid x]$ with respect to $x$. We would like to know how a randomly selected person’s health changes if we could intervene and exogenously flip smoking status; the problem is that the correlation between smoking and health calculated from observational data does not generally give us any answer to that question.
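The hypothetical universe described above, in which smoking has no effect on health but the two share a common cause, can be simulated in a few lines. This is only a sketch: the unobserved `trait`, the coefficients 3 and -2, and the scales of the variables are all illustrative assumptions, not estimates of anything.

```python
import numpy as np

def simulate_confounded_slope(n=100_000, seed=0):
    """OLS slope of health on smoking when smoking has NO causal effect on health.

    Hypothetical setup: an unobserved trait raises smoking and lowers health.
    """
    rng = np.random.default_rng(seed)
    trait = rng.normal(size=n)                       # unobserved common cause
    smoking = 10 + 3 * trait + rng.normal(size=n)    # cigarettes per day (illustrative scale)
    health = 50 - 2 * trait + rng.normal(size=n)     # smoking does not appear: causal effect is zero
    # OLS slope = sample cov(x, y) / sample var(x)
    x = smoking - smoking.mean()
    y = health - health.mean()
    return float(x @ y / (x @ x))

slope = simulate_confounded_slope()
print(f"estimated slope of health on smoking: {slope:.3f}")
```

With these made-up numbers the population regression slope is cov/var = (3)(-2)/(9 + 1) = -0.6, so the predictive regression reports a clearly negative coefficient even though, by construction, intervening on smoking would change nobody’s health.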

**Textbook discussion of the issue.**

The seemingly straightforward issue is not straightforward at all, and exactly what we mean by “causal,” even in the context of simple regression models such as above, is a subject of ongoing multidisciplinary research. Nonetheless, since inferring causal relations from observational data is the defining characteristic of econometric analysis, it seems very reasonable to require that econometrics textbooks should contain lucid discussions of causal relationships and, in so doing, define parameters clearly and unambiguously. Disturbingly, Chen and Pearl find that six popular econometrics textbooks fail, to a greater or lesser extent, to do so.

Chen and Pearl evaluate texts on 10 criteria, which amount to: does the textbook provide at least as much information about causal interpretation as this post does very briefly above, is the text consistent on those interpretations, and does the text provide the equivalent of Pearl’s “do(x)” operator to define causal effects? Other than the “do(x)” criterion, which I don’t think is fair because Pearl’s concept has not caught on in the econometrics literature and (even if it ought to catch on) should therefore not (yet?) appear in current econometrics textbooks, the criteria seem very fair to me. Pity the poor student who attempts to understand how to interpret a structural econometric model after reading this startling passage in Kennedy, for example:

Using the dictionary meaning of causality, it is impossible to test for causality. Granger developed a special definition of causality which econometricians use in place of the dictionary definition: strictly speaking, econometricians should say “Granger-cause” in place of “cause,” but usually they do not. A variable x is said to Granger-cause y if prediction of the current value of y is enhanced by using past values of x.

This is the only passage in the book in which the word “causality” is used, and the claims in that passage are not correct, in no small part because so-called Granger causality is not a causal concept. Although in my view that passage is by far the worst discussion in the six texts discussed, Chen and Pearl show persuasively that each of the discussed textbooks is at times at least vague in its discussion of causal relations. On the other hand, Chen and Pearl are perhaps somewhat uncharitable in some of their discussion. For example, they make much of this passage from Greene,

[ In the model ] does measure the value of a college education (assuming the rest of the regression model is correctly specified)? The answer is no if the typical individual who chooses to go to college would have relatively high earnings whether or not he or she went to college…

but in context this appears to be a typo: the passage is rescued if “the OLS estimate of” is inserted in front of the coefficient, and it makes no sense without that or an equivalent edit. Greene in many, many other places clearly differentiates between mere correlations and causal parameters. Chen and Pearl, however, are not satisfied with an answer Greene gave them in a personal communication as to the meaning of a structural parameter:

In a personal correspondence (2012), Greene wrote, “The precise definition of effect of what on what is subject to interpretation and some ambiguity depending on the setting. I find that model coefficients are usually not the answer I seek, but instead are part of the correct answer. I’m not sure how to answer your query about exactly, precisely carved in stone, what should be.”

I tentatively side with Greene here, although Chen and Pearl do not specify exactly what question Greene was asked. In structural models, the structural parameters are not necessarily causal effects in and of themselves; rather, they are assumed to be invariant with respect to some well-specified class of disturbance. For example, the deep parameters characterizing Harold Zurcher’s replacement of bus engines are not themselves causal effects, but given estimates of those parameters, the model can answer meaningful causal questions. Exactly what a structural coefficient means is model-dependent.

**Some results from other textbooks.**

Without going into nearly as much detail as Chen and Pearl, I took a look through some other econometrics textbooks to check to see how they discuss, or do not discuss, causality. Specifically, I looked to see whether the regression parameters are anywhere incorrectly defined as gradients of the conditional expectation of the dependent variable, and I tried to find explicit discussions of causal interpretation of estimated models. The texts surveyed below vary widely in level and vintage, including everything from introductory undergraduate to advanced graduate texts, from 1985 through 2005.

**Amemiya (1985), Advanced Econometrics.**

This textbook is now old, well, ancient, by academic standards, and is relatively technically demanding. Opens, on page 1, by dubiously asserting that the goal of econometrics is to estimate parameters which define the joint distribution of a set of random variables. As far as I can tell, the word “causal” does not appear anywhere, nor are there examples of predictive vs causal interpretation of parameters. Any notions of causality are implicit and framed in purely statistical terms. However, does not incorrectly define $\beta$ as the gradient of $E[y \mid x]$.

**Kmenta (1986), Elements of Econometrics**

Does not incorrectly define $\beta$ as the gradient of $E[y \mid x]$.

There is a fairly long, yet confusing discussion of causality at the start of the chapter on simultaneous systems.

Although the concepts of causality and exogeneity are not identical, it is nevertheless possible to conclude that if a variable Y is–in some sense–caused by a variable X, Y cannot be considered exogenous in a system in which X also occurs. A widely discussed definition of causality has been proposed by Granger.

This is the textbook that I learned undergraduate econometrics from. I don’t remember how I thought of causality in econometric models at the time (possibly because I really didn’t like econometrics as an undergraduate). But it’s hard to see how a student could make much headway in understanding causality from that passage. Causality is first introduced “in some sense,” deliberately avoiding a definition. An incorrect claim follows, that if one variable causes another they cannot both be treated as exogenous in a system: that is simply not true, as nothing in regression models precludes causal relationships between exogenous variables (as a trivial example, the square of an exogenous covariate is routinely used to capture nonlinear relationships between variables, which is a deterministic and monocausal relationship). And then the notion of Granger-causality is introduced as the only formally defined causal concept in econometrics.
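To see why Granger causality is a statement about prediction rather than causation, here is a small sketch. The data-generating process is an assumption made up for illustration: an unobserved driver `z` moves x immediately and y one period later, and x never enters the equation for y. The helper `granger_f_stat` is a hypothetical name; it implements the textbook one-lag comparison of restricted and unrestricted regressions by least squares.

```python
import numpy as np

def granger_f_stat(y, x):
    """F statistic for adding x[t-1] to a regression of y[t] on a constant and y[t-1]."""
    y_now, y_lag, x_lag = y[1:], y[:-1], x[:-1]
    ones = np.ones_like(y_now)

    def rss(design):
        # Residual sum of squares from a least-squares fit of y_now on the design matrix
        coef, *_ = np.linalg.lstsq(design, y_now, rcond=None)
        resid = y_now - design @ coef
        return float(resid @ resid)

    rss_restricted = rss(np.column_stack([ones, y_lag]))
    rss_unrestricted = rss(np.column_stack([ones, y_lag, x_lag]))
    dof = len(y_now) - 3                      # observations minus estimated coefficients
    return (rss_restricted - rss_unrestricted) / (rss_unrestricted / dof)

rng = np.random.default_rng(0)
T = 2000
z = rng.normal(size=T + 1)                    # unobserved common driver
x = z[1:] + rng.normal(size=T)                # x responds to the driver immediately
y = z[:-1] + rng.normal(size=T)               # y responds one period later; x never enters

f_observational = granger_f_stat(y, x)

# "Intervene" on x: set it independently of the driver, leaving y's equation untouched.
x_set = rng.normal(size=T)
f_intervention = granger_f_stat(y, x_set)

print(f"F with observed x: {f_observational:.1f}")
print(f"F with x set by intervention: {f_intervention:.1f}")
```

In the observational data past x sharply improves predictions of y, so x “Granger-causes” y, yet once x is set from outside the predictive power disappears because x never had any effect on y.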

**Davidson and MacKinnon (1993), Estimation and Inference in Econometrics**

The parameters in the linear regression model are defined in Chapter 1 very abstractly as the set of real numbers defining the subspace spanned by the column vectors of the regressors. $\beta$ is never incorrectly defined as the gradient of $E[y \mid x]$. Simultaneity and omitted variable bias are discussed in purely statistical, as opposed to causal, terms in Chapter 7.

Discusses causality explicitly in section 18.2, “Exogeneity and causality.” The clearest passage is,

But we have not yet discussed the conditions under which one can validly treat a variable as explanatory. This includes the use of such variables as regressors in least squares estimation and as instruments in instrumental variables or GMM estimation. For conditional inference to be valid, the explanatory variables must be predetermined or exogenous in one or other of a variety of senses to be defined below.

which is not very clear at all: the authors intend, I think, the first sentence to mean, “But we have not yet discussed the conditions under which one can treat the coefficient on a variable as reflecting a causal effect.” The matter is then further muddied as later in this subsection the concept of Granger causality is introduced, without clearly differentiating between so-called Granger-causality and causality.

There is an implicit discussion of causality when estimation of supply and demand functions is introduced as an issue to motivate instrumental variable estimation: if we remember from theory that the slopes of these functions are indeed causal effects, then the discussion amounts to asserting that OLS does not recover causal effects in this context.
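The supply-and-demand point can be made concrete with a simulated market. This is a sketch under assumed functional forms: the slopes of -1 and +1, the unit-variance shocks, and the cost shifter `w` are all hypothetical choices, and the instrumental-variables estimate is the simple ratio-of-covariances form for one regressor and one instrument.

```python
import numpy as np

def simulate_market(n=200_000, seed=0):
    """Equilibrium prices and quantities from a hypothetical linear market.

    Demand: q = -1.0 * p + u_d          (the causal demand slope is -1)
    Supply: q = +1.0 * p + w + u_s      (w is an observed supply shifter)
    """
    rng = np.random.default_rng(seed)
    u_d = rng.normal(size=n)    # demand shock
    u_s = rng.normal(size=n)    # supply shock
    w = rng.normal(size=n)      # supply shifter, excluded from the demand equation
    p = (u_d - u_s - w) / 2.0   # price that equates demand and supply
    q = -p + u_d                # quantity read off the demand curve
    return p, q, w

def cov(a, b):
    """Sample covariance (divides by n; the scaling cancels in the ratios below)."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))

p, q, w = simulate_market()
ols_slope = cov(q, p) / cov(p, p)   # regression of quantity on price
iv_slope = cov(q, w) / cov(p, w)    # IV estimate of the demand slope using w

print(f"OLS slope: {ols_slope:.3f}")
print(f"IV slope:  {iv_slope:.3f}")
```

Under these assumptions the OLS slope converges to -1/3, a blend of the two curves that is not the causal slope of either, while the instrumented estimate converges to the demand slope of -1.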

**Gujarati (1999), Essentials of Econometrics, second edition.**

Does not incorrectly define $\beta$ as the gradient of $E[y \mid x]$.

Implicitly defines regression parameters as causal effects (without using the word “causal”) on page 7. On page 8, correctly defines the error term as unobserved causes of the dependent variable, and notes,

Before proceeding further, a warning regarding causation is in order…. Does regression imply causation? Not necessarily. As Kendall and Stuart note, “A statistical relationship, however strong and however suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other.”

A variant of this warning is repeated on page 124, although, somewhat oddly, the text then proceeds to give uses for regression analysis which do not include the estimation of causal effects.

Gives examples of omitted variables bias and simultaneity bias which implicitly define the structural parameters as causal effects, and refers again to these parameters when introducing instrumental variables, a topic not pursued in this introductory-level text.

**Hayashi (2000), Econometrics.**

Defines regression parameters as causal effects (without using the word “causal”) on page 4, but also claims on the same page that an econometric model is a “set of joint distributions satisfying a set of assumptions,” which leaves it unclear whether the author intends regression parameters to reflect causal effects or parameters defining statistical distributions.

Introduces the issue of endogeneity noting that, “The most important assumption made for the OLS [sic] is the orthogonality between the error term and the regressors. Without it, the OLS estimator is not even consistent.” Much like Davidson and MacKinnon (1993), differentiates between causation and mere correlation using estimation of the slopes of supply and demand curves as an example, albeit without using any variant of the word, “cause.”

**Wooldridge (2002), Econometric Analysis of Cross-Section and Panel Data.**

Chen and Pearl discuss “baby” Wooldridge, the undergrad text. Does Papa Wooldridge fare better?

The opening passage of the text, Section 1.1 of the Introduction, begins,

The goal of most empirical studies in economics and other social sciences is to determine whether a change in one variable, say w, causes a change in another variable, say y…. Because economic variables are properly interpreted as random variables, we should use ideas from probability to formalize the sense in which change in w causes a change in y. The notion of ceteris paribus… is the crux of establishing a causal relationship. Simply finding that two variables are correlated is rarely enough….

Goes on to define regression parameters as partial derivatives of conditional expectations, although not of but of (in our notation) of .

Includes the first, to the best of my knowledge, lengthy discussion of the counterfactuals/treatment effects literature (Chapter 18), and links the preceding discussion of regression models to the treatment effects literature.

**Davidson and MacKinnon (2004), Econometric Theory and Methods.**

We can make a fixed-effects type observation here, as we have another text from James and Russell, about a decade later than the 1993 text discussed above. How do the 1993 and 2004 books differ? The introductory passage on page 1 introduces regression parameters and implies their definition depends on how the error term is defined, although at this point exactly what the error term means is deliberately left vague; its interpretation is “quite arbitrary,” the authors correctly note. After introducing the equivalent of the model $y = \beta x + u$, the text states (in our notation),

At this stage we should note that, as long as we say nothing about the unobserved quantity $u$, [the equation] does not tell us anything. In fact, we can allow $\beta$ to be quite arbitrary, since for any given [value] the model… can always be made to be true by defining $u$ suitably.

A similar passage on page 313 notes that, when a regressor is measured with error, OLS estimation gives the desired result if the error term is defined as simply the difference between the observed outcome and its expectation with respect to the observed regressor, but “in most cases” in econometrics that definition does not allow us to estimate the parameters we wish to estimate.

More or less the same discussion of supply and demand as in the 1993 text can again be interpreted as an implicit discussion of causality.
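The measurement-error remark above can be illustrated with a short simulation. Everything here is an assumption chosen for illustration: a structural slope of 2, a correctly measured regressor with unit variance, and classical measurement error with unit variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta = 2.0                                   # structural slope (hypothetical value)
x_true = rng.normal(size=n)                  # correctly measured regressor, variance 1
y = beta * x_true + rng.normal(size=n)       # outcome depends on the true regressor
x_obs = x_true + rng.normal(size=n)          # what we observe: truth plus noise, variance 1

# OLS slope of y on the mismeasured regressor
xd = x_obs - x_obs.mean()
slope_on_observed = float(xd @ (y - y.mean()) / (xd @ xd))

print(f"structural slope: {beta:.1f}")
print(f"OLS slope on observed regressor: {slope_on_observed:.3f}")
```

OLS here consistently estimates the slope of the expectation of y given the observed regressor, which under these assumptions is beta times 1/(1 + 1) = 1. That is a perfectly good predictive quantity, but it is half the structural parameter we set out to estimate.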

**Dielman (2005), Applied Regression Analysis, 4th ed.**

Incorrectly defines $\beta$ as the slope of $E[y \mid x]$ on page 75, although in the context of a model explicitly described as a “descriptive regression.” Does not immediately clarify, however, when a regression model should be interpreted as merely descriptive.

Discusses “causal” versus “extrapolative” regression models in the narrow context of time series modeling on page 112, but does not make it clear what the intended difference between these concepts is, nor is it clear why this discussion is limited to time series models. Claims that the issue with causal models is, “causal models require the identification of variables that are related to the dependent variable in a causal manner. Then data must be gathered on these explanatory variables to use the model.” This makes it seem that simple correlations can be used to infer causal relations so long as we can observe both the variables. However, also notes on page 118 that “A common mistake made when using regression analysis is to assume that a strong fit (a high $R^2$) of a regression of y on x automatically means ‘x causes y.’” There is then a brief discussion of endogeneity through simultaneity and through omitted variables, which is quite clear, particularly for an introductory text.

**Cameron and Trivedi (2005), Microeconometrics: Methods and Applications.**

A few sentences into the introduction on page 1, notes that,

A key distinction in econometrics is between essentially descriptive models and data summaries at various levels of statistical sophistication and models that go beyond mere associations and attempt to estimate causal parameters. The classic definitions of causality in econometrics derive from the Cowles Commission simultaneous equations model that draw sharp distinctions between exogenous and endogenous variables, and between structural and reduced form parameters. Although reduced form models are useful for some purposes, knowledge of structural or causal parameters is essential for policy analysis.

This focus on causal parameters is maintained throughout. Chapter 2 is titled “Causal and noncausal models.” It provides a quite high-level formal discussion of causality in the context of classical simultaneous-equations models and introduces topics in causal modeling which will be covered through the remainder of the book, including the Rubin Causal Model and a variety of methods researchers use to identify causal parameters. Given this emphasis, it is unsurprising that regression parameters are not incorrectly defined as the gradient of $E[y \mid x]$. Discusses counterfactual modeling in Chapter 25, “Treatment Evaluation,” at length, linking the methods in this literature to previous discussions of single-equation regression, matching, instrumental variables, and regression discontinuity designs.

**Remarks.**

The additional textbooks briefly surveyed suffer, to a greater or lesser extent, from the same weak discussions of causality as the texts surveyed by Chen and Pearl, with the exceptions of Wooldridge (2002) and particularly Cameron and Trivedi (2005), which I think would only fail Chen and Pearl’s criterion that the equivalent of the “do(x)” concept should be included (and arguably, an equivalent is included).

There is something of a puzzle here in that the oral tradition in applied econometrics heavily emphasizes causation, but it would seem that relatively few textbooks explicitly discuss the matter. In journal articles, seminars, and economics classrooms, there is consensus that the goal of econometric analysis is almost always to estimate a model which can answer causal questions. Overcoming the various serious challenges that arise in making such attempts is the core of most papers in applied econometrics, and how successful a paper is in achieving that goal is the target of sharp-eyed readers and referees. What explains the discrepancy between how economists think about causation and what appears in most econometrics textbooks?

First, econometrics textbooks tend to be authored by theoretical econometricians, who tend to be situated much closer to the interface between statistics and econometrics than applied researchers. Since statisticians do not tend to think in terms of causality, perhaps some of that statistical tradition makes its way over to econometrics textbooks.

Second, statistical concepts which *in the context of applied econometrics* refer to causal concepts are nonetheless presented as statistical concepts in econometrics textbooks, but it is understood that the underlying objects of inference are still causal. A “biased estimate of $\beta$” is a purely statistical concept, but if a referee or seminar attendee were to use that phrase they almost certainly mean, “the estimate you present is not a good estimate of the causal effect in which we are interested.” Similarly, a remark like, “your data doesn’t credibly identify $\beta$” appears to be a claim about a purely statistical matter, but the person making that claim almost certainly means, “the causal parameter we would like to estimate is hopelessly confounded, given the data we have and the model you’ve developed.” Further to this point, I note that way back in the old-timey days of the 1990s, I took a sequence of econometrics courses from MacKinnon and Davidson based on their 1993 textbook. Even though this text does not include a good discussion of causality using that term, and it is notably lacking in applied examples, it was always very clear to me (and, I think, my classmates) that we are ultimately interested in estimating models which allow us to make causal inferences, as opposed to merely characterizing the joint distribution of some set of variables.

Third, the language of counterfactuals in which the literature on causation is currently being developed is a relatively recent development. As noted above, Wooldridge (2002) is, to the best of my knowledge, the first econometrics textbook to include an extended discussion written in this language. What amounts to the same concepts were previously, as in the examples in the previous point, discussed using language borrowed from statistics. The slightly more recent text by Cameron and Trivedi (2005) is substantially more oriented towards causal modeling than any of the other texts, and also includes lengthy discussion of the recent literature on modeling heterogeneous causal effects. My impression from reading Chen and Pearl and flipping through the texts above is that textbooks tend to be getting better over time in terms of discussing causation, presumably in part because these ideas are permeating the applied econometrics literature. Notably, the oldest textbooks discussed above (Amemiya 1985 and Kmenta 1986) present the vaguest discussions of causal concepts.

The oral tradition in economics is not well-reflected in current, or particularly in outdated, textbooks. Chen and Pearl do those of us who teach or study econometrics a service in highlighting this problem, and hopefully discussion in future textbooks will continue to improve.

`reg y x, robust`

Everyone knows that the usual OLS standard errors are generally “wrong,” that robust standard errors are “usually” bigger than OLS standard errors, and it often “doesn’t matter much” whether one uses robust standard errors. It is whispered that there may be mysterious circumstances in which robust standard errors are smaller than OLS standard errors. Textbook discussions typically present the nasty matrix expressions for the robust covariance matrix estimate, but do not discuss in detail when robust standard errors matter or in what circumstances robust standard errors will be smaller than OLS standard errors. This post attempts a simple explanation of robust standard errors and circumstances in which they will tend to be much bigger or smaller than OLS standard errors.

**Expressions for OLS and robust standard errors.**

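Consider the univariate linear model

$$y_i - \bar{y} = \beta (x_i - \bar{x}) + u_i,$$

where $y_i$ is the dependent variable, $x_i$ is a covariate, $u_i$ is the error term, and $\beta$ is the parameter over which we would like to make inferences. I’ve omitted a constant by expressing the model in deviations from sample means, denoted with overbars. Assume $u_i$ is mean independent of $x_i$ and serially uncorrelated, but allow heteroskedasticity, $\mathrm{Var}(u_i \mid x_i) = \sigma_i^2$. Let $\hat{\beta}$ denote the OLS estimate of $\beta$.

If we erroneously assume the error is homoskedastic, we estimate the variance of $\hat{\beta}$ with

$$\widehat{\mathrm{Var}}_{OLS}(\hat{\beta}) = \frac{\hat{\sigma}^2}{\sum_i (x_i - \bar{x})^2},$$

where $\hat{\sigma}^2$ is the average squared OLS residual (up to a degrees-of-freedom correction). I will refer to the square root of this estimate throughout as the “OLS standard error.” When the errors are heteroskedastic, $\hat{\sigma}^2$ converges to the mean of $\sigma_i^2$; denote that $\bar{\sigma}^2$. However, the true sampling variance of $\hat{\beta}$ can easily be shown to be

$$\mathrm{Var}(\hat{\beta}) = \frac{\sum_i (x_i - \bar{x})^2 \sigma_i^2}{\left(\sum_i (x_i - \bar{x})^2\right)^2}.$$

Robust standard errors are based on estimates of this expression in which the $\sigma_i^2$ are replaced with squared OLS residuals, or sometimes slightly more complicated expressions designed to perform better in small samples; see for example Imbens and Kolesár (2012).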
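For readers who want the “easily shown” step spelled out, here is a brief sketch. It treats the covariate values as fixed and writes $x_i$ for the covariate, $u_i$ for the error with variance $\sigma_i^2$, and $\hat{\beta}$ for the OLS estimate of $\beta$; it uses only the mean-independence and no-serial-correlation assumptions stated above.

$$
\begin{aligned}
\hat{\beta} &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
             = \beta + \frac{\sum_i (x_i - \bar{x})\, u_i}{\sum_i (x_i - \bar{x})^2}, \\
\mathrm{Var}(\hat{\beta} \mid x) &= \frac{\sum_i (x_i - \bar{x})^2\, \mathrm{Var}(u_i \mid x_i)}{\left(\sum_i (x_i - \bar{x})^2\right)^2}
             = \frac{\sum_i (x_i - \bar{x})^2 \sigma_i^2}{\left(\sum_i (x_i - \bar{x})^2\right)^2}.
\end{aligned}
$$

The cross terms vanish because the errors are uncorrelated across observations, so each observation contributes its own error variance weighted by its squared distance from $\bar{x}$.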

**When do robust standard errors differ from OLS standard errors?**

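Compare the expressions above to see that OLS and robust standard errors are (asymptotically) identical in the special case in which the error variance $\sigma_i^2$ and the squared deviation $(x_i - \bar{x})^2$ are uncorrelated, in which case the true sampling variance reduces to the homoskedastic formula evaluated at the mean error variance $\bar{\sigma}^2$:

$$\frac{\sum_i (x_i - \bar{x})^2 \sigma_i^2}{\left(\sum_i (x_i - \bar{x})^2\right)^2} \approx \frac{\bar{\sigma}^2}{\sum_i (x_i - \bar{x})^2}.$$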

If, on the other hand, the error variance $\sigma_i^2$ and the squared deviation $(x_i - \bar{x})^2$ are positively correlated, then OLS standard errors are too small and robust standard errors will tend to be larger than OLS standard errors. And if $\sigma_i^2$ and $(x_i - \bar{x})^2$ are negatively correlated, then OLS standard errors are too big and robust standard errors will tend to be smaller than OLS standard errors. These cases are illustrated in the graphs: in the left panel, the variance of the error terms increases with the distance between $x_i$ and its mean $\bar{x}$, whereas in the right panel observations are most dispersed around the regression line when $x_i$ is at its mean.

The graphs have been constructed such that the unconditional variance of the error terms and the variance of $x$ are the same in each graph. But by inspection we can guess that our estimate of the slope is much less precise if the data look like the left panel than the right panel: perform a thought experiment to see that lots of regression lines fit the data in the left panel quite well, but the data in the right panel do a better job pinning down the slope. There is more information about the relationship between $x$ and $y$ in the data in the right panel even though the variance of $x$ and the unconditional variance of the error term are identical.

We see that heteroskedasticity doesn’t matter *per se*; what matters is the relationship between the variance of the error term and the covariates. If the errors are heteroskedastic but their variance is uncorrelated with $(x_i - \bar{x})^2$, we can safely ignore the heteroskedasticity. To see why this is so, recall that in the homoskedastic case the variance of the OLS estimate $\hat{\beta}$ is inversely proportional to $\sum_i (x_i - \bar{x})^2$. If we add one more observation for which $x_i$ happens to equal $\bar{x}$, the variance of our estimate doesn’t change: there is no information in that observation about the relationship between $x$ and $y$. As the draw of $x_i$ moves farther from its mean, the variance of $\hat{\beta}$ falls more and more, because such draws, in the homoskedastic case, are more and more informative.

Now consider the case in which the variance of the error $u_i$ increases with $(x_i - \bar{x})^2$, as in the left panel of the graph above. When we get one more observation, the amount of information it contains increases with $(x_i - \bar{x})^2$ for the same reasons as the homoskedastic case, but this effect is blunted by the higher variance of $u_i$. The amount of information contained in a draw in which $x_i$ is far from its mean is lower than the OLS variance estimate “thinks” there is, so to speak, because the OLS variance estimate ignores the fact that such draws are more highly dispersed around the regression line. The OLS standard errors in this case are too small.

If on the other hand the variance of the error $u_i$ decreases with $(x_i - \bar{x})^2$, then observations of $x_i$ far from its mean both contain more information for the usual reason in the homoskedastic case *and* are less dispersed around the regression line, as in the right panel of the graph above. These observations are even more highly informative than the OLS variance estimate “thinks” they are, and the OLS standard errors will tend to be too *large*. In this case, robust standard errors will tend to be *smaller* than OLS standard errors.
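Both cases can be checked numerically. The sketch below is illustrative only: the two error-variance patterns are made-up examples of “fanning out” and “fanning in,” the function name `ols_and_robust_se` is hypothetical, and the robust formula is the basic version in which squared residuals stand in for the unknown error variances, without any small-sample correction.

```python
import numpy as np

def ols_and_robust_se(x, y):
    """Return (beta_hat, ols_se, robust_se) for a univariate regression with a constant."""
    xd = x - x.mean()
    yd = y - y.mean()
    sxx = xd @ xd                                    # sum of squared deviations of x
    beta_hat = (xd @ yd) / sxx
    resid = yd - beta_hat * xd
    n = len(x)
    sigma2_hat = (resid @ resid) / (n - 2)           # homoskedastic error-variance estimate
    ols_se = np.sqrt(sigma2_hat / sxx)
    robust_se = np.sqrt((xd**2 @ resid**2) / sxx**2) # squared residuals replace the variances
    return beta_hat, ols_se, robust_se

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
dist = np.abs(x - x.mean())                          # distance of each x from its mean

# Left-panel style: error s.d. grows with the distance of x from its mean.
y_fan_out = 1.0 * x + rng.normal(size=n) * dist
# Right-panel style: error s.d. shrinks with the distance of x from its mean.
y_fan_in = 1.0 * x + rng.normal(size=n) / (1.0 + dist)

_, ols_out, rob_out = ols_and_robust_se(x, y_fan_out)
_, ols_in, rob_in = ols_and_robust_se(x, y_fan_in)

print(f"fan out: robust/OLS = {rob_out / ols_out:.2f}")
print(f"fan in:  robust/OLS = {rob_in / ols_in:.2f}")
```

In the first pattern the ratio of robust to OLS standard errors comes out well above one, and in the second well below one, matching the two panels described above.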

**Summarizing.**

The upshot is this: if you have heteroskedasticity but the variance of your errors is independent of the covariates, you can safely ignore it, but if you calculate robust standard errors anyways they will be very similar to OLS standard errors. However, if the variance of your error terms tends to be higher when $x$ is far from its mean, OLS standard errors will tend to be biased down, and robust standard errors will tend to be larger than OLS standard errors. In the opposite case in which the variance of the error terms tends to be lower when $x$ is far from its mean, OLS standard errors will tend to be too large, and robust standard errors will tend to be smaller than OLS standard errors. With real data it’s commonly but not always going to be the case that the variance of the error will be higher when $x$ is far from its mean, explaining the result that robust standard errors are typically larger than OLS standard errors in economic applications.

In a recently released NBER working paper, “Behavioral hazard in health insurance,” Katherine Baicker, Sendhil Mullainathan, and Joshua Schwartzstein consider behavioral biases that lead people to (specifically, and with loss of generality) underutilize health care. How should we think about designing health insurance in the presence of such biases?

We have solid evidence that changing the copayment (the amount you pay) affects use of care, so the design of health insurance plans matters for both our finances and our health. For example, the graph shows results from the RAND health insurance experiment, in which people were randomly assigned various levels of health insurance. People assigned to pay high prices for care used less care. In Canada, patients face a copayment of zero for “necessary” care, which suggests we get way too much health care—lots of treatments for which costs exceed benefits. We’re over at the level of care associated with a coinsurance rate of zero in the graph, and the standard model tells us that even small out-of-pocket payments from patients would greatly reduce demand for treatments. Further, we should expect those treatments to have very little net benefit, so we might greatly reduce costs at little consequence to our health.

The standard model helps us to explain overuse of expensive care with low health benefits. However, it is difficult to reconcile with evidence that people often underutilize certain treatments: treatments with minimal side effects, low prices, and large health benefits. For example, Choudhry *et al* (2011) show that eliminating a roughly $20 copayment heart attack patients made for statins, beta blockers, and other drugs substantially increased adherence. The standard model requires us to infer that patients who would take the drugs at a price of zero but not at $20 either receive less than $20 worth of health benefits from the drugs or experience severe side effects which greatly reduce net benefit. Neither of these hypotheses sits well with the clinical evidence on efficacy and side effects.

Baicker *et al* consider behavioral models to help understand such outcomes. They start with a simple rational choice model as a point of departure. There is one illness with severity $s$, which varies across people. Everyone pays an insurance premium (or tax) $P$, and people who choose to receive treatment must also pay a copayment $p$. The treatment leads to an increase in health worth $b(s)$. A person with income $y$ who receives treatment gets utility $U(y - P - s + b(s) - p)$ and a person who does not receive treatment gets $U(y - P - s)$. In this simple setup a person will choose to receive treatment if $b(s) > p$, that is, if the health benefits are worth more to them than the copayment they must make. Since people are rational and have full information in this model, anything that makes price deviate from marginal cost then causes inefficiency *ex post*.

Optimal insurance contracts in this environment involve over-utilization when people are rational and risk-averse. The copayment that maximizes social welfare, $p^*$, satisfies

$$\frac{c - p^*}{p^*} = \frac{I}{\eta},$$

where $c$ is the cost of providing treatment, $I$ is the benefit of reduced financial risk (which depends on the curvature of the utility function), and $\eta$ is the elasticity of demand for care. More elastic demand implies more moral hazard, and more moral hazard means copayments should be higher. For example, if the price of a visit to an emergency room rises from \$50 to \$100 and almost no one is deterred from emergency care, then moral hazard is not a big issue and insurance mostly reduces risk, which means in turn that we should heavily insure emergency care. As the authors emphasize, policy makers in this world only need to know the elasticity of demand and the degree of risk aversion to design optimal insurance systems; they do not need to know how effective care is (the schedule $b(s)$ of health benefits by severity) because rational, fully informed agents make their decisions on the basis of health benefits.

The result that the elasticity of demand determines optimal insurance leads to some strange conclusions. For example, demand for beta blockers appears to be about as elastic as demand for cold remedies, even though beta blockers are “essential” and cold remedies are not (to put it mildly). A policy maker should then set similar copayments for cold remedies and beta blockers.

But suppose people make systematic errors. They choose treatment if $b(s) + \varepsilon(s) > p$, where $b(s)$ is the health benefit of treatment, $p$ is the copayment, and $\varepsilon(s)$ can represent a variety of “internalities,” that is, behavioral biases, including present bias, inattention, and false beliefs (systematic over- or underestimation of efficacy). Here, $b(s)$ is the “experienced utility” of treatment whereas $b(s) + \varepsilon(s)$ is the “decision utility” of treatment. Conventionally, these coincide: if you choose A over B, you are better off with A. Here, when you choose A over B, you might be better off with B.

In effect, the paper considers what happens when we allow for the possibility that demand does not coincide with marginal benefits, and much of the analysis is similar to standard analysis of activities with positive externalities, for example, vaccines against communicable diseases. Subsidizing such a vaccine so that its price falls below marginal cost can be sound policy; similarly, we may want insurance to decrease the price of some treatments below cost, even if everyone is risk neutral. The graph illustrates the outcome with behavioral underutilization: suppose first that price is set to equal marginal cost. The blue line is the demand curve, so the outcome is Q treatments. However, marginal benefits do not coincide with demand; they are given by the green line. Setting the price to zero through full insurance increases treatments to Q’. In the standard model, we would conclude that moral hazard leads to a welfare loss equal to the area shaded green. In the behavioral model, we instead conclude that setting the price to zero increases welfare by an amount equal to the area of the blue triangle.
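The contrast between the two welfare verdicts can be reproduced with a toy population. This is a sketch under stated assumptions, not the authors' calculation: everyone is treated as risk neutral (so the value of insurance against financial risk is ignored), the bias is a single constant, and the benefit grid, the cost of 50, and the bias of -100 are hypothetical numbers.

```python
def welfare_change_from_full_insurance(benefits, cost, bias):
    """Welfare change when the price of treatment falls from marginal cost to zero.

    A person is treated when benefit + bias > price (decision utility), while
    each treatment actually adds benefit - cost to welfare (experienced utility).
    """
    def welfare(price):
        return sum(b - cost for b in benefits if b + bias > price)

    return welfare(0.0) - welfare(cost)


# Hypothetical population: true benefits of 0, 1, ..., 200 dollars; treatment costs 50.
benefits = [float(b) for b in range(201)]
cost = 50.0

# Standard model: no bias, so the extra treatments are all worth less than they cost.
standard = welfare_change_from_full_insurance(benefits, cost, bias=0.0)

# Behavioral model: people undervalue treatment by 100, so the extra
# treatments at a zero price are worth more than they cost.
behavioral = welfare_change_from_full_insurance(benefits, cost, bias=-100.0)

print(standard, behavioral)
```

With these made-up numbers, full insurance lowers welfare when there is no bias and raises it when people systematically undervalue treatment, which is the same reversal the graph describes.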

In the behavioral model, the optimal copayment $p^B$ satisfies

$$\frac{c - p^B}{p^B} = \frac{I}{\eta} - \frac{\varepsilon_{avg}}{p^B},$$

where, as in the standard model, $c$ is the cost of providing treatment, $I$ is the value of reduced financial risk, and $\eta$ is the elasticity of demand; $\varepsilon_{avg}$ is approximately equal to $\varepsilon(\bar{s})$ (see page 18 for details) and $\bar{s}$ denotes illness severity for the patient who’s just indifferent to treatment. The standard model is the special case in which $\varepsilon = 0$ and the second term on the right-hand side disappears. Optimal insurance now depends on more than just the elasticity of demand $\eta$ and the value of financial risk reduction $I$. Treatments with larger behavioral distortions (more negative values of $\varepsilon$) should have lower copayments, holding $\eta$ and $I$ constant. Cold medication and beta blockers need not have the same copayment. Even if everyone were risk neutral so that $I = 0$, it would still be optimal to provide insurance, because insurance can correct the behavioral issues leading to inefficiently low levels of care. If behavioral issues are severe enough, it may even be optimal to force people to pay more than marginal cost, or to subsidize rather than charge for treatment.
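To see how the bias term moves the copayment, the condition can be solved for the copayment directly. The sketch below assumes the optimality condition takes the form $(c - p)/p = I/\eta - \varepsilon/p$, which is one form consistent with the description above and may differ in detail from the paper's own statement. The function name, argument names, and all numbers are hypothetical.

```python
def optimal_copayment(cost, insurance_value, elasticity, marginal_bias=0.0):
    """Solve (c - p)/p = I/eta - eps/p for p, giving p = (c + eps) / (1 + I/eta).

    Assumed form of the condition; marginal_bias = 0 gives the standard model.
    """
    if elasticity <= 0:
        raise ValueError("elasticity must be positive")
    return (cost + marginal_bias) / (1.0 + insurance_value / elasticity)


# Two hypothetical drugs with the same cost, elasticity, and insurance value.
cold_remedy = optimal_copayment(cost=30.0, insurance_value=0.5, elasticity=0.5)
beta_blocker = optimal_copayment(cost=30.0, insurance_value=0.5, elasticity=0.5,
                                 marginal_bias=-20.0)

# Risk-neutral case (I = 0): the copayment is cost plus the (negative) bias.
risk_neutral = optimal_copayment(cost=30.0, insurance_value=0.0, elasticity=0.5,
                                 marginal_bias=-20.0)

# A bias larger in magnitude than cost turns the copayment into a subsidy.
subsidy = optimal_copayment(cost=30.0, insurance_value=0.5, elasticity=0.5,
                            marginal_bias=-40.0)

print(cold_remedy, beta_blocker, risk_neutral, subsidy)
```

Under this assumed form, two treatments with identical demand elasticities get different copayments once their biases differ, and a negative copayment corresponds to paying people to take the treatment.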

The authors present an empirical illustration of how dramatically these effects can change standard results. Again consider the heart attack patients studied by Choudhry *et al* (2011). The standard model forces us to conclude that eliminating the copayment for heart attack drugs leads to extra costs of about $106 per patient and extra health benefits worth about $26 per patient. The incremental care provided when copayments are eliminated costs more than it’s worth; moral hazard reduces welfare by about $106 – $26 = $80 per patient. The standard model tells us to conclude that eliminating copayments is bad policy. The behavioral model, by contrast, implies that the incremental care is worth roughly $3,000 per patient, not $26. According to the behavioral model, eliminating copayments is a very good policy.
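The per-patient arithmetic behind the two verdicts can be laid out side by side. This assumes that the incremental cost of about \$106 is the same under both models and that only the valuation of the extra care changes; the behavioral figure is therefore a rough implied net gain, not a number reported above.

```latex
\begin{align*}
\text{Standard model:}   \quad & \$26 - \$106 = -\$80 \\
\text{Behavioral model:} \quad & \$3{,}000 - \$106 \approx +\$2{,}894
\end{align*}
```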

What do these results imply for health care in Canada? One immediate implication is that frequently proposed small copayments for necessary care may not be good policy. Usually, a large demand response to small copayments would be considered evidence that Canadians consume lots of care they don’t really need, that is, that moral hazard is prevalent. But we should also consider the possibility that people mistakenly forgo high net benefit treatments due to behavioral bias. If we were to introduce copayments, we should do so selectively: charge people only for types of care with low health benefits or for which patients (or physicians) tend to overestimate health benefits.


Following work such as Wilkinson and Pickett’s *The Spirit Level*, the notion that income inequality causes low health has become popular. For example, Paul Krugman recently noted in a blog post titled “Inequality Kills,”

> We have lots of evidence that low socioeconomic status leads to higher mortality — even if you correct for things like availability of health insurance. Some of the effects may come through self-destructive behavior, some through simple increased stress; think about what it feels like in 21st-century America to be a worker without even a high school degree. In any case […] what we’re looking at is a clear demonstration of the fact that high inequality isn’t just unfair, it kills.

Income inequality and poor population health are correlated across countries, lending support to the idea that inequality does indeed kill. For example, the graph to the right, from *The Spirit Level*, shows a scatterplot of Gini coefficients against an index of health and social problems: more inequality is correlated with more problems. But such graphs, as we will see, are hard to interpret, and we cannot conclude from the type of correlation they display that inequality *per se* causes poor health.

Consider the ambiguity in Krugman’s argument above: is it *inequality*, as in the title, that leads to poor health, or is it *low socioeconomic status*, as in the body? These are clearly related mechanisms, but they are different mechanisms.

Suppose societies A and B have identical income distributions up to the 90th percentile, but A’s distribution in the top decile is more “stretched out,” that is, the relatively rich are richer still in society A. If low personal income causes low health, all else equal the bottom 90% of people in A and B will have the same health. If health is socially determined in the sense that relative deprivation matters in addition to absolute deprivation, then the bottom 90% in society A will experience worse health than in B because in society A the bottom 90% are relatively worse off compared to B. And if more income dispersion causes lower health for everyone, then the richest 10% in society A may *also* experience lower health than in B. For both policy and scientific reasons, it’s important that we discover whether a person’s health is determined by his income alone, or by both his income and the incomes of the other people in his society.

The literature formalizes these issues as three paths from the distribution of income to a person’s health. First, a person’s income may cause that person’s health (the absolute income hypothesis). Health is only socially determined through this mechanism in the sense that every person’s income is socially determined; there is no further social effect holding individual income constant.

Second, a person’s income relative to other people in her reference group may cause her health (the relative income hypothesis). Finally, the dispersion of income in the society in which the person lives may cause her health, holding her income constant (the income inequality hypothesis). These mechanisms can be expressed:

- Absolute income hypothesis: $h_i = f(y_i)$
- Relative income hypothesis: $h_i = g(y_i - y_r)$
- Income inequality hypothesis: $h_i = k(y_i, \sigma_y)$

where $i$ indexes people, $h_i$ is a measure of health, $y_i$ is income, $f$, $g$, and $k$ are unknown functions, $y_r$ is the income of a reference person (such as the median or mean person’s income), and $\sigma_y$ is the variance or other measure of dispersion of $y$ across people. All three mechanisms may occur at the same time; they are not exclusive.

The relative income and income inequality hypotheses are less plausible on their face than the absolute income hypothesis: it is easy to think of reasons why your income causes your health (even in the presence of “free” health care), but it is harder to think of reasons why my income causes your health, as in the relative income and income inequality hypotheses. Angus Deaton skeptically refers to the relative and inequality hypotheses as “action at a distance.”

Perhaps Deaton is overly skeptical, as animal studies and other evidence do lend support to the idea that low social position causes physiological changes which lead to poor health (e.g., the Whitehall studies, see Marmot et al 2001). More inequality may cause people low in the hierarchy to experience negative emotions such as stress and shame, which may directly cause low health and indirectly cause low health through behaviors such as substance abuse. However, we face a number of problems attempting to operationalize this notion, and in theory anything goes even if we assume this mechanism exists. Deaton, for example, asks us to consider these variants on the relative income hypothesis:

1. Your health depends on your rank in the social hierarchy.
2. Your health depends on the difference between your income and the richest person’s income.
3. Your health depends on the difference between your income and the poorest person’s income.

These all seem reasonable ways of modeling the notion that the social hierarchy affects health. Now consider the implications of a policy which reduces inequality without changing the ordering of income across people or changing mean income. Under the first variant, there is no effect at all on health, as we have not changed anyone’s rank in the hierarchy. Under the second, average health goes up because the distance between the richest person’s income and a person’s income falls. And under the third, average health goes down as the distance between the poorest person’s income and others’ incomes falls.
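The three variants can be traced through a concrete redistribution. The sketch below uses a hypothetical five-person society and a hypothetical policy that halves everyone’s distance from mean income; it reports the quantities each variant says health depends on, not health itself.

```python
def compress(incomes, factor):
    """Shrink every income toward the mean; keeps the mean and the ordering when 0 < factor < 1."""
    mean = sum(incomes) / len(incomes)
    return [mean + factor * (y - mean) for y in incomes]


def ranks(incomes):
    """Rank of each person in the income ordering (0 = poorest)."""
    order = sorted(range(len(incomes)), key=lambda i: incomes[i])
    result = [0] * len(incomes)
    for position, i in enumerate(order):
        result[i] = position
    return result


def average_gap_to_richest(incomes):
    top = max(incomes)
    return sum(top - y for y in incomes) / len(incomes)


def average_gap_to_poorest(incomes):
    bottom = min(incomes)
    return sum(y - bottom for y in incomes) / len(incomes)


before = [10.0, 20.0, 30.0, 60.0, 130.0]  # hypothetical incomes, mean 50
after = compress(before, 0.5)             # less dispersion, same mean, same ordering

print("ranks unchanged:", ranks(before) == ranks(after))
print("average gap to richest:", average_gap_to_richest(before), "->", average_gap_to_richest(after))
print("average gap to poorest:", average_gap_to_poorest(before), "->", average_gap_to_poorest(after))
```

Ranks do not move, so the first variant predicts no change. The average shortfall relative to the richest person shrinks, which the second variant reads as better health, and the average lead over the poorest person also shrinks, which the third variant reads as worse health.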

Another pragmatic problem is determining appropriate reference groups. Do you compare yourself to other people in your town? Your country? Your occupation, or your age, or your ethnicity, or your friends, or some combination of all of these and many other characteristics? In theory, this is easy: models assume there are groups 1 through $J$ and each agent $i$ is assigned a group $j$. In practice, reference groups are nebulous, and we will generally get different statistical answers depending on how we define reference groups.

Many studies attempt to use aggregate data to get at the effect of inequality on health, yielding results such as those displayed in the scatterplot of health and Gini coefficients above. Discovering that countries with more inequality tend to have lower public health is often interpreted as evidence of social causation of health operating through stress, social cohesion, or other psychological consequences of position in the social hierarchy. However, that conclusion does not follow.

One reason we’ll observe inequality and low health move together even if only the absolute income hypothesis holds is called the “concavity effect.” Suppose that the effect of an extra dollar on health is positive but lower than the effect of the previous dollar, that is, that health is a concave function of own income, as in the graph to the right. Then, holding mean income constant, increasing the dispersion of income in a society mechanically decreases average health. Intuitively, if we take a dollar from a rich person and give it to a poor person, average health goes up if an additional dollar increases a poor person’s health more than a rich person’s health. The concavity effect implies that studies of aggregate data cannot help us disentangle the absolute, relative, and inequality hypotheses.
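A few lines of arithmetic make the concavity effect concrete. In this sketch, health depends only on a person’s own income, so only the absolute income hypothesis is at work; the logarithm is a hypothetical stand-in for a concave health function and the income vectors are made up.

```python
import math


def average_health(incomes, health=math.log):
    """Average health when each person's health depends only on their own income."""
    return sum(health(y) for y in incomes) / len(incomes)


equal = [50.0, 50.0, 50.0, 50.0]
unequal = [10.0, 30.0, 60.0, 100.0]  # same mean income of 50, more dispersion

# Move 10 from the richest person to the poorest person.
after_transfer = [20.0, 30.0, 60.0, 90.0]

print(round(average_health(equal), 3))
print(round(average_health(unequal), 3))
print(round(average_health(after_transfer), 3))
```

Average health is lower in the more dispersed society and rises after the transfer, even though nobody’s health responds to anyone else’s income.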

The concavity effect is sometimes referred to as a statistical artifact because it generates correlation between population health and income inequality that only operates through the absolute income effect. However, it is important to note that this is the effect we have the most evidence on, the evidence mostly agrees, and the evidence tells us that redistribution, so long as it does not destroy too much average wealth, will increase average health. Put another way, *we do not have to believe that inequality per se causes stress or other mental or physical health issues to conclude that reducing poverty will increase population health.*

With data on individuals we can shed some light on the relationship between income inequality and health, holding personal income fixed. Many papers estimate models similar to, or special cases of, specifications such as

$$h_{ij} = X_{ij}\beta + f(y_{ij}) + \gamma \bar{y}_j + \delta \sigma_j + u_{ij}, \qquad (*)$$

where $h_{ij}$ and $y_{ij}$ are the health and income of person $i$ in country, region, or other reference group $j$, $X_{ij}$ is a vector of individual and contextual characteristics for that person, $\bar{y}_j$ is mean income within the reference group, $\sigma_j$ is the variance or other measure of income dispersion in $i$’s reference group, $f$ is some function of income, $\beta$, $\gamma$, and $\delta$ are parameters to be estimated, and $u_{ij}$ is an error term representing other causes of health. Sometimes, $f$ is assumed to be linear, which means that curvature in the individual-level relationship may appear as a social effect. Usually, it is a quadratic or step function, and rarely no structure is imposed and the model is estimated using semiparametric methods (as in Jones and Wildman 2008). These papers typically use large, individual-level cross-sectional or repeated cross-sectional datasets with countries or regions within countries treated as reference groups; infrequently panels are used or reference groups are defined more narrowly, such as age-region cells.

The evidence from estimating such models provides at best weak support for the relative and inequality hypotheses. As opposed to results from aggregate models, which robustly find that higher inequality is associated with lower population health *without* controlling for absolute individual income, the estimated coefficients on inequality measures are roughly as often positive as negative, and they are commonly statistically and substantively insignificant. These results lead some authors to draw conclusions such as “evidence favouring a negative correlation between income inequality and life expectancy has disappeared” (Mackenbach 2002) and “there seems to be little support for the idea that income inequality is a major, generalizable determinant of population health differences within or between rich countries” (Lynch et al 2004), whereas “the absolute income hypothesis… is still the most likely to explain the frequently observed strong association between population health and income inequality levels” (Wagstaff and van Doorslaer 2000).

I’ll close by noting some of the remaining difficulties with this literature, challenges to be overcome in future research.

As we’ve seen, the literature to date largely attempts to estimate partial associations between health, personal income, and aspects of the distribution of income. Even ignoring the ambiguities and problems discussed above, we cannot interpret the resulting estimates as plausibly reflecting causal effects.

At the individual level it is very likely that health causes income as well as income causing health. The income–health gradient in part reflects the disadvantages unhealthy people face in the labor market: health and income are simultaneously determined. Further, countless personal and contextual effects may cause both health and income, so models such as those estimated in the literature typically suffer from both simultaneity bias and omitted variables bias (for example, many studies fail to even condition on education, which is an important cause of both health and income). I expect to see more efforts to pin down the effect of individual income on individual health, and to tie such efforts to the burgeoning literature examining health over the life cycle, particularly the long-term effects of childhood development (e.g., Cunha and Heckman 2007). There is some evidence that some of the correlation between absolute health and income is attributable to what is here “reverse” causation from health to income (e.g., Boyce and Oswald 2011, Case and Paxson 2011). It’s difficult to see how we can credibly estimate the effect of unequal societies on health without making further progress on the effect of a person’s income on her health.

Omitted variables at the reference group (usually, regional) level are also a problem. In equation (*) above, the only reference group level variables are the mean and dispersion of income, implying that reference-group level causes of health which are correlated with the distribution of income may generate partial correlations between income distribution and health even if income distribution does not cause health. Deaton and Lubotsky (2003), for example, show that controlling for the proportion of black people at the regional level removes the association between inequality and mortality across U.S. cities. Which other demographic, policy, or institutional differences across regions cause both inequality and low health?

A related issue for future research is opening the black box and figuring out exactly how income inequality affects health. For example, Drabo (2010) argues that his results imply that more unequal incomes reduce demand for environmental quality, lower environmental quality causes lower health, and after netting out this mechanism there is no further effect of inequality on health. More unequal incomes may lead to changes in a variety of prices, access to various goods and services, the type and quality of various public programs, and changes in various notions of social capital. Which regional characteristics mediate the effect of income inequality on health? Is there an additional effect of inequality *per se* on health after holding constant personal income and all of the social causes of health which may themselves result from more inequality? At the moment, we simply don’t know.

We have much yet to learn about the effects of the distribution of income on health, and even the simpler issue of determining the effects of individual income on health.
