An ecological correlation is a correlation based on group means rather than on measurements from individuals. For example, if we are interested in the relationship between urbanization and national prosperity, we might look across countries to see whether the percentage of people living in urban areas is correlated with GDP per capita. This is an ecological correlation because it is based on data about groups of people, i.e., countries, rather than individuals. If our interest is in national economies, then it is appropriate to look at ecological correlations based on aggregate country data. However, if we find a correlation at the country level, we cannot assume that we will find a similar correlation at the individual level, or even for groups within countries. We cannot assume, for example, that people who live in urban areas are more prosperous on average than those who do not. Nor can we assume that urban areas within a country will be more prosperous than its non-urban areas.
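Consider another example. Suppose we find a positive correlation at the national level between life expectancy and Internet use. Can we conclude from this that individuals who use the Internet live longer than those who do not? No, we cannot. The national-level correlation could arise, for instance, because wealthier countries tend to have both more Internet use and longer life expectancy, regardless of whether any individual's Internet use affects how long that person lives.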
Assuming equivalence between correlations derived from group means and correlations derived from individual data leads to the ecological fallacy.
The classic illustration comes from a 1950 paper by Robinson (1950). For each of the 48 states in the US as of the 1930 census, he computed the literacy rate and the proportion of the population born outside the US. The correlation between these two state-level figures was 0.53: the greater the proportion of immigrants in a state, the higher its average literacy. When individuals were considered, however, the correlation was −0.11: immigrants were, on average, less literate than native-born citizens. Robinson showed that the positive correlation at the state level arose because immigrants tended to settle in states where the native population was more literate. He cautioned against drawing conclusions about individuals from population-level, or "ecological," data.
Wikipedia also notes that, according to a book by Gelman, Park, Shor, Bafumi, & Cortina (2008), in recent US elections wealthier states were more likely to vote Democratic and poorer states more likely to vote Republican. At the individual level, however, wealthier voters are more likely to vote Republican, and poorer voters more likely to vote Democratic. This illustrates the need to be careful about what we conclude from ecological correlations.
As Lubinski & Humphreys (1996) have argued, however, there are times when an ecological correlation is the proper way to look at the relationship between two variables. For example, to understand the impact of smoking on public health, group-level relationships between smoking rates and lung cancer rates are more informative than correlations based on data from individual smokers.
Gelman, A., Park, D., Shor, B., Bafumi, J., & Cortina, J. (2008). Red state, blue state, rich state, poor state. Princeton University Press. ISBN 978-0-691-13927-2.
Lubinski, D., & Humphreys, L. G. (1996). Seeing the forest from the trees: When predicting the behavior or status of groups, correlate means. Psychology, Public Policy, and Law, 2, 363–376.
Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15, 351–357.