Suppose there are 250 economies in the world, each with exactly the same, really large, number of people. Furthermore, these economies are exact replicas of each other. Every person in economy 1 has 249 exact twins, one in each of the other economies, all of whom have the same identical income. So the distribution of income in each economy is EXACTLY the same. And in each of these 250 economies the distribution of income is given by a lognormal distribution with parameters mu and sigma (again, same for all 250 of 'em).
Now say you're a researcher who's interested in studying the relationship between economic development, as measured by mean income, and inequality, as measured by the Gini coefficient. Problem for you is that you do not get to observe the entire income distribution in each of the 250 economies at your disposal (in a way, if you COULD observe the entire income distribution for all 250 economies, why would you be interested in the mean and the Gini anyway? Those are descriptive statistics, which means they leave information out). So you don't know that all these 250 economies are identical (and even if you did know, you couldn't estimate the relationship between development and inequality anyway, since you'd have no variation in the data: effectively a single observation).
Instead, what you do have is the ability to select 1000 people from each of these economies, compute sample means and sample Ginis and base your inference about development and inequality on this data. Of course, since you're a scrupulous researcher you want to ensure that each of your samples of 1000 is random.
Well, in that case what you're gonna get is exactly what I got in the previous post: a positive relationship between inequality and mean income. And you will erroneously conclude (as the paper cited below did) that as countries become more developed they become more unequal, EVEN THOUGH ALL THE ECONOMIES ARE REALLY IDENTICAL.
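For anyone who wants to replay the thought experiment, here is a minimal Python sketch of it. The post doesn't state mu and sigma, so the values below are my assumption, backed out from the figures quoted further down (a "true" mean of about 36,000$ and a "true" Gini of about .44). The seed, the names and the Gini helper are likewise just illustrative.

```python
import numpy as np

# ASSUMED parameters (not given in the post): picked so that the population
# mean is about 36,000 and the population Gini is about .44.
SIGMA = 0.824
MU = np.log(36_000) - SIGMA**2 / 2
N_ECONOMIES, N_PEOPLE = 250, 1000


def sample_gini(incomes):
    """Gini coefficient of each row of a 2-D array of incomes."""
    x = np.sort(incomes, axis=1)
    n = x.shape[1]
    ranks = np.arange(1, n + 1)
    return 2 * (x * ranks).sum(axis=1) / (n * x.sum(axis=1)) - (n + 1) / n


rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# One random sample of 1000 people from each of 250 IDENTICAL economies.
samples = rng.lognormal(MU, SIGMA, size=(N_ECONOMIES, N_PEOPLE))
means = samples.mean(axis=1)
ginis = sample_gini(samples)

# Correlation and OLS slope of Gini on mean income (income in thousands).
corr = np.corrcoef(means, ginis)[0, 1]
slope, intercept = np.polyfit(means / 1000, ginis, deg=1)

print(f"corr(mean income, Gini) = {corr:.2f}")
print(f"Gini = {intercept:.3f} + {slope:.4f} * income (in thousands)")
```

Every economy here is drawn from the same distribution, so whatever slope comes out is sampling noise and nothing else.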
Economic Investigations has nicer graphics:
(They're much nicer over there)
From this faulty conclusion all kinds of wrong policy implications can follow. That more inequality is the price you have to pay for higher standards of living (right wing). That growth of income doesn't matter because it's only the rich who benefit (left wing). And so on.
So what's left? Are we prevented from saying anything about the relationship between inequality and development because we don't know what distributions the underlying data come from? Well, no, but there should be a lot more caution with regard to the data and a lot fewer conclusions drawn from regressions of the form
Inequality = a + b*income
and even (or especially)
Inequality = a + b*income + c*income^2
For example, it would be silly to try to argue that an economy with an average income of 2000$ per year comes from the same distribution as (or is identical to, with the difference in sample means accounted for by randomness) an economy with an average income of 30000$. Likewise an economy with a Gini of .6 is very, very unlikely to be the same (or have the same "structure") as one with a Gini of .3 (interpreting the magnitude of Gini coefficients was part of my motivation for looking at this stuff).
In fact, here's the distribution of the mean income across the 250 economies from the original simulation:
Of course, by the Central Limit Theorem each sample mean (an average over 1000 draws) is approximately normally distributed around the "true" mean, so as the number of your economies increases the histogram of mean incomes will look more and more like a normal curve. Going by the sample standard deviation above this means that an economy with an observed per capita income of 40,000$ is about (roughly, sort of, hold on a second ...) only 5% likely to come from a lognormal distribution with mean of 36,000$.
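The back-of-the-envelope calculation goes roughly like this. The standard deviation below is a made-up placeholder, since the actual number lives in the figure; plug in the standard deviation of the 250 sample means from your own run. The function name is mine.

```python
from statistics import NormalDist


def two_sided_tail(observed, center, sd):
    """Normal-approximation probability of landing at least this far from center."""
    z = abs(observed - center) / sd
    return 2 * (1 - NormalDist().cdf(z))


# HYPOTHETICAL value for illustration only; it is not taken from the post.
HYPOTHETICAL_SD = 2_000

p = two_sided_tail(40_000, 36_000, HYPOTHETICAL_SD)
print(f"two-sided tail probability: {p:.3f}")
```

With a placeholder of 2,000$ the observed 40,000$ sits two standard deviations out, which is where the "about 5%" ballpark comes from.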
Here's the distribution for the Gini
So here the sample minimum and max are .41 and .48 which means that it's possible for our estimate of inequality to vary quite a lot even when the true Gini is .44. On the other hand the good news is that most of the observations fall within (.42,.46) interval and anything outside of that is unlikely to have come from that distribution.
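As an aside, for a lognormal the population Gini depends only on sigma: G = erf(sigma/2), which is the same thing as 2*Phi(sigma/sqrt(2)) - 1. So a "true" Gini of .44 pins down sigma, and a mean of 36,000$ then pins down mu. The sketch below backs those out. The function names are mine, and I'm assuming .44 and 36,000$ are the population values behind the simulation.

```python
from math import erf, exp, log, sqrt
from statistics import NormalDist


def lognormal_gini(sigma):
    """Population Gini of a lognormal: depends on sigma only."""
    return erf(sigma / 2)


def sigma_for_gini(gini):
    """Invert G = 2*Phi(sigma/sqrt(2)) - 1 for sigma."""
    return sqrt(2) * NormalDist().inv_cdf((gini + 1) / 2)


# ASSUMED targets, read off the numbers quoted in the post.
TARGET_GINI = 0.44
TARGET_MEAN = 36_000

sigma = sigma_for_gini(TARGET_GINI)
mu = log(TARGET_MEAN) - sigma**2 / 2  # lognormal mean is exp(mu + sigma^2/2)

print(f"sigma = {sigma:.3f}, mu = {mu:.3f}")
print(f"check: Gini = {lognormal_gini(sigma):.3f}, mean = {exp(mu + sigma**2 / 2):.0f}")
```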
Of course what really matters is the JOINT distribution of mean income and the Gini. But it should be possible to compute the probability that any two observed economies come from the same distribution. I haven't done it yet. I'm working on it.
Here are some other "fake" relationships found in the generated data:
Share of top 1% earners vs. avg income (this also fits the observed pattern for US and EU):
So this follows the pattern with the Gini for pretty much the same reason, but because for the y-axis variable you're only looking at a portion of the distribution there's more dispersion around the regression line.
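Here's a rough sketch of how that top 1% series can be generated. As before, the lognormal parameters are my assumption (chosen to give a mean near 36,000$ and a Gini near .44), and "top 1%" in a sample of 1000 just means the 10 largest incomes.

```python
import numpy as np

# ASSUMED parameters (not stated in the post).
SIGMA = 0.824
MU = np.log(36_000) - SIGMA**2 / 2

rng = np.random.default_rng(2)  # fixed seed so the run is repeatable

# 250 identical economies, 1000 sampled people each, sorted within each sample.
x = np.sort(rng.lognormal(MU, SIGMA, size=(250, 1000)), axis=1)

top_k = x.shape[1] // 100  # the top 1% of a sample of 1000 is 10 people
top_share = x[:, -top_k:].sum(axis=1) / x.sum(axis=1)
means = x.mean(axis=1)

corr_top = np.corrcoef(means, top_share)[0, 1]
print(f"corr(mean income, top 1% share) = {corr_top:.2f}")
```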
Here's poverty vs. log income:
This one's a bit harder to explain since poverty (measured by the headcount ratio with the poverty line set at 10000$) does not depend on the Bill Gates effect. But it's sort of the same thing locally at the left tail of the distribution (you get some lucky draws which both increase average income and decrease poverty, since the lognormal density is increasing for values below the mode), only the relationship is much weaker. Also it's going to be very sensitive to where one sets the poverty line.
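The headcount version looks like this in code. The 10,000$ poverty line is from the text; the lognormal parameters are again my assumption, so the size of the correlation you get will depend on them (and, as noted, on the poverty line).

```python
import numpy as np

# ASSUMED parameters (not stated in the post).
SIGMA = 0.824
MU = np.log(36_000) - SIGMA**2 / 2
POVERTY_LINE = 10_000  # from the text

rng = np.random.default_rng(3)  # fixed seed so the run is repeatable
samples = rng.lognormal(MU, SIGMA, size=(250, 1000))

log_mean_income = np.log(samples.mean(axis=1))
# Headcount ratio: share of each sample with income below the poverty line.
headcount = (samples < POVERTY_LINE).mean(axis=1)

corr_pov = np.corrcoef(log_mean_income, headcount)[0, 1]
print(f"corr(log mean income, poverty headcount) = {corr_pov:.2f}")
```

Changing `POVERTY_LINE` and rerunning is a quick way to see how sensitive the relationship is.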
Finally, Amartya Sen proposed the following measure of Social Welfare:
SW = (1-G) * y, where G is the Gini and y is average income. Since here G and y are positively related, when G goes up, (1-G) goes down but y goes up. So there are two offsetting effects on SW and this is essentially a measure of which one dominates. Here it looks like higher income is associated with lower welfare: the change in inequality dominates (which is why some folks love this measure). But again, remember that these are all identical economies:
Ok. That's it for now. Be suspicious of people claiming that there's a strong relationship between inequality and growth/income.