Tuesday, October 30, 2007

Inequality

I think a bit of the point of the last (next to last?) post got lost by bringing the US vs. EU into it. Here's another way to think about it.

Suppose there are 250 economies in the world, each with exactly the same, really large, number of people. Furthermore these economies are exact replicas of each other. Every person in economy 1 has 249 exact twins, one in each of the other economies, all of whom have the identical income. So the distribution of income in each economy is EXACTLY the same. And in each of these 250 economies the distribution of income is given by a lognormal distribution with parameters mu and sigma (again, same for all 250 of'em).

Now say you're a researcher who's interested in studying the relationship between economic development, as measured by mean income, and inequality, as measured by the Gini coefficient. Problem for you is that you do not get to observe the entire income distribution in each of the 250 economies at your disposal (in a way, if you COULD observe the entire income distribution for all 250 economies, why would you be interested in the mean and the Gini anyway? Those are descriptive statistics, which means they leave information out). So you don't know that all these 250 economies are identical (and even if you did know, you couldn't estimate the relationship between development and inequality anyway, since you'd have no variation in the data: effectively a single observation).

Instead, what you do have is the ability to select 1000 people from each of these economies, compute sample means and sample Ginis and base your inference about development and inequality on this data. Of course, since you're a scrupulous researcher you want to ensure that each of your samples of 1000 is random.

Well, in that case what you're gonna get is exactly what I got in the previous post; a positive relationship between inequality and mean income and you will erroneously conclude (as the paper cited below did) that as countries become more developed they become more unequal, EVEN THOUGH ALL THE ECONOMIES ARE REALLY IDENTICAL.
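Here's a minimal sketch of that experiment in Python, for anyone who wants to try it at home. It is not the code behind the original simulation. The parameter values (true mean of 36,000$ and true Gini of .44) are borrowed from the numbers quoted further down, and mu and sigma are backed out from them using the standard lognormal formula Gini = 2*Phi(sigma/sqrt(2)) - 1. The function names and the seed are made up for illustration.

```python
import numpy as np
from statistics import NormalDist


def gini(x):
    """Sample Gini coefficient of a 1-D array of non-negative incomes."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n


def simulate(n_economies=250, n_people=1000,
             true_mean=36_000.0, true_gini=0.44, seed=0):
    """Draw one random sample per economy from the SAME lognormal."""
    # Assumption: invert Gini = 2*Phi(sigma/sqrt(2)) - 1 to get sigma,
    # then pick mu so that the lognormal mean equals true_mean.
    sigma = np.sqrt(2.0) * NormalDist().inv_cdf((1.0 + true_gini) / 2.0)
    mu = np.log(true_mean) - sigma ** 2 / 2.0
    rng = np.random.default_rng(seed)
    samples = rng.lognormal(mean=mu, sigma=sigma,
                            size=(n_economies, n_people))
    means = samples.mean(axis=1)
    ginis = np.array([gini(row) for row in samples])
    return means, ginis


means, ginis = simulate()

# Regress the sample Gini on the sample mean: Inequality = a + b*income
slope, intercept = np.polyfit(means, ginis, 1)
corr = np.corrcoef(means, ginis)[0, 1]
print(f"slope b = {slope:.3e}, correlation = {corr:.2f}")
```

Every economy here is drawn from one and the same distribution, so any nonzero slope is purely a sampling artifact: a few lucky draws from the right tail push up both the sample mean and the sample Gini.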

Economic Investigations has nicer graphics:



(They're much nicer over there)

From this faulty conclusion all kinds of wrong policy implications can follow. That more inequality is the price you have to pay for higher standards of living (right wing). That growth of income doesn't matter because it's only the rich who benefit (left wing). And so on.

So what's left? Are we prevented from saying anything about the relationship between inequality and development because we don't know what distributions the underlying data come from? Well, no, but there should be a lot more caution with regard to the data and a lot fewer conclusions drawn from regressions of the form
Inequality = a + b*income

and even (or especially)
Inequality = a + b*income + c*income^2

For example, it would be silly to try to argue that an economy with an average income of 2000$ per year comes from the same distribution as an economy with an average income of 30000$ (or is identical to it, with the difference in sample means accounted for by randomness). Likewise an economy with a Gini of .6 is very very unlikely to be the same (or have the same "structure") as one with a Gini of .3 (interpreting the magnitude of Gini coefficients was part of my motivation for looking at this stuff).

In fact, here's the distribution of the mean income across the 250 economies from the original simulation:



Of course, by the Central Limit Theorem each sample mean is approximately normally distributed around the "true" mean, so as the number of your economies increases the histogram of sample means will look more and more like a normal distribution centered on it. Going by the sample standard deviation above, this means that an economy with an observed per capita income of 40,000$ is about (roughly, sort of, hold on a second ...) only 5% likely to come from a lognormal distribution with a mean of 36,000$.
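For the curious, the back-of-the-envelope calculation can be sketched like this. It's a normal approximation, and since the sample standard deviation is only readable off the graph, the standard errors tried below are placeholders rather than the simulation's actual value. The function names are made up. The second function runs the logic backwards: it asks what standard error a one-sided 5% figure would imply, which works out to roughly 2,400$.

```python
from statistics import NormalDist


def upper_tail_prob(observed_mean, true_mean, std_error):
    """P(sample mean >= observed_mean) under a normal approximation."""
    return 1.0 - NormalDist(mu=true_mean, sigma=std_error).cdf(observed_mean)


def implied_std_error(observed_mean, true_mean, tail_prob):
    """Standard error at which observed_mean sits exactly at tail_prob."""
    z = NormalDist().inv_cdf(1.0 - tail_prob)
    return (observed_mean - true_mean) / z


# What standard error would make 40,000$ a one-sided 5% event?
se_for_5pct = implied_std_error(40_000, 36_000, 0.05)
print(f"implied standard error: {se_for_5pct:,.0f}$")

# Placeholder standard errors, just to show how sensitive the answer is.
for se in (1_500, 2_500, 3_500):
    p = upper_tail_prob(40_000, 36_000, se)
    print(f"std. error {se:>5,}$ -> tail probability {p:.4f}")
```

The tail probability moves a lot with the standard error, which is why the "roughly, sort of" is doing real work in that sentence.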

Here's the distribution for the Gini



So here the sample minimum and maximum are .41 and .48, which means that it's possible for our estimate of inequality to vary quite a lot even when the true Gini is .44. On the other hand the good news is that most of the observations fall within the (.42, .46) interval and anything outside of that is unlikely to have come from that distribution.

Of course what really matters is the JOINT distribution of mean income and the Gini. But it should be possible to compute the probability that any two observed economies come from the same distribution. I haven't done it yet. I'm working on it.
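In the meantime, one generic way to attack the "same distribution or not?" question, if you have the raw samples and not just the summary statistics, is a permutation test. The sketch below is only that, a sketch: it uses the absolute difference in sample Ginis as the test statistic, which ignores the joint behavior with the mean, and the function names, sample sizes and parameters are all made up for illustration.

```python
import numpy as np


def gini(x):
    """Sample Gini coefficient of a 1-D array of non-negative incomes."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n


def permutation_pvalue(a, b, n_perm=500, seed=0):
    """Permutation p-value for H0: a and b come from the same distribution.

    Test statistic (an arbitrary choice): |gini(a) - gini(b)|.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(gini(a) - gini(b))
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel people across the two economies
        diff = abs(gini(pooled[:a.size]) - gini(pooled[a.size:]))
        if diff >= observed:
            count += 1
    # The +1 correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)


rng = np.random.default_rng(1)
same_a = rng.lognormal(mean=10.0, sigma=0.8, size=1000)
same_b = rng.lognormal(mean=10.0, sigma=0.8, size=1000)
other = rng.lognormal(mean=10.0, sigma=0.5, size=1000)

p_same = permutation_pvalue(same_a, same_b)
p_diff = permutation_pvalue(same_a, other)
print(f"identical economies: p = {p_same:.3f}")
print(f"different sigma:     p = {p_diff:.3f}")
```

A statistic that combines the mean and the Gini could be swapped in for the Gini difference; the shuffling logic stays the same.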



Here are some other "fake" relationships found in the generated data:

Share of top 1% earners vs. avg income (this also fits the observed pattern for US and EU):



So this follows the pattern with the Gini for pretty much the same reason, but because for the y-axis variable you're only looking at a portion of the distribution (the top 1%) there's more dispersion around the regression line.


Here's poverty vs. log income:



This one's a bit harder to explain since poverty (measured by the headcount ratio with the poverty line set at 10000$) does not depend on the Bill Gates effect. But it's sort of the same thing locally at the left tail of the distribution (you get some lucky draws which both increase average income and decrease poverty, since the lognormal density is increasing for values below the mode), though the strength of the relationship is much weaker. Also it's going to be very sensitive to where one sets the poverty line.

Finally, Amartya Sen proposed the following measure of Social Welfare:
SW = (1-G) * y, where G is the Gini and y is average income. Since here G and y are positively related, when G goes up, (1-G) goes down but y goes up. So there are two offsetting effects on SW and this is essentially a measure of which one dominates. Here it looks like higher income is associated with lower welfare because of higher inequality, so the change in inequality dominates (which is why some folks love this measure). But again, remember that these are all identical economies:
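To spell out what "which one dominates" means, here is the condition worked out from the definition of SW alone (nothing here depends on the simulated data). It treats G as a smooth function of y along the fitted line, which is an assumption made purely for the algebra:

```latex
\begin{aligned}
SW &= (1-G)\,y \\
\ln SW &= \ln(1-G) + \ln y \\
\frac{d\ln SW}{d\ln y} &= 1 - \frac{y}{1-G}\,\frac{dG}{dy}
\end{aligned}
```

So SW falls as income rises only when (y/(1-G)) * dG/dy is greater than 1, that is, when the proportional drop in (1-G) is bigger than the proportional rise in y.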




Ok. That's it for now. Be suspicious of people claiming that there's a strong relationship between inequality and growth/income.

6 Comments:

Anonymous okbut said...

Many valid points, but isn't one of the empirical premises of your claim not quite true? Gini coefficients are usually measured following a census of the population, so it's not a 'sample of 1000,' it's not a sample at all.

http://www.census.gov/hhes/www/income/histinc/ie1.html

Also, even if this were a problem that arises due to sampling, wouldn't we expect the data points on the next sample to move around a lot?

Any estimate of your regression in differences should therefore be immune to this spurious relationship critique.

10:11 PM  
Blogger Gabriel M. said...

Okbut,

Who's stupid enough to answer truthfully to a census when most of one's wealth is gained by illegal means or is unknown to the wife or something like that?

All measurements at the national level are extremely noisy.

I do agree that YNS! went too far with his thought experiment. He could have stuck with the notion that all countries are draws from the same distribution. He instead wants to make everyone replicas...

11:19 PM  
Blogger YouNotSneaky! said...

Gabriel,

Measurement error (which is what misreporting of one's income is) is a separate issue, admittedly a big one. I started thinking about this to get an idea of how to interpret Ginis. How big does a difference in two Ginis have to be so that we can say (with some x% certainty) that the two economies really have different levels of inequality and not just measurement error? I haven't gotten there yet.

"He instead wants to make everyone replicas..."

Yeah but the same thing applies if the income distributions are different - the coeffs in those types of regressions will be inconsistently estimated. Assuming replicas just brings the point out in a stark way.

okbut,

Well, first the data points DO move around a lot sample to sample, at least as far as the Gini is concerned. I've seen Gini estimates for US ranging from .35 to .55 (which is way way more than two st. dev. in simulated data). Some of that variance just has to do with whether you're looking at post or pre tax income but some of it could be (probably is) sampling issues as the post says.

Second, it is my understanding - and I could be very wrong about this - that the Ginis are generally calculated from a sample of the census (usually 5%). So, even assuming no measurement error as above, the sampling issue is still there. Worse, again it is my understanding - and I could be wrong about this too - that usually the Ginis are computed by first estimating quantile or decile shares and then interpolating between them to get an approximation of the entire income distribution (as in the Deininger and Squire data set).

Third, even if they were computed from the entire census, those only come every 10 years. So say you compute'em in 1990 and 2000 using EVERYONE in the economy and you get two different G's. You still can't tell if it's because the "underlying structure" has changed or just because of randomness in the data generating process (which remained the same). The other interpretation in the original post still stands.

Thanks for the good comments.

4:19 PM  
Anonymous okbut said...

Thanks for the very thoughtful answers. I think you make a fundamental point, which is that there is almost bound to be some bias in the correlation between income per capita and inequality. The next question is, "is it possible to minimize or remove this bias?"

8:04 AM  
Blogger YouNotSneaky! said...

Yeah, I'm still thinking about how to do that.

5:11 PM  
Blogger Basti said...

Two comments:

Firstly, will this finally persuade my American coworkers (and by extension, me) to take more vacations?

Secondly, what would be a good chi-squared test to do to better understand these results?

5:16 PM  
