Tuesday, October 30, 2007

Inequality

I think a bit of the point of the last (next to last?) post got lost by bringing the US vs. EU into it. Here's another way to think about it.

Suppose there are 250 economies in the world, each with exactly the same, really large, number of people. Furthermore, these economies are exact replicas of each other. Every person in economy 1 has 249 exact twins in all the other economies, all of whom have the identical income. So the distribution of income in each economy is EXACTLY the same. And in each of these 250 economies the distribution of income is given by a lognormal distribution with parameters mu and sigma (again, same for all 250 of'em).

Now say you're a researcher who's interested in studying the relationship between economic development, as measured by mean income, and inequality, as measured by the Gini coefficient. Problem for you is that you do not get to observe the entire income distribution in each of the 250 economies at your disposal (in a way, if you COULD observe the entire income distribution for all 250 economies, why would you be interested in the mean and the Gini anyway? Those are descriptive statistics, which means they leave information out). So you don't know that all these 250 economies are identical (which also means that even if you did know you couldn't estimate the relationship between development and inequality anyway since you'd have no variation in the data. A single observation).

Instead, what you do have is the ability to select 1000 people from each of these economies, compute sample means and sample Ginis and base your inference about development and inequality on this data. Of course, since you're a scrupulous researcher you want to ensure that each of your samples of 1000 is random.

Well, in that case what you're gonna get is exactly what I got in the previous post; a positive relationship between inequality and mean income and you will erroneously conclude (as the paper cited below did) that as countries become more developed they become more unequal, EVEN THOUGH ALL THE ECONOMIES ARE REALLY IDENTICAL.

Economic Investigations has nicer graphics:



(They're much nicer over there)

From this faulty conclusion all kinds of wrong policy implications can follow. That more inequality is the price you have to pay for higher standards of living (right wing). That growth of income doesn't matter because it's only the rich who benefit (left wing). And so on.

So what's left? Are we prevented from saying anything about the relationship between inequality and development because we don't know what distributions the underlying data come from? Well, no, but there should be a lot more caution with regard to the data and a lot fewer conclusions drawn from regressions of the form
Inequality = a + b*income

and even (or especially)
Inequality = a + b*income + c*income^2

For example, it would be silly to try to argue that an economy with an average income of 2000$ per year comes from the same distribution (or is identical to, with the difference in sample mean accounted for by randomness) as an economy with an average income of 30000$. Likewise an economy with a Gini of .6 is very very unlikely to be the same (or have the same "structure") as one with a Gini of .3 (interpreting the magnitude of Gini coefficients was part of my motivation for looking at this stuff).

In fact, here's the distribution of the mean income across the 250 economies from the original simulation:



Of course, by the Central Limit Theorem each sample mean is approximately normally distributed around the "true" mean, so as the number of your economies increases the distribution of the mean income will look more and more like a normal distribution centered on the "true" mean of means. Going by the sample standard deviation above this means that an economy with an observed per capita income of 40,000$ is about (roughly, sort of, hold on a second ...) only 5% likely to come from a lognormal distribution with mean of 36,000$.

Here's the distribution for the Gini



So here the sample minimum and max are .41 and .48 which means that it's possible for our estimate of inequality to vary quite a lot even when the true Gini is .44. On the other hand the good news is that most of the observations fall within (.42,.46) interval and anything outside of that is unlikely to have come from that distribution.
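The spread of the sample Gini around its true value can be checked with a few lines of code. This is only a rough sketch under my own assumptions: plain Python rather than the original Mathematica code, an arbitrary seed, the mu=10.15, sigma=.8307 calibration (whose implied Gini is about .44), and the standard closed-form result that a lognormal with parameter sigma has Gini erf(sigma/2).

```python
import math
import random


def gini(incomes):
    """Gini coefficient via the rank-weighted formula on ascending-sorted incomes."""
    ys = sorted(incomes)
    n = len(ys)
    rank_weighted = sum(i * y for i, y in enumerate(ys, start=1))
    return 2.0 * rank_weighted / (n * sum(ys)) - (n + 1.0) / n


MU, SIGMA = 10.15, 0.8307          # assumed calibration, same for every economy
true_gini = math.erf(SIGMA / 2)    # closed-form Gini of a lognormal distribution

rng = random.Random(1)             # arbitrary seed, only for reproducibility
sample_ginis = []
for _ in range(250):               # 250 identical economies
    incomes = [rng.lognormvariate(MU, SIGMA) for _ in range(1000)]
    sample_ginis.append(gini(incomes))

# Share of economies whose sample Gini lands inside the (.42, .46) interval.
inside = sum(0.42 < g < 0.46 for g in sample_ginis) / len(sample_ginis)

print(f"true Gini:           {true_gini:.3f}")
print(f"sample min / max:    {min(sample_ginis):.3f} / {max(sample_ginis):.3f}")
print(f"share in (.42, .46): {inside:.2f}")
```

The exact minimum, maximum and share will depend on the seed and the random number generator, so they need not match the figures quoted above.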

Of course what really matters is the JOINT distribution of mean income and the gini. But it should be possible to compute the probability that any two observed economies come from the same distribution. I haven't done it yet. I'm working on it.



Here are some other "fake" relationships found in the generated data:

Share of top 1% earners vs. avg income (this also fits the observed pattern for US and EU):



So this follows the pattern with the Gini for pretty much the same reason, but because for the y-axis variable you're only looking at a portion of the distribution (the top 1%) there's more dispersion around the regression line.


Here's poverty vs. log income:



This one's a bit harder to explain since poverty (measured by the headcount ratio with the poverty line set at 10000$) does not depend on the Bill Gates effect. But it's sort of the same thing locally at the left tail of the distribution (you get some lucky draws which both increase average income and decrease poverty, since the lognormal density is increasing for values below the mode, and the poverty line sits below the mode here), although the strength of the relationship is much weaker. Also it's going to be very sensitive to where one sets the poverty line.
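Here's a rough way to reproduce this kind of check. It's a sketch, not the original code: I'm assuming Python's built-in lognormal generator, an arbitrary seed, the mu=10.3, sigma=.7 parameters, and a hand-rolled `pearson` helper for the correlation.

```python
import math
import random

POVERTY_LINE = 10_000      # poverty line in $, as in the text
MU, SIGMA = 10.3, 0.7      # assumed parameters, identical for all economies


def pearson(xs, ys):
    """Plain Pearson correlation coefficient (helper written for this sketch)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)     # arbitrary seed
log_means, headcounts = [], []
for _ in range(250):
    incomes = [rng.lognormvariate(MU, SIGMA) for _ in range(1000)]
    log_means.append(math.log(sum(incomes) / len(incomes)))
    # Headcount ratio: share of people with income below the poverty line.
    headcounts.append(sum(y < POVERTY_LINE for y in incomes) / len(incomes))

corr = pearson(log_means, headcounts)
print(f"correlation between log mean income and headcount poverty: {corr:.2f}")
```

Changing `POVERTY_LINE` is the quickest way to see how sensitive the relationship is to where the line is drawn.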

Finally, Amartya Sen proposed the following measure of Social Welfare:
SW = (1-G) * y, where G is gini and y is average income. Since here G and y are positively related, when G goes up, (1-G) goes down but y goes up. So there's two offsetting effects on SW and this is essentially a measure of which one dominates. Here it looks like higher income is associated with lower welfare (because of higher inequality, so the change in inequality dominates (which is why some folks love this measure)). But again, remember that these are all identical economies:




Ok. That's it for now. Be suspicious of people claiming that there's a strong relationship between inequality and growth/income.

Sunday, October 28, 2007

Update on inequality...

I plan on adding a few comments to my previous post soon (pretty busy with real work currently). But for now I just wanted to note that Gabriel wrote the Mathematica code which you can use to run your own simulations.


Here's a paper which I can't access right now but which looks like it falls into the trap outlined in the previous post. Abstract:

Recent research has posited that, in advanced economies, there is a positive correlation between income inequality and development. Using a new unbalanced panel dataset for 71 countries from 1961 to 1992, we present evidence that supports this conjecture. Although many factors may be contributing to this renewed positive relationship between growth and inequality, one plausible explanation rests on the shift away from a manufacturing base towards a service base in most advanced economies.

(My emphasis)

Sunday, October 21, 2007

Inequality and the Bill Gates effect

(Note: I haven't gotten all the kinks worked out in what follows below. So maybe I've missed something or said something wrong)

The recurrent topic of which economy's dad can beat up which economy's dad - the US' or the EU's - recently popped up again here and there. Roughly speaking, the US has lots of inequality but higher per capita income, whereas the EU (here, as often, basically meaning France, Italy and Germany) has lower per capita income but a lot less inequality. In fact if you take the OECD countries, minus the late joiners, and slap'em up on a scatter plot what you'll see is a (pretty rough - there's all kinds of measurement issues here) positive relationship between per capita income and the level of inequality as measured by the Gini coefficient.

These differences start people talking about the Anglo-Saxon vs. European models of the economy, how the US and Europe have different economic structures which produce these results and how either Europe is quickly falling into the dustbin of history, or the US is becoming a place where folks starve in the gutter because they have no access to health insurance and evil capitalists stole their puppies. A possible question one might ask however is whether this kind of relationship - high income and inequality in US, low income and inequality in Europe - could arise simply by chance. To answer that let's delve a bit deeper into the measurement of inequality and income distributions. What I'm gonna argue is that despite the fact that it may not seem like it, this kind of relationship could very well arise by chance and that relatedly, there probably isn't that much of a difference between the economic structure of the US and that of the EU. In fact, a positive relationship between per capita income and inequality is PRECISELY what you would expect if THERE ARE NO DIFFERENCES in the underlying structure of these economies (given certain assumptions of course).


Ok, so the Gini coefficient is a measure of inequality and it is given by:




where y_i is the income of person i and the y's are arranged in an ascending order, that is:




The Gini has the familiar interpretation as twice the area between the diagonal and the Lorenz curve, and it is equal to 0 if everyone has the same income, and 1 if one person has all the income. Higher levels of G indicate higher inequality.
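For reference, one standard way to write the Gini for n incomes sorted in ascending order is the rank-weighted form below. This is a common textbook expression, not necessarily the exact one intended above, but any equivalent form gives the same number.

```latex
G \;=\; \frac{2\sum_{i=1}^{n} i\,y_i}{n\sum_{i=1}^{n} y_i} \;-\; \frac{n+1}{n},
\qquad y_1 \le y_2 \le \dots \le y_n .
```

With this form, G is 0 when all the y's are equal, and (n-1)/n when one person has everything, which approaches 1 as n gets large.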

One of the key properties of the Gini (or any half way decent measure of inequality for that matter) is that it is scale independent. What this means is that if we multiply everyone's income by some positive number (say double everyone's income) then the amount of inequality, as measured by the Gini, will not change. In other words, richer economies are not automatically considered to be more unequal than poor ones. (It also means that the Gini is a unit-less measure, independent of whether we measure incomes in US dollars or rupees or World of Warcraft gold or whatever)

Mathematically it means that G is a function homogeneous of degree 0 in all the y's: G(k*y_1, ..., k*y_n) = G(y_1, ..., y_n) for any k > 0.
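Scale independence is easy to verify numerically. The snippet below is a minimal sketch using the standard rank-weighted formula for the Gini on sorted incomes; the income figures are made up purely for illustration.

```python
def gini(incomes):
    """Gini coefficient via the rank-weighted formula on ascending-sorted incomes."""
    ys = sorted(incomes)
    n = len(ys)
    rank_weighted = sum(i * y for i, y in enumerate(ys, start=1))
    return 2.0 * rank_weighted / (n * sum(ys)) - (n + 1.0) / n


incomes = [12_000, 25_000, 31_000, 48_000, 150_000]   # made-up incomes
doubled = [2 * y for y in incomes]                    # everyone's income doubles

# Scale independence: doubling every income leaves the Gini unchanged.
print(gini(incomes), gini(doubled))

# The two extreme cases mentioned in the text.
equal = gini([30_000] * 5)                 # everyone has the same income -> 0
one_has_all = gini([0, 0, 0, 0, 100_000])  # one person has it all -> (n - 1)/n, here 4/5
print(equal, one_has_all)
```

Note that in a finite sample the "one person has everything" case gives (n-1)/n rather than exactly 1; it only approaches 1 as the number of people grows.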



So, you would expect that if you have a whole bunch of economies, all of which have the same "underlying economic structure" then, on average, there should be no relationship between per capita income and inequality. Some economies might randomly end up with high inequality and high per capita income, some randomly end up with high inequality and low per capita income, some with low income/low inequality and some with low income/high inequality. But on average there should be no pattern for "similar" economies.

In fact there's a whole literature on the relationship between inequality and economic development (as measured by level of per capita income) going back to Nobel prize winner Simon Kuznets. Kuznets famously claimed to have discovered the so-called Kuznets curve, which says that the relationship between inequality and per capita income has an inverted U-shape. That is, in poor countries everyone's dirt poor but they're all equally dirt poor. As the economy begins to develop and per capita income rises however, some portion of the population begins pulling away from everyone else and inequality rises. In developed economies the rest of the population has increasing incomes along with the very top and inequality actually falls. There's been a lot of work looking into whether the Kuznets curve actually exists in the data (it sort of looks like it, except that most middle income countries tend to be in Latin America which has high inequality for historical reasons, hence the relationship might not be driven by changes in income), controlling for reverse causality and other factors. A typical approach is to run a regression like this




G = beta_0 + beta_1*y_A + beta_2*y_A^2 + gamma*x + e

where G is the gini, y_A is per capita income, x is a set of control variables, e is an error term and the betas and gammas are parameters to be estimated. If the estimation produces beta_1>0 and beta_2<0 then this is taken as evidence in support of the Kuznets hypothesis.

Relatedly, we might expect that if we were to do a regression such as the one above and get beta_1=beta_2=0 then all the economies in the sample have roughly the same "underlying economic structure" (except for variation in the x's) and the differences among them in per capita income and inequality have arisen purely by chance.


But this expectation is actually wrong. The reason for this is that both statistics - the Gini and the per capita income - are themselves constructed from the same underlying data, the individual level or household level incomes, which themselves are produced through some partly random process.

To explain and illustrate it's actually easier to do a simulation (and like I said, I haven't worked out all the kinks).

I assume here that within each economy (simulation) individual incomes are draws from a log-normal distribution with parameters mu and sigma where





Why a log normal distribution? Well, for most real life economies the income distribution is skewed to the right (meaning median income is less than average income) and generally a log normal is a pretty good fit. Another potential candidate could be the Pareto distribution but that actually turns out to be "too skewed" - overall it's something like mostly log-normal with Pareto at the very top. At any rate, what actually matters is that the incomes are generated by a data generating process which is right skewed.

(See here. Actually, I had a better reference somewhere but managed to lose it among my bookmarks)
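For a log-normal the gap between the median and the mean, and the implied Gini, can be written down directly from mu and sigma. The sketch below uses the standard closed-form results (median exp(mu), mean exp(mu + sigma^2/2), Gini erf(sigma/2)); the function name `lognormal_summary` is my own, and the parameter values are the ones used for the first simulation further down.

```python
import math


def lognormal_summary(mu, sigma):
    """Population median, mean and Gini of a lognormal(mu, sigma) income distribution."""
    median = math.exp(mu)
    mean = math.exp(mu + sigma ** 2 / 2)
    # Gini = 2*Phi(sigma / sqrt(2)) - 1, which simplifies to erf(sigma / 2).
    gini = math.erf(sigma / 2)
    return median, mean, gini


median, mean, gini = lognormal_summary(mu=10.3, sigma=0.7)
print(f"median: {median:,.0f}  mean: {mean:,.0f}  Gini: {gini:.3f}")
```

The mean always exceeds the median when sigma > 0, and both the mean/median ratio and the Gini depend only on sigma, which is the link between the two that the rest of the post leans on.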

I'm gonna keep stressing this throughout but the key is that all the simulated economies have incomes generated by the SAME process, that is the same mu and sigma.

So we generate, say, 1000 draws from a log normal distribution and then repeat this, say, 250 times (should do more than 1K but that was enough to considerably slow down my computer and anyway 1K obs might very well be more than what you have to work with in the real world when calculating Gini's). Then, for each economy (each set of 1K draws) we calculate the Gini, per capita income and some other statistics. Then we look at the relationship between the Gini and per capita income. And we see something like this...




(mu=10.3, sigma=.7, implying on average median income of about 29K, per capita income of 38K and an average Gini of .378)

or something like this...



(mu=10.15, sigma=.8307; I was trying to calibrate these roughly to US and European data but admittedly there's so many measurement issues here that it's pretty hard to actually know what the median income of ... individuals?, households?, workers? ... is).

So. What we end up getting is a POSITIVE relationship between inequality and per capita income EVEN THOUGH we assumed that all the economies had the same "underlying economic structure". What "randomness" produces here is not a lack of a relationship but a fairly strong positive one. By this interpretation it could very well be pure luck that US ended up with higher per capita income and higher inequality than Europe, rather than there being any actual "structural" differences between them.
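If you want to try this at home without Mathematica, here's a minimal sketch of the same experiment in plain Python. The details are my assumptions, not the original code: Python's built-in `lognormvariate` generator, an arbitrary seed, the mu=10.3, sigma=.7 parameters, and the helper names `gini`, `pearson` and `simulate`.

```python
import math
import random


def gini(incomes):
    """Gini coefficient via the rank-weighted formula on ascending-sorted incomes."""
    ys = sorted(incomes)
    n = len(ys)
    rank_weighted = sum(i * y for i, y in enumerate(ys, start=1))
    return 2.0 * rank_weighted / (n * sum(ys)) - (n + 1.0) / n


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_economies=250, n_people=1000, mu=10.3, sigma=0.7, seed=0):
    """Draw every economy from the SAME lognormal; return sample means and sample Ginis."""
    rng = random.Random(seed)
    means, ginis = [], []
    for _ in range(n_economies):
        incomes = [rng.lognormvariate(mu, sigma) for _ in range(n_people)]
        means.append(sum(incomes) / n_people)
        ginis.append(gini(incomes))
    return means, ginis


means, ginis = simulate()
corr = pearson(means, ginis)
print(f"correlation between per capita income and Gini: {corr:.2f}")
```

Since every economy is drawn with the same mu and sigma, any correlation that shows up is produced by sampling alone. Calling `simulate(mu=10.15, sigma=0.8307)` gives the second calibration.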

But what is the intuition for these results? Well, it's Bill Gates. With a log normal distribution (or any right skewed one) once in a while, PURELY BY CHANCE, you will get a Bill Gates in your economy. A really really rich person out in the right tail end of the distribution. His appearance is purely random, but given enough economies to observe we should observe Bill Gateses in at least some of them.

What happens when a Bill Gates (in real life, a few thousand of them) appears? Well, a very very rich person has the effect of pulling up the average income in an economy but leaving the median income unchanged. And with a log normal distribution the resulting Gini is in large part a measure of the difference between the median and the average. In fact if we regress the ginis from our simulations on the means and the medians (or their ratio) we can explain 2/3 of the variation in the gini:
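A regression along these lines can be sketched in a few lines of Python. This is my own rough version using the mean/median ratio as the single regressor, with assumed parameters (mu=10.3, sigma=.7) and an arbitrary seed, so the R-squared it prints need not match the 2/3 figure exactly.

```python
import random
import statistics


def gini(incomes):
    """Gini coefficient via the rank-weighted formula on ascending-sorted incomes."""
    ys = sorted(incomes)
    n = len(ys)
    rank_weighted = sum(i * y for i, y in enumerate(ys, start=1))
    return 2.0 * rank_weighted / (n * sum(ys)) - (n + 1.0) / n


rng = random.Random(2)     # arbitrary seed
ratios, ginis = [], []
for _ in range(250):       # 250 economies, all drawn from the same lognormal
    incomes = [rng.lognormvariate(10.3, 0.7) for _ in range(1000)]
    ratios.append(statistics.fmean(incomes) / statistics.median(incomes))
    ginis.append(gini(incomes))

# Ordinary least squares of the Gini on the mean/median ratio (one regressor plus intercept).
mx, my = statistics.fmean(ratios), statistics.fmean(ginis)
sxy = sum((x - mx) * (y - my) for x, y in zip(ratios, ginis))
sxx = sum((x - mx) ** 2 for x in ratios)
syy = sum((y - my) ** 2 for y in ginis)
slope = sxy / sxx
intercept = my - slope * mx
r_squared = sxy ** 2 / (sxx * syy)   # share of Gini variation explained by the ratio

print(f"gini = {intercept:.3f} + {slope:.3f} * (mean/median),  R^2 = {r_squared:.2f}")
```

With a single regressor, R-squared is just the squared correlation between the Gini and the mean/median ratio, which is why no regression library is needed here.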



If instead the data generating process which throws up the initial incomes was symmetric (or close to it) then for every Bill Gates, on average, there'd be an anti-Bill Gates, a really really poor person in the left tail of the distribution to offset the effect of the other on the mean (more precisely there'd be on average the same numbers of really really rich and really really poor). In that case you WOULD actually expect no relationship between income and gini for similar economies. But if the individual incomes come from a skewed distribution then you will get a positive relationship.

Ok, so what does it mean and is it plausible? Well, unless I'm missing something, first thing it means is that there's plenty of reason to be suspicious of all them studies which regress ginis on incomes to uncover relationships between inequality and development. For the topic at hand it could mean that there isn't that much *real* difference between US and EU. By pure chance, in US a few hundred more Bill Gates appeared than in EU and as a result both per capita income and level of inequality got pulled up.

We have this tendency to look for "structural" explanations when confronted with patterns in the data. And so we start talking about the "Anglo-Saxon model" or "European social democracy" but we forget that these patterns could be purely random. Add to that the fact that usually these kinds of debates serve ideological and political purposes and it's easy to see why folks are so quick to jump to conclusions. But the above would actually be bad news for both Euro-bashers - lower incomes in Europe are just a matter of luck - and US-haters - higher inequality in US would also not be a result of some rat race soulless system (at least not any less soulless than the European one) but again, an artifact of random chance.

And once you pause and think about it this way you realize that the European economies and the "Anglo-Saxon" ones have way way more in common with each other than they do with all the other economies in the world. They're all rich, essentially capitalist (don't kid yourself, European economies are capitalist, Scandinavia included) economies and hence any real differences between them are bound to be minimal.

(Note also that the above explanation is pretty much in line with the paper and comments cited at Marginal Revolution linked to above)