An analysis of the bias in my friendship selection process

On the 2nd of November, Romania held its presidential elections. It was the first time I could vote in them, because last time I wasn't yet 18. Maybe I simply didn't pay attention before, but this time it seemed that far more people in my circles were preoccupied with the politics surrounding the elections. I barely saw any posts about the European Parliament or mayoral elections (in which I could vote), but now my Facebook feed was full of people campaigning for one candidate or another, posting reviews, analyzing and speculating on how the elections would turn out. The two names that came up most often (in a positive way) were Klaus Iohannis and Monica Macovei (Ponta also came up, but most of the time surrounded by some bad-mouthing).

I guessed that my Facebook friend list was probably biased and wouldn't accurately represent the country, but once the official first-round results were published, I was surprised by how much it actually differed.

I first did a very naive and crude analysis of my Facebook friends: I looked at how many likes each candidate had from my friends and took that as the number of votes they would get. It is a very crude approximation. I know that some people liked a candidate only to follow them and get their latest updates, without intending to vote for them; some people probably liked two or more candidates; and many other holes can be poked in my little theory. But for the purposes of my blog it will do just fine.

| Candidate | Official | Friends | FB count | Difference (official - friends) |
|---|---|---|---|---|
| Mirel Amariţei | 0.08% | 0.00% | 0 | 0.08% |
| Constantin Rotaru | 0.30% | 0.00% | 0 | 0.30% |
| William Brînză | 0.45% | 0.79% | 1 | -0.34% |
| Gheorghe Funar | 0.47% | 0.00% | 0 | 0.47% |
| Zsolt Szilágyi | 0.56% | 0.00% | 0 | 0.56% |
| Teodor Meleşcanu | 1.09% | 0.00% | 0 | 1.09% |
| Elena Udrea | 5.20% | 6.30% | 8 | -1.10% |
| Corneliu Vadim-Tudor | 3.68% | 0.79% | 1 | 2.89% |
| Hunor Kelemen | 3.47% | 0.00% | 0 | 3.47% |
| Dan Diaconescu | 4.03% | 0.00% | 0 | 4.03% |
| Popescu Tariceanu | 5.36% | 0.79% | 1 | 4.57% |
| Klaus Iohannis | 30.47% | 51.97% | 66 | -21.50% |
| Monica Macovei | 4.44% | 29.92% | 38 | -25.48% |
| Victor Ponta | 40.44% | 9.45% | 12 | 30.99% |

127 of my friends "voted" on Facebook and thus I obtained the table above.
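As a minimal sketch of how the "Friends" column follows from the like counts, assuming each like is treated as exactly one vote (the variable names are mine, chosen for illustration):

```python
# Facebook like counts per candidate, copied from the table above.
# Candidates with zero likes are left out; they get a 0% share.
likes = {
    "Klaus Iohannis": 66,
    "Monica Macovei": 38,
    "Victor Ponta": 12,
    "Elena Udrea": 8,
    "William Brînză": 1,
    "Corneliu Vadim-Tudor": 1,
    "Popescu Tariceanu": 1,
}

total = sum(likes.values())  # number of "votes" cast by my friends

# Share of the "vote" each candidate gets among my friends, in percent.
friend_share = {name: 100 * count / total for name, count in likes.items()}

for name, share in friend_share.items():
    print(f"{name}: {share:.2f}%")
```

Someone who liked two candidates is counted twice here, which is one of the holes in the approach mentioned earlier.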

As you can see, there are only three big differences in the results: Ponta, Iohannis and Macovei. The others are all under 5 percentage points. The biggest surprise for me was Monica Macovei. I knew she would be less popular in the offline world, but I did not expect her to get less than 5%. The other surprise was how popular Ponta is. I know that his target audience is not exactly Facebook-savvy, but still, it is a bit saddening to see that there are so many people who believe his promises.

But enough of just looking at the numbers, let's use math on them.

First, I want to calculate a "distance" between my friends and the general population. I'm going to consider a 14-dimensional space, where each candidate has one dimension and the share of the vote that candidate obtained (as a fraction between 0 and 1) is the value in that dimension. I will obtain two points in this high-dimensional space: one from the official results and one from my friends. The distance will then be the Euclidean distance between the two points.

$$ d(FB, OFF)  = \sqrt{ \sum_{i=1}^{14}(FB_i-OFF_i)^2} = 0.4618 $$

The largest distance between any two points in this unit cube is $ \sqrt{14} \approx 3.742 $, the distance between the origin $ (0, 0, 0, ..., 0) $ and the opposite corner $ (1, 1, 1, ..., 1) $. The actual maximum distance is smaller, because the values across all dimensions have to add up to 1, so most corners of the cube cannot be reached. The furthest a result can be from the origin is 1, when one candidate wins everything. Any other combination of votes results in a point that is closer to the origin. Under these circumstances, two points are furthest apart when each has a different candidate winning with 100%; in that case the distance between them is $ \sqrt{2} \approx 1.414 $.
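A short derivation of that $ \sqrt{2} $ bound, as a sketch. Here $ p $ and $ q $ are my own notation for two possible results, each with non-negative entries that add up to 1:

$$ \|p - q\|^2 = \|p\|^2 + \|q\|^2 - 2\, p \cdot q \le 1 + 1 - 0 = 2 $$

This holds because $ \|p\|^2 = \sum_i p_i^2 \le \sum_i p_i = 1 $ when every $ p_i $ is between 0 and 1 (and likewise for $ q $), while $ p \cdot q \ge 0 $ because no entry is negative. Equality requires each result to put 100% on a single candidate, with a different candidate in $ p $ than in $ q $.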

So the distance between my friends and the general population is $ 0.4618/1.414 = 0.3265 $, or about a third of the maximum possible. That's... a big difference. Unfortunately, we all have to have the same president.
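The distance calculation can be reproduced with a few lines of Python. This is only a sketch: it uses the rounded percentages from the table, and the function name `euclidean_distance` is my own.

```python
import math

# Official first-round results and my friends' shares, in percent,
# listed in the same candidate order as the table above.
official = [0.08, 0.30, 0.45, 0.47, 0.56, 1.09, 5.20,
            3.68, 3.47, 4.03, 5.36, 30.47, 4.44, 40.44]
friends = [0.00, 0.00, 0.79, 0.00, 0.00, 0.00, 6.30,
           0.79, 0.00, 0.00, 0.79, 51.97, 29.92, 9.45]


def euclidean_distance(p, q):
    """Euclidean distance between two points given as equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Convert percentages to fractions so the points live in the unit cube.
d = euclidean_distance([x / 100 for x in friends],
                       [x / 100 for x in official])

# Largest distance possible when each point's coordinates sum to 1.
d_max = math.sqrt(2)
relative = d / d_max

print(f"distance = {d:.4f}, relative to maximum = {relative:.4f}")
```

Because the inputs are rounded to two decimals, the last digit may differ slightly from a calculation done on the raw counts.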

Another thing I want to look at is how knowing that someone is my friend changes the probability of how they voted, using Bayes' theorem.

$$ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} $$

Let's consider A to be the set of people who vote for candidate X and B the set of my friends. In this case $ P(A) $ is simply the share of the votes the candidate got. $ P(A|B) $ can be calculated just as easily, because it is the share of my Facebook friends who like the given candidate. The support given by B (knowing that someone is my friend) is then the ratio between the two:

$$ \frac{P(A|B)}{P(A)} = \frac{P(B|A)}{P(B)} $$

Applying this to every candidate, we get the following results:

| Candidate | Ratio |
|---|---|
| Monica Macovei | 674% |
| William Brînză | 175% |
| Klaus Iohannis | 171% |
| Elena Udrea | 121% |
| Victor Ponta | 23% |
| Corneliu Vadim-Tudor | 21% |
| Popescu Tariceanu | 15% |
| Zsolt Szilágyi | 0% |
| Teodor Meleşcanu | 0% |
| Mirel Amariţei | 0% |
| Hunor Kelemen | 0% |
| Gheorghe Funar | 0% |
| Dan Diaconescu | 0% |
| Constantin Rotaru | 0% |
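The ratios in this table can be recomputed as each candidate's share among my friends divided by their official share. The sketch below starts from the raw like counts rather than the rounded percentages, since the rounding noticeably shifts the ratios for candidates with a single like. The dictionary names are mine.

```python
# Official first-round results in percent, from the first table.
official = {
    "Mirel Amariţei": 0.08, "Constantin Rotaru": 0.30,
    "William Brînză": 0.45, "Gheorghe Funar": 0.47,
    "Zsolt Szilágyi": 0.56, "Teodor Meleşcanu": 1.09,
    "Elena Udrea": 5.20, "Corneliu Vadim-Tudor": 3.68,
    "Hunor Kelemen": 3.47, "Dan Diaconescu": 4.03,
    "Popescu Tariceanu": 5.36, "Klaus Iohannis": 30.47,
    "Monica Macovei": 4.44, "Victor Ponta": 40.44,
}

# Facebook like counts from my friends; missing candidates have zero likes.
likes = {
    "William Brînză": 1, "Elena Udrea": 8, "Corneliu Vadim-Tudor": 1,
    "Popescu Tariceanu": 1, "Klaus Iohannis": 66,
    "Monica Macovei": 38, "Victor Ponta": 12,
}
total_likes = sum(likes.values())

# Ratio P(A|B) / P(A): share among my friends over the official share.
ratio = {
    name: (100 * likes.get(name, 0) / total_likes) / share
    for name, share in official.items()
}

# Print from most over-represented to most under-represented.
for name, r in sorted(ratio.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {100 * r:.0f}%")
```

A ratio above 100% means the candidate is over-represented among my friends, and a ratio below 100% means they are under-represented.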

If you are my friend, you are 1.75 times as likely to vote for William Brînză as the average voter. Who is this guy? I had never heard of him...

Also, we can see that Monica Macovei is far, far, far, far more popular among my friends than in the general population, so I was heavily biased (a ratio of 674%) toward overestimating her results. Iohannis had a more moderate ratio, but that may be because he is actually quite popular in Romania, at least in Transylvania, where he won the majority in almost every county. Elena Udrea is the only other candidate who was more favored by my friends than by the average Romanian; all the others were more popular in the "real world".

The main conclusion I draw from this is that I live in a "bubble". I already knew about the algorithmic filter bubble, but that one I am fine with: I want Google, Facebook and the others to show me custom-tailored content, the things I will likely find interesting rather than the random baby pictures my friends put up on Facebook, and to show me results for a programming language I know, not C, when I search for a general function. This analysis removed pretty much all of the algorithmic filtering, yet the discrepancy between my friends and Romania was still quite large. This bubble is not machine-made but made by me, and it suggests that I really should pay more attention to other views and other opinions, because there is a whole other world out there that thinks differently from me. I cannot just ignore that and say "Oh, they are stupid/old/uneducated/living off government aid"; I have to put myself in their shoes and maybe even find a way to help them.

And now, in two days' time, the second round of the elections will be over, the votes will have been counted and we will have a new president. I'll refrain from trying to predict the results, but even if the candidate I'm voting for doesn't win, I'm glad that people in Romania are apparently starting to wake up. Our national anthem starts with the words "Wake up, Romanian", and it seems that this is actually happening, at least among the younger generation, who are getting more and more involved in what is going on in their country.