We explored how both the Friendship Paradox and Benford’s Law play out on Twitter.
The Friendship Paradox:
Most people, when asked about their relative number of friends, will predict that they have more friends than their friends have. However, a mathematical phenomenon termed the “Friendship Paradox” shows otherwise. First described in 1991, the paradox asserts that, on average, your friends have more friends than you do.
To examine the paradox on a real-world level, we decided to enter the Twitter world and record the number of followers within 100 “chains” of Twitter users. To begin, we took the 5th most recent follower of @POTUS, or President Obama. We used @POTUS because, at the time of our study, this account was the fastest-growing account in Twitter history. In addition, the President is followed by the widest range of account types and does not necessarily appeal to a specific segment of society. For this 5th most recent follower, we recorded the number of “followers” and the number of “following.” Then we took the 5th most recent account this user had followed and did the same. Each “chain” of users consisted of 5 accounts. The hope was that, on average, the number of followers would increase as the chain progressed to the 5th person.
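The chain procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the method we actually used to collect the data: the names `build_chain`, `following`, and `follower_count` are hypothetical, the data is a made-up toy network, and each `following` list is assumed to be ordered from most recent to oldest.

```python
def build_chain(start, following, follower_count, chain_length=5, pick=5):
    """Walk one chain: from each account, step to the `pick`-th most
    recent account it followed, recording counts along the way."""
    chain = []
    current = start
    for _ in range(chain_length):
        recent = following[current]  # assumed ordered most recent first
        # Record (account, number of followers, number of following).
        chain.append((current, follower_count[current], len(recent)))
        if len(recent) < pick:
            break  # the chain ends early if the account follows too few users
        current = recent[pick - 1]
    return chain


# Hypothetical toy data, not taken from the study.
following = {
    "a": ["b", "c"],
    "b": ["d", "a"],
    "c": ["a", "d"],
    "d": ["c", "b"],
}
follower_count = {"a": 10, "b": 50, "c": 200, "d": 1000}

# A shorter chain (3 accounts, 2nd most recent follow) keeps the toy small.
chain = build_chain("a", following, follower_count, chain_length=3, pick=2)
```

With the defaults `chain_length=5` and `pick=5`, the function follows the same pattern as the chains in the study; the starting account would be the 5th most recent follower of @POTUS.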
Before we reveal the trends that were discovered from our data, let’s examine the paradox on a simpler level.
Situation: There are four people, named ‘A,’ ‘B,’ ‘C,’ and ‘D.’ The following is a visual representation of their friend group, with the lines indicating a friendship.
A has just 1 friend (B), B has 3 friends (A, C, and D), and C and D each have 2 friends (B and each other). A has 3 friends of friends, B has 5 friends of friends, C has 5 friends of friends, and D has 5 friends of friends.
For persons A, C, and D, their friends have more friends on average than they do: 3/1 = 3, 5/2 = 2.5, and 5/2 = 2.5, compared with 1, 2, and 2. For person B, this is not the case: B's friends average 5/3 ≈ 1.67 friends, fewer than B's 3. B represents a “popular” person, who is likely to have more friends than his friends have.
Counting each person's friends gives 8 total “friendships” in this situation (1 + 3 + 2 + 2), meaning the average person has 8/4 = 2 friends.
There are 18 total “friends of friends” among the 8 friendships, meaning the average friend has 18/8 = 2.25 friends. This is the friendship paradox.
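The counts in this example can be checked with a short script. This is a minimal sketch; the edge list is an assumption reconstructed from the friend counts given above (A-B, B-C, B-D, C-D is the only four-person arrangement that matches them), and the variable names are our own.

```python
# Friendships reconstructed from the stated friend counts.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]

# Build each person's set of friends.
friends = {}
for u, v in edges:
    friends.setdefault(u, set()).add(v)
    friends.setdefault(v, set()).add(u)

# Number of friends per person, and the summed friend counts of their friends.
num_friends = {p: len(fs) for p, fs in friends.items()}
friends_of_friends = {
    p: sum(num_friends[f] for f in fs) for p, fs in friends.items()
}

total_friends = sum(num_friends.values())        # 1 + 3 + 2 + 2
total_fof = sum(friends_of_friends.values())     # 3 + 5 + 5 + 5

avg_friends = total_friends / len(num_friends)   # average person
avg_friends_of_friend = total_fof / total_friends  # average friend

print(avg_friends, avg_friends_of_friend)
```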
The math behind it:
If person $i$ has $x_i$ friends and there are $n$ people, we first need the average number of friends across the entire graph. To find it, we divide the total number of friends by the total number of people. So,
$$\mu = \frac{\sum_{i=1}^{n} x_i}{n}$$
Next we need the average number of friends that a friend has. Person $i$ is counted as a friend $x_i$ times, once by each of their friends, and each time contributes $x_i$ to the total, so person $i$ adds $(x_i)(x_i) = x_i^2$ to the sum. The numerator is therefore the sum of the squares of each person's number of friends, and the denominator is the total number of friends. This gives us the mean:
$$\frac{\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} x_i}$$
Using the variance formula,
$$\sigma^2 = \frac{\sum_{i=1}^{n} x_i^2}{n} - \mu^2$$
We can rearrange it into the following equation.
$$\sum_{i=1}^{n} x_i^2 = n(\sigma^2 + \mu^2)$$
If we substitute this into the original equation, together with $\sum_{i=1}^{n} x_i = n\mu$, we get
$$\frac{\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} x_i} = \frac{n(\sigma^2 + \mu^2)}{n\mu} = \mu + \frac{\sigma^2}{\mu} \geq \mu$$
This means that the average number of friends is less than or equal to the average number of friends of friends, with equality only when everyone has the same number of friends ($\sigma^2 = 0$).
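As a quick sanity check, the result can be tested numerically on the four-person example. In this sketch, `mu` is the average number of friends and `var` is the population variance of the friend counts; the function name `friend_averages` is our own. The average over friends of friends, computed directly as the sum of squares over the sum, should match `mu + var / mu`.

```python
from statistics import mean, pvariance


def friend_averages(counts):
    """Return (average friends, average friends of a friend computed
    directly, and the same quantity via mean + variance / mean)."""
    mu = mean(counts)
    var = pvariance(counts)  # population variance
    direct = sum(x * x for x in counts) / sum(counts)
    via_variance = mu + var / mu
    return mu, direct, via_variance


# Friend counts for A, B, C, and D from the example.
mu, direct, via_variance = friend_averages([1, 3, 2, 2])
print(mu, direct, via_variance)
```

Because the variance can never be negative, the second and third values can never fall below the first.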
Benford’s Law:
Benford’s Law is a mathematical phenomenon named after physicist Frank Benford, although it was first discovered by Canadian Simon Newcomb. While looking through his book of log tables, Newcomb noticed that the earlier pages of the book (the pages with numbers whose leading digit is 1) were much more worn than the other pages.
Benford’s Formula: the probability that the leading digit takes a certain value $d$ (for $d$ from 1 through 9) can be described by
$$P(d) = \log_{10}(d+1) - \log_{10}(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$
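To see what the formula predicts, here is a small sketch that evaluates it for each possible leading digit. The function name `benford_probability` is our own choice.

```python
import math


def benford_probability(d):
    """Probability that the leading digit equals d under Benford's Law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


benford = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.3f}")
```

A leading 1 comes out at roughly 30% and a leading 9 at under 5%, and the nine probabilities add up to 1.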
Connection to enhance understanding:
Think of a pencil of length 1 unit. In order for the leading digit of the length of this pencil to become 2, the length must grow by 100%. From there, for the leading digit to become 3, the length must grow by only another 50%. And so on: each later digit requires a smaller percentage increase, so a quantity that grows at a steady rate spends the most time with a leading digit of 1.
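The pencil picture can be turned into a small simulation. This is a sketch under assumptions of our own: the pencil grows by a fixed 1% per step for 10,000 steps, and we count how often each leading digit appears. The names `growth_needed` and `leading_digit_shares` are hypothetical.

```python
import math


def growth_needed(d):
    """Percent growth needed to go from leading digit d to the next one."""
    return 100 / d


def leading_digit_shares(rate=0.01, steps=10_000):
    """Share of steps spent on each leading digit while a length that
    starts at 1 grows by `rate` per step."""
    log_step = math.log10(1 + rate)
    counts = [0] * 10
    for k in range(steps):
        # The fractional part of log10(length) determines the leading digit.
        frac = (k * log_step) % 1.0
        digit = min(9, int(10 ** frac))  # guard against rounding up to 10
        counts[digit] += 1
    return {d: counts[d] / steps for d in range(1, 10)}


shares = leading_digit_shares()
print(growth_needed(1), growth_needed(2))
print(shares)
```

The shares should land close to the values given by Benford's formula, since steady percentage growth spreads the logarithm of the length evenly.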
-Distribution of First digit (Meters)
-Daily volume of shares on NASDAQ
-Import/Export Volumes for sales of fish from USA
-Distribution of first digits of altitude in top 120,000 towns in the world
The majority of people who attempt to commit fraud in tax returns, sales reports, or voting records are not aware of this phenomenon. Therefore, they try to avoid suspicion by providing an array of numbers with a “random” distribution of leading digits. Data that does not adhere to Benford’s Law is viewed as suspicious and often leads to questions about the legitimacy of the data.
Because we had recorded data on 1000 Twitter accounts, we decided to see if the phenomenon applied to Twitter followers as well. As is indicated in the table below, the data generally followed a trend typical for data that adheres to Benford’s Law.
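A comparison like this one could be sketched as follows. The `sample` list is a made-up placeholder, not our recorded data, and the function names are our own; the idea is simply to tally leading digits and set the observed shares next to the shares Benford's Law predicts.

```python
import math
from collections import Counter


def leading_digit(n):
    """First digit of a positive whole number."""
    return int(str(abs(int(n)))[0])


def compare_to_benford(counts):
    """Return rows of (digit, observed share, Benford share)."""
    # Accounts with zero followers have no leading digit, so skip them.
    digits = [leading_digit(c) for c in counts if c > 0]
    observed = Counter(digits)
    total = len(digits)
    return [
        (d, observed[d] / total, math.log10(1 + 1 / d))
        for d in range(1, 10)
    ]


# Placeholder follower counts for illustration only.
sample = [12, 150, 1043, 27, 198, 3, 86, 1320, 45, 110]

for digit, seen, expected in compare_to_benford(sample):
    print(f"{digit}: observed {seen:.2f}, Benford {expected:.2f}")
```

With only ten placeholder values the match is rough; a larger set of real follower counts gives a more meaningful comparison.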
Thank you for reading!