Friendship Paradox

Published by Elias Wirth on

Your friends are more popular than you. In the year 1991, the sociologist Scott L. Feld made an interesting discovery. He realized that on average, most individual’s friends have more friends than the individual. This phenomenon is called the friendship paradox. [1] In this article, we describe the friendship paradox using graph theory, provide a mathematical proof, and give examples for possible applications. Furthermore, we talk about its applications in monitoring disease outbreaks.

Paradox-free friendship 

Before we introduce the ideas from graph theory, we should talk about the definition of friendship. Ideally, friendship is a two-way street. This means that if person 1 is friends with person 2, then person 2 is also friends with person 1. This description of friendship is certainly far from perfect. No one can deny that there are people who think they are friends with others but the feeling is in fact not mutual. Although this is certainly a valid concern, we are not here to philosophize but to model friendships. Therefore, we run with the idealized version of mutual friendships for our model.

Modeling friendship with graph theory

Let us consider two individuals, denoted with 1 and 2, respectively. We can represent them with circles, called vertices in graph theory.

The graphic depicts a graph containing two unconnected vertices.
Figure 1. The graphic depicts a graph containing two unconnected vertices. [2]

At the moment, person 1 and person 2 in Figure 1 are not yet friends. This is represented by the lack of a connection between the two vertices.

Let us now assume that the two have gotten to know each other a little better and have actually become friends. We can represent the blossoming friendship between the two with a connection, called edge in graph theory.

The graphic depicts a graph containing two connected vertices.
Figure 2. The graphic depicts a graph containing two connected vertices. [2]

Since Scott’s claim is always true for a network containing only two connected vertices, e.g. Figure 2, we are interested in more complex networks.

Let us increase the size of the network by adding more vertices and edges to the graph. We obtain the representation of a social network as seen in Figure 3.

The graphic depicts a graph containing multiple vertices and edges.
Figure 3. The graphic depicts a graph containing multiple vertices and edges. [2]

The claim that in a social network on average, an individual’s friends have more friends than the individual is no longer obvious. We can, however, translate the claim into graph theory. The social network that we want to model is then called an undirected graph  \(G=(V, \ E),\) where \(V\) is the set of vertices and \(E\) is the set of edges in the graph. We note that \(V\) represents the set of all individuals in the social network and \(E\) represents the set of all friendships in the social network. In Figure 3,

\[|V|=5,\]

since there are exactly 5 vertices, and

\[|E|=7,\]

since there are exactly 7 edges connecting the vertices. Additionally, we denote the number of friendships for a given node or vertex \(v\) by \(d(v)\), called the degree of a vertex in graph theory. If, for example, we consider node 1 in Figure 3, we note that \(d(1)=4\).

Average number of friends

The average number of friends that a random person in the graph has, \(\mu,\) is

\[\mu=\frac{\sum_{v \in V}d(v)}{|V|}.\]

Since \(\sum_{v \in V}d(v)\) counts all the friendships twice, 

\[\mu=\frac{\sum_{v \in V}d(v)}{|V|}=\frac{2|E|}{|V|}.\]

Average number of friends of friends

But how do we determine the average number of friends that a friend of a random person has? We can answer this question if we let each node in Figure 3 create a record for each of its friends that states the number of friends of that friend. Consider node 1. It creates \( d(1)=4\) records:

  • Node 2 has \(d(2)=3\) friends.
  • Node 3 has \(d(3)=2\) friends.
  • Node 4 has \(d(4)=3\) friends.
  • Node 5 has \(d(5)=2\) friends.

Another node, say node 5 creates \(d(5)=2\) records:

  • Node 1 has \(d(1)=4\) friends.
  • Node 2 has \(d(2)=3\) friends.

Every single node in the network creates these records for all of its friends. In total, we obtain

\[ \sum_{v\in V}d(v)\]

records. The average number of friends that node \(v\)’s friends have is the average number of friends written on the records created by \(v\). The average number of friends that a random node’s friends have is thus the average number of friends on these records.  Thus, the sum of the number of friends on all records divided by the number of records created is then the average number of friends that an individual’s friend has. But what is the sum of the number of friends on all records?

In total, there are \(d(v)\) records that state how many friends \(v\) has, because every friend of \(v\) creates one of these records. Each of those records states that \(v\) has \(d(v)\) friends. Thus, every node \(v\) contributes \(d(v)^2\) to the number of friends on all records.  Hence, the sum over all nodes yields

\[ \frac {\sum_{v \in V}d(v)^2}{\sum_{v \in V}d(v)}\]

for the average number of friends of friends in the network. With help of the variance

\[\sigma^2=\frac{1}{|V|}\sum_{v \in V} (d(v)-\mu)^2\]

we are able to rewrite the equation above. Consider

\[ \begin{align} \mu+\frac{\sigma^2}{\mu} & = \frac{\sum_{v \in V}d(v)}{|V|}+\frac{\sum_{v \in V}\big( d(v)-\frac{1}{|V|}\sum_{v \in V}d(v)\big)^2}{\sum_{v \in V}d(v)}\\ &=\frac{\sum_{v \in V}d(v)}{|V|}+\frac{\sum_{v \in V}d(v)^2}{\sum_{v \in V}d(v)}-\frac{2}{|V|}\frac{(\sum_{v \in V}d(v))^2}{\sum_{v \in V}d(v)}+\frac{|V|}{|V|^2}\frac{(\sum_{v \in V}d(v))^2}{\sum_{v \in V}d(v)}\\ &= \frac {\sum_{v \in V}d(v)^2}{\sum_{v \in V}d(v)}. \end{align} \]

This is precisely the average number of friends of friends in the network. Since \(\mu\) and \(\sigma\) are positive numbers, 

\[\mu \leq \mu+\frac{\sigma^2}{\mu} = \frac {\sum_{v \in V}d(v)^2}{\sum_{v \in V}d(v)}.\]

This proves the correctness of the friendship paradox. On average an individual’s friends have more friends than the individual.

Applications of the friendship paradox

The friendship paradox does not only exist to make people feel less popular than others. It does, in fact, have useful applications in medicine, particularly in monitoring disease outbreaks. According to a study conducted by Christakis, friends of a random group of college students got sick earlier than the random group. Thus, by applying the friendship paradox and observing the friends of random people it might be possible to recognize contagion outbreaks earlier. [3]

Further reading

There is a follow-up article titled Inverse Friendship Paradox where we discuss the application of the friendship paradox to the complement graph.


References

[1] Feld, S. (1991). Why Your Friends Have More Friends Than You Do. American Journal of Sociology,96(6), 1464-1477. Retrieved 17:21, September 1, 2018, from http://www.jstor.org/stable/2781907 

[2] The graphs are created using the graph editor tool from csacademyhttps://csacademy.com/app/graph_editor/ 

[3] Christakis NA, Fowler JH (2010) Social Network Sensors for Early Detection of Contagious Outbreaks. PLoS ONE 5(9): e12948. Retrieved 17:15, September 1, 2018, from https://doi.org/10.1371/journal.pone.0012948 


Elias Wirth

The Math Section is my personal website dedicated to applications of mathematics in everyday life. The intention behind this project is to make mathematics more approachable to the public while staying mathematically rigorous. I have a Bachelor's degree in Mathematics from the University of Berne and am currently working on my Master's degree in Mathematics at the ETH Zurich.

Leave a Reply