🔬
Nosedive Analytics
Warning: This paper is a work in progress.
Introduction
On 17 January 2017, I submitted a post to Reddit's /r/blackmirror/ subreddit. The post linked to a parody website I built in an afternoon: a site that reads a user's Facebook profile and calculates a "rating" for the user.
The episode is set in a world where people can rate each other from one to five stars for every interaction they have, and where one's rating impact your entire life.
— Wikipedia
Under the hood, the algorithm that generates the "rating" is based on the idea of a social media rating system from Black Mirror S03E01 "Nosedive".
It mimics two major characteristics of the system featured in the show. The "rating" therefore has two parts, each implementing one of the characteristics.
Two characteristics:
1. The score is on a 1-5 scale.
This feature is implemented by counting the total reactions on all the posts (as many as the API permits) on the user's timeline. The algorithm takes the logarithm of the count and records the result as naive_score. naive_score defines the first part of the "rating".
2. Each rating is weighted so that people with higher ratings are more influential.
This feature is implemented by evaluating the popularity of the "reactor" (the person who initiated the reaction on Facebook).
To make the app respond fast enough, I assumed that people with more friends are generally more popular, and thus more influential. I therefore took the reactor's number of friends as the measure of her popularity.
After some numeric manipulation, it is recorded as bonus in the database.
The final outputs of the algorithm are naive_score and bonus. Together they form final_score, the sum of the two parts.
We found a positive relation between the popularity of the user herself, measured by naive_score, and the average popularity of her friends, indirectly measured by bonus.
Figure 1: bonus plotted against naive_score .
We will break down naive_score and bonus mathematically to examine and substantiate this positive relation.
The Setup
Data
The user authorizes our app to access her data on Facebook, including name, gender, time zone, currency and most importantly, her timeline data as well as her network data. The following graph provides a visualization of the structure.
Figure 2: n denotes the total number of the reactions of all posts. n' is the number of friends of a particular friend, also known as the "reactor" to a particular post.
We were able to access all her public posts on her timeline and her friends' reactions to them. We were also able to get some basic stats about her friends, including the number of friends each of them has.
The dataset looks like this:
There are 2491 data points in this dataset. n, bonus_raw, lpop, hypo_lpop, hypo_lpop_alt and phi were calculated afterwards as part of the analysis. I'll elaborate on their meaning shortly.
Assumptions
1. The more reactions the user gets, the more popular she is.
2. The more friends a user has, the more popular she is.
Algorithm
Naive Score
naive_score measures the popularity of the user herself. Intuitively, the more reactions she gets on her timeline, the more popular she is.
To calculate it, the algorithm does the following:
For each post we observe on her timeline, we count its reactions and add them to a running total. n is the total number of reactions we can retrieve, which is a decent measure of the user's popularity.
$\text{naive-score} \equiv \log_{10}(10n + 20) - 0.3$
```python
from math import log


def get_scores(self):
    posts = self.get_posts()
    reactions = []
    # Collect the reactions of every post on the timeline.
    for post in posts:
        reactions += self.get_reactions(post["id"])
    # Count once, after the loop, so that no reaction is counted twice.
    self.reaction_count += len(reactions)
    print("Base score updated to:" + str(self.reaction_count))
    self.get_bonus(reactions)
    n = self.reaction_count  # The Base Score
    self.naive_score = log(n * 10 + 20, 10) - 0.3  # the naive_score
    self.final_score = self.naive_score + self.bonus
    print("The user's final score:" + str(self.final_score))
    return self.naive_score, self.bonus, self.final_score
```
Thus, to restore n for analysis purposes, we have:
$n=\frac{10^{\text{naive-score}+0.3}-20}{10}$
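As a quick check that the two formulas above are inverses of each other, here is a minimal sketch. The function names naive_score_from_n and n_from_naive_score are made up for illustration and are not part of the app.

```python
from math import log10, isclose


def naive_score_from_n(n):
    """naive_score = log10(10 n + 20) - 0.3, as defined above."""
    return log10(10 * n + 20) - 0.3


def n_from_naive_score(naive_score):
    """Invert the definition: n = (10^(naive_score + 0.3) - 20) / 10."""
    return (10 ** (naive_score + 0.3) - 20) / 10


# Round trip for a few illustrative reaction counts (not taken from the dataset).
for n in (0, 1, 10, 250):
    score = naive_score_from_n(n)
    recovered = n_from_naive_score(score)
    assert isclose(recovered, n, abs_tol=1e-9)
    print(n, round(score, 3), round(recovered, 3))
```

Note that a user with no reactions (n = 0) gets log10(20) - 0.3, which is approximately 1, the bottom of the 1-5 scale.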
Bonus
The bonus seen in the dataset is calculated as follows:
For each reaction we observed in a user's timeline, we get the number of friends of the "reactor" (who made the reaction).
To mimic the second characteristic, we apply the following transformation:
$\text{react-score}_i \equiv \log_{10}(20n'_i + 20) \qquad i \in \Omega$
where n' is the number of friends of the "reactor" of a particular reaction i and Omega is the set of all reactions.
$\text{bonus} \equiv {1\over2}\log_{10}[ (\sum_{i\in \Omega} \text{react-score}_i) + 1]$
```python
from math import log


def get_bonus(self, reactions):
    if reactions:
        for reaction in reactions:
            # Score each reactor by her number of friends.
            friend_score = get_naive_score(self.token, reaction["id"])
            # Each reactor's score is scaled by 1/5 before summing.
            self.reactions_weights.append(friend_score / 5)
        self.bonus = log(sum(self.reactions_weights) + 1, 10) * 0.5
        return self.bonus
    else:
        return 0


# The helper function
def get_naive_score(token, user_id):
    # fb(...) is defined elsewhere in the app.
    friend = fb(token, user_id, "friends")
    try:
        n = friend["summary"]["total_count"]  # Number of friends
    except KeyError:  # Does not have permission to read
        n = 0
    score = log(n * 20 + 20, 10)
    return score
```
To restore bonus to a more appropriate scale for analysis, we introduce bonus_raw in the dataset, which is calculated as follows:
$\text{bonus-raw} \equiv 10^{2\text{bonus}} - 1 \approx |\Omega|\frac{\ln(20)}{\ln(10)} + \log_{10}(\prod_{i\in\Omega}n'_i)$
Note that by definition:
$\because |\Omega| \equiv n$
$\therefore \text{bonus-raw} \equiv 10^{2\text{bonus}} - 1 \approx n\frac{\ln(20)}{\ln(10)} + \log_{10}(\prod_{i\in\Omega}n'_i)$
See appendix #1 for math details.
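The relation between bonus and bonus_raw can be sketched as follows. This follows the written formulas for react-score and bonus; the app code shown earlier additionally divides each react-score by 5 before summing, which this sketch leaves out. The function names and the friend counts are made up for illustration.

```python
from math import log10, isclose


def react_score(friend_count):
    """react-score_i = log10(20 n'_i + 20)."""
    return log10(20 * friend_count + 20)


def bonus_from_friend_counts(friend_counts):
    """bonus = 1/2 * log10(sum of react-scores + 1)."""
    total = sum(react_score(c) for c in friend_counts)
    return 0.5 * log10(total + 1)


def bonus_raw_from_bonus(bonus):
    """bonus_raw = 10^(2 bonus) - 1, which undoes the outer logarithm."""
    return 10 ** (2 * bonus) - 1


# Hypothetical friend counts of three reactors (not taken from the dataset).
friend_counts = [120, 45, 300]
bonus = bonus_from_friend_counts(friend_counts)
bonus_raw = bonus_raw_from_bonus(bonus)

# bonus_raw recovers the plain sum of react-scores.
assert isclose(bonus_raw, sum(react_score(c) for c in friend_counts))
print(bonus, bonus_raw)
```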
Hypothesis
We believe that the more popular the user is, the more popular her friends are.
That is, the average bonus_raw is positively related to n, which in turn has a positive relationship with naive_score.
User j 's friends' popularity product:
$\text{popularity}_j \equiv \prod_{i\in\Omega_j}n'_i$
If the popularity is not related to n , then the expected popularity product is:
$\text{popularity}'_j \equiv (\bar{n'})^{|\Omega|} \equiv( \bar{n'})^{n}$
The constant n_prime_avg is:
$\bar{n'} \approx 7$
For convenience, take the logarithm of both sides:
$\log_{10}(\text{popularity}_j) = \log_{10}(\prod_{i\in\Omega_j}n'_i) = \text{bonus-raw} - n\frac{\ln(20)}{\ln(10)}$
known as lpop in the dataset.
The hypothesized measurement of popularity:
$\log_{10}(\text{popularity}'_j) = \log_{10}[(\bar{n'})^{n}] = n \log_{10} \bar{n'}$
known as hypo_lpop in the dataset. hypo_lpop is the popularity of the user's friends, assuming there is no relationship between n and the popularity.
To test the hypothesis, we measure the difference between lpop and hypo_lpop using their ratio, phi.
$\phi={\log(\text{popularity}_j) \over \log(\text{popularity}'_j)}$
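The three derived columns can be sketched as below, assuming bonus_raw and n are already available for a user. The function names are chosen to match the dataset columns but are otherwise hypothetical, and the default of 7 is the approximate constant quoted above.

```python
from math import log10

N_PRIME_AVG = 7  # approximate average friend count quoted in the text


def lpop(bonus_raw, n):
    """log10(popularity_j) = bonus_raw - n * ln(20) / ln(10)."""
    return bonus_raw - n * log10(20)


def hypo_lpop(n, n_prime_avg=N_PRIME_AVG):
    """log10(popularity'_j) = n * log10(n_prime_avg)."""
    return n * log10(n_prime_avg)


def phi(bonus_raw, n, n_prime_avg=N_PRIME_AVG):
    """Ratio of observed to hypothesized log-popularity."""
    if n <= 0:
        raise ValueError("phi is undefined when n is not positive")
    return lpop(bonus_raw, n) / hypo_lpop(n, n_prime_avg)


# Sanity check with made-up values: if every one of the n reactors had
# exactly N_PRIME_AVG friends, lpop would equal hypo_lpop and phi would be 1.
n = 10
bonus_raw = n * log10(20) + n * log10(N_PRIME_AVG)
print(phi(bonus_raw, n))
```

Values of phi below 1 mean the friends are less popular than the no-relationship baseline predicts; values above 1 mean they are more popular.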
Evidence
Here is our chart plotted by taking n as the x-axis and phi as the y-axis:
If, as our null hypothesis suggests, there is no relation between n and the popularity, phi should be close to 1 for every data point.
As the chart shows, hypo_lpop consistently tends to overestimate the popularity (the maximum of phi is around 0.86).
There is a visually obvious trend: people with lower n have less popular friends.
The following is a 3D scatter plot that might be helpful. If you look straight down from the top, the data points that fail to "catch up" with lpop as n increases generally have lower phi.
Hypothesis Testing
🚧 Work in Progress: formal hypothesis testing is on its way.
Regressions
🚧 Work in Progress: formal regressions are on their way.
Appendix
#1
Expand bonus:
$\text{bonus} = {1\over2}\log_{10}[ (\sum_{i\in \Omega} \log_{10}(20n'_i + 20)) + 1]$
Therefore, to restore the sum of the react-scores:
$\sum_{i\in \Omega} \log_{10}(20n'_i + 20) = 10^{2\text{bonus}} -1$
Equivalently:
$\log_{10}[20^{|\Omega|}\prod_{i\in\Omega}(n'_i + 1)] = 10^{2\text{bonus}} - 1$
Approximating n'_i + 1 by n'_i (this assumes everyone has at least 1 friend, and is accurate when n'_i is large):
$\log_{10}[20^{|\Omega|}\prod_{i\in\Omega}n'_i] \approx 10^{2\text{bonus}} - 1$
Thus:
$|\Omega|\frac{\ln(20)}{\ln(10)} + \log_{10}(\prod_{i\in\Omega}n'_i) \approx 10^{2\text{bonus}} - 1$
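To get a feel for how much the step that drops the +1 costs, the sketch below compares the exact sum of react-scores with the approximated expression. The function names and the friend counts are made up for illustration.

```python
from math import log10


def exact_sum(friend_counts):
    """Sum over reactions of log10(20 n'_i + 20)."""
    return sum(log10(20 * c + 20) for c in friend_counts)


def approx_sum(friend_counts):
    """|Omega| * log10(20) + log10(product of n'_i); requires every n'_i >= 1."""
    return len(friend_counts) * log10(20) + sum(log10(c) for c in friend_counts)


def gap(friend_counts):
    """Exact minus approximate; equals the sum of log10(1 + 1/n'_i)."""
    return exact_sum(friend_counts) - approx_sum(friend_counts)


# Hypothetical friend counts: the gap shrinks as friend counts grow.
print(gap([1, 2, 3]))
print(gap([100, 200, 300]))
```

The gap is always positive, so the approximated expression slightly understates the exact sum, and the error is largest for reactors with very few friends.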