Seven Things Netflix’s ‘The Great Hack’ Gets Wrong About the Facebook–Cambridge Analytica Data Scandal

Alec Stapp

5 years ago

And if David finds out the data beneath his profile, you’ll start to be able to connect the dots in various ways with Facebook and Cambridge Analytica and Trump and Brexit and all these loosely-connected entities. Because you get to see inside the beast, you get to see inside the system.

This excerpt from the beginning of Netflix’s The Great Hack shows the goal of the documentary: to provide one easy explanation for Brexit and the election of Trump, two of the most surprising electoral outcomes in recent history.

Unfortunately, in attempting to tell a simple narrative, the documentary obscures more than it reveals about what actually happened in the Facebook-Cambridge Analytica data scandal. In the process, the film wildly overstates the significance of the scandal in either the 2016 US presidential election or the 2016 UK referendum on leaving the EU.

In this article, I will review the background of the case and show seven things the documentary gets wrong about the Facebook-Cambridge Analytica data scandal.

Background

In 2013, researchers published a paper showing that you could predict some personality traits — openness and extraversion — from an individual’s Facebook Likes. Cambridge Analytica wanted to use Facebook data to create a “psychographic” profile — i.e., personality type — of each voter and then micro-target them with political messages tailored to their personality type, ultimately with the hope of persuading them to vote for Cambridge Analytica’s client (or at least to not vote for the opposing candidate).

In this case, the psychographic profile is the person’s Big Five (or OCEAN) personality traits, which research has shown are relatively stable throughout our lives:

Openness to new experiences
Conscientiousness
Extroversion
Agreeableness
Neuroticism

But how to get the Facebook data to create these profiles? A researcher at Cambridge University, Alex Kogan, created an app called thisismydigitallife, a short quiz for determining your personality type. Between 250,000 and 270,000 people were paid a small amount of money to take this quiz.

Those who took the quiz shared some of their own Facebook data as well as their friends’ data (so long as the friends’ privacy settings allowed third-party app developers to access their data).

This process captured data on “at least 30 million identifiable U.S. consumers”, according to the FTC. For context, even if we assume all 30 million were registered voters, that means the data could be used to create profiles for less than 20 percent of the relevant population. And though some may disagree with Facebook’s policy for sharing user data with third-party developers, collecting data in this manner was in compliance with Facebook’s terms of service at the time.

What crossed the line was what happened next. Kogan then sold that data to Cambridge Analytica, without the consent of the affected Facebook users and in express violation of Facebook’s prohibition on selling Facebook data between third and fourth parties.

Upon learning of the sale, Facebook directed Alex Kogan and Cambridge Analytica to delete the data. But the social media company failed to notify users that their data had been misused or confirm via an independent audit that the data was actually deleted.

1. Cambridge Analytica was selling snake oil (no, you are not easily manipulated)

There’s a line in The Great Hack that sums up the opinion of the filmmakers and the subjects in their story: “There’s 2.1 billion people, each with their own reality. And once everybody has their own reality, it’s relatively easy to manipulate them.” According to the latest research from political science, this is completely bogus (and it’s the same marketing puffery that Cambridge Analytica would pitch to prospective clients).

The best evidence in this area comes from Joshua Kalla and David E. Broockman in a 2018 study published by American Political Science Review:

We argue that the best estimate of the effects of campaign contact and advertising on Americans’ candidates choices in general elections is zero. First, a systematic meta-analysis of 40 field experiments estimates an average effect of zero in general elections. Second, we present nine original field experiments that increase the statistical evidence in the literature about the persuasive effects of personal contact 10-fold. These experiments’ average effect is also zero.

In other words, a meta-analysis covering 49 high-quality field experiments found that in US general elections, advertising has zero effect on the outcome. (However, there is evidence “campaigns are able to have meaningful persuasive effects in primary and ballot measure campaigns, when partisan cues are not present.”)

But the relevant conclusion for the Cambridge Analytica scandal remains the same: in highly visible elections with a polarized electorate, it simply isn’t that easy to persuade voters to change their minds.

2. Micro-targeting political messages is overrated — people prefer general messages on shared beliefs

But maybe Cambridge Analytica’s micro-targeting strategy would result in above-average effects? The literature provides reason for skepticism here as well. Another paper by Eitan D. Hersh and Brian F. Schaffner in The Journal of Politics found that voters “rarely prefer targeted pandering to general messages” and “seem to prefer being solicited based on broad principles and collective beliefs.” It’s political tribalism all the way down.

A field experiment with 56,000 Wisconsin voters in the 2008 US presidential election found that “persuasive appeals possibly reduced candidate support and almost certainly did not increase it,” suggesting that “contact by a political campaign can engender a backlash.”

3. Big Five personality traits are not very useful for predicting political orientation

Or maybe there’s something special about targeting political messages based on a person’s Big Five personality traits? Again, there is little reason to believe this is the case. As Kris-Stella Trump mentions in an article for The Washington Post,

The ‘Big 5’ personality traits … only predict about 5 percent of the variation in individuals’ political orientations. Even accurate personality data would only add very little useful information to a data set that includes people’s partisanship — which is what most campaigns already work with.

The best evidence we have on the importance of personality traits on decision-making comes from the marketing literature (n.b., it’s likely easier to influence consumer decisions than political decisions in today’s increasingly polarized electorate). Here too the evidence is weak:

In this successful study, researchers targeted ads, based on personality, to more than 1.5 million people; the result was about 100 additional purchases of beauty products than had they advertised without targeting.

More to the point, the Facebook data obtained by Cambridge Analytica couldn’t even accomplish the simple task of matching Facebook Likes to the Big Five personality traits. Here’s Cambridge University researcher Alex Kogan in Michael Lewis’s podcast episode about the scandal:

We started asking the question of like, well, how often are we right? And so there’s five personality dimensions? And we said like, okay, for what percentage of people do we get all five personality categories correct? We found it was like 1%.

Eitan Hersh, an associate professor of political science at Tufts University, summed it up best: “Every claim about psychographics etc made by or about [Cambridge Analytica] is BS.”

4. If Cambridge Analytica’s “weapons-grade communications techniques” were so powerful, then Ted Cruz would be president

The Great Hack:

Ted Cruz went from the lowest rated candidate in the primaries to being the last man standing before Trump got the nomination… Everyone said Ted Cruz had this amazing ground game, and now we know who came up with all of it. Joining me now, Alexander Nix, CEO of Cambridge Analytica, the company behind it all.

Reporting by Nicholas Confessore and Danny Hakim at The New York Times directly contradicts this framing on Cambridge Analytica’s role in the 2016 Republican presidential primary:

Cambridge’s psychographic models proved unreliable in the Cruz presidential campaign, according to Rick Tyler, a former Cruz aide, and another consultant involved in the campaign. In one early test, more than half the Oklahoma voters whom Cambridge had identified as Cruz supporters actually favored other candidates.

Most significantly, the Cruz campaign stopped using Cambridge Analytica’s services in February 2016 due to disappointing results, as Kenneth P. Vogel and Darren Samuelsohn reported in Politico in June of that year:

Cruz’s data operation, which was seen as the class of the GOP primary field, was disappointed in Cambridge Analytica’s services and stopped using them before the Nevada GOP caucuses in late February, according to a former staffer for the Texas Republican.
“There’s this idea that there’s a magic sauce of personality targeting that can overcome any issue, and the fact is that’s just not the case,” said the former staffer, adding that Cambridge “doesn’t have a level of understanding or experience that allows them to target American voters.”

Vogel later tweeted that most firms hired Cambridge Analytica “because it was seen as a prerequisite for receiving $$$ from the MERCERS.” So it seems campaigns hired Cambridge Analytica not for its “weapons-grade communications techniques” but for the firm’s connections to billionaire Robert Mercer.

5. The Trump campaign phased out Cambridge Analytica data in favor of RNC data for the general election

Just as the Cruz campaign became disillusioned after working with Cambridge Analytica during the primary, so too did the Trump campaign during the general election, as Major Garrett reported for CBS News:

The crucial decision was made in late September or early October when Mr. Trump’s son-in-law Jared Kushner and Brad Parscale, Mr. Trump’s digital guru on the 2016 campaign, decided to utilize just the RNC data for the general election and used nothing from that point from Cambridge Analytica or any other data vendor. The Trump campaign had tested the RNC data, and it proved to be vastly more accurate than Cambridge Analytica’s, and when it was clear the RNC would be a willing partner, Mr. Trump’s campaign was able to rely solely on the RNC.

And of the little work Cambridge Analytica did complete for the Trump campaign, none involved “psychographics,” The New York Times reported:

Mr. Bannon at one point agreed to expand the company’s role, according to the aides, authorizing Cambridge to oversee a $5 million purchase of television ads. But after some of them appeared on cable channels in Washington, D.C. — hardly an election battleground — Cambridge’s involvement in television targeting ended.
…
Trump aides … said Cambridge had played a relatively modest role, providing personnel who worked alongside other analytics vendors on some early digital advertising and using conventional micro-targeting techniques. Later in the campaign, Cambridge also helped set up Mr. Trump’s polling operation and build turnout models used to guide the candidate’s spending and travel schedule. None of those efforts involved psychographics.

6. There is no evidence that Facebook data was used in the Brexit referendum

Last year, the UK’s data protection authority fined Facebook £500,000 — the maximum penalty allowed under the law — for violations related to the Cambridge Analytica data scandal. The fine was astonishing considering that the investigation of Cambridge Analytica’s licensed data derived from Facebook “found no evidence that UK citizens were among them,” according to the BBC. This detail demolishes the second central claim of The Great Hack, that data fraudulently acquired from Facebook users enabled Cambridge Analytica to manipulate the British people into voting for Brexit. On this basis, Facebook is currently appealing the fine.

7. The Great Hack wasn’t a “hack” at all

The title of the film is an odd choice given the facts of the case, as detailed in the background section of this article. A “hack” is generally understood as an unauthorized breach of a computer system or network by a malicious actor. People think of a genius black hat programmer who overcomes a company’s cybersecurity defenses to profit off stolen data. Alex Kogan, the Cambridge University researcher who acquired the Facebook data for Cambridge Analytica, was nothing of the sort.

As Gus Hurwitz noted in an article last year, Kogan entered into a contract with Facebook and asked users for their permission to acquire their data by using the thisismydigitallife personality app. Arguably, if there was a breach of trust, it was when the app users chose to share their friends’ data, too. The editorial choice to call this a “hack” instead of “data collection” or “data scraping” is of a piece with the rest of the film; when given a choice between accuracy and sensationalism, the directors generally chose the latter.

Why does this narrative persist despite the facts of the case?

The takeaway from the documentary is that Cambridge Analytica hacked Facebook and subsequently undermined two democratic processes: the Brexit referendum and the 2016 US presidential election. The reason this narrative has stuck in the public consciousness is that it serves everyone’s self-interest (except, of course, Facebook’s).

It lets voters off the hook for what seem, to many, to be drastic mistakes (i.e., electing a reality TV star president and undoing the European project). If we were all manipulated into making the “wrong” decision, then the consequences can’t be our fault!

This narrative also serves Cambridge Analytica, to a point. For a time, the political consultant liked being able to tell prospective clients that it was the mastermind behind two stunning political upsets. Lastly, journalists like the story because they compete with Facebook in the advertising market and view the tech giant as an existential threat.

There is no evidence for the film’s implicit assumption that, but for Cambridge Analytica’s use of Facebook data to target voters, Trump wouldn’t have been elected and the UK wouldn’t have voted to leave the EU. Despite its tone and ominous presentation style, The Great Hack fails to muster any support for its extreme claims. The truth is much more mundane: the Facebook-Cambridge Analytica data scandal was neither a “hack” nor was it “great” in historical importance.

The documentary ends with a question:

But the hardest part in all of this is that these wreckage sites and crippling divisions begin with the manipulation of one individual. Then another. And another. So, I can’t help but ask myself: Can I be manipulated? Can you?

No — but the directors of The Great Hack tried their best to do so.