Soylent Analytica: The Graph is too Damn Open

Gus Hurwitz —  21 March 2018

The world discovered something this past weekend that the world had already known: that what you say on the Internet stays on the Internet, spread intractably and untraceably through the tendrils of social media. I refer, of course, to the Cambridge Analytica/Facebook SNAFU (or just Situation Normal): the disclosure that Cambridge Analytica, a company used for election analytics by the Trump campaign, breached a contract with Facebook in order to unauthorizedly collect information on 50 million Facebook users. Since the news broke, Facebook’s stock is off by about 10 percent, Cambridge Analytica is almost certainly a doomed company, the FTC has started investigating both, private suits against Facebook are already being filed, the Europeans are investigating as well, and Cambridge Analytica is now being blamed for Brexit.

That is all fine and well, and we will be discussing this situation and its fallout for years to come. I want to write about a couple of other aspects of the story: the culpability of 270,000 Facebook users in disclosing the data of 50 million of their peers, and what this situation tells us about evergreen proposals to “open up the social graph” by making users’ social media content portable.

I Have Seen the Enemy and the Enemy is Us

Most discussion of Cambridge Analytica’s use of Facebook data has focused on the large number of user records Cambridge Analytica obtained access to – 50 million – and the fact that it obtained these records through some problematic means (and Cambridge Analytica pretty clearly breached contracts and acted deceptively to obtain these records). But one needs to dig a deeper to understand the mechanics of what actually happened. Once one does this, the story becomes both less remarkable and more interesting.

(For purposes of this discussion, I refer to Cambridge Analytica as the actor that obtained the records. It’s actually a little more complicated: Cambridge Analytica worked with an academic researcher to obtain these records. That researcher was given permission by Facebook to work with and obtain data on users for purposes relating to his research. But he exceeded that scope of authority, sharing the data that he collected with CA.)

The 50 million users’ records that Cambridge Analytica obtained access to were given to Cambridge Analytica by about 200,000 individual Facebook users. Those 270,000 users become involved with Cambridge Analytica by participating in an online quiz – one of those fun little throwaway quizzes that periodically get some attention on Facebook and other platforms. As part of taking that quiz, those 270,000 users agreed to grant Cambridge Analytica access to their profile information, including information available through their profile about their friends.

This general practice is reasonably well known. Any time a quiz or game like this has its moment on Facebook it is also accompanied by discussion of how the quiz or game is likely being used to harvest data about users. The terms of use of these quizzes and games almost always disclose that such information is being collected. More telling, any time a user posts a link to one of these quizzes or games, some friend will will invariably leave a comment warning about these terms of service and of these data harvesting practices.

There are two remarkable things about this. The first remarkable thing is that there is almost nothing remarkable about the fact that Cambridge Analytica obtained this information. A hundred such data harvesting efforts have preceded Cambridge Analytica; and a hundred more will follow it. The only remarkable things about the present story is that Cambridge Analytica was an election analytics firm working for Donald Trump – never mind that by all accounts the data collected proved to be of limited use generally in elections or that when Cambridge Analytica started working for the Trump campaign they were tasked with more mundane work that didn’t make use of this data.

More remarkable is that Cambridge Analytica didn’t really obtain data about 50 million individuals from Facebook, or from a Facebook quiz. Cambridge Analytica obtained this data from those 50 million individuals’ friends.

There are unquestionably important questions to be asked about the role of Facebook in giving users better control over, or ability to track uses of, their information. And there are questions about the use of contracts such as that between Facebook and Cambridge Analytica to control how data like this is handled. But this discussion will not be complete unless and until we also understand the roles and responsibilities of individual users in managing and respecting the privacy of their friends.

Fundamentally, we lack a clear and easy way to delineate privacy rights. If I share with my friends that I participated in a political rally, that I attended a concert, that I like certain activities, that I engage in certain illegal activities, what rights do I have to control how they subsequently share that information? The answer in the physical world, in the American tradition, is none – at least, unless I take affirmative steps to establish such a right prior to disclosing that information.

The answer is the same in the online world, as well – though platforms have substantial ability to alter this if they so desire. For instance, Facebook could change the design of its system to prohibit users from sharing information about their friends with third parties. (Indeed, this is something that most privacy advocates think social media platforms should do.) But such a “solution” to the delineation problem has its own problems. It assumes that the platform is the appropriate arbiter of privacy rights – a perhaps questionable assumption given platforms’ history of getting things wrong when it comes to privacy. More trenchant, it raises questions about users’ ability to delineate or allocate their privacy differently than allowed by the platforms, particularly where a given platform may not allow the delineation or allocation of rights that users prefer.

The Badness of the Open Graph Idea

One of the standard responses to concerns about how platforms may delineate and allow users to allocate their privacy interests is, on the one hand, that competition among platforms would promote desirable outcomes and that, on the other hand, the relatively limited and monopolistic competition that we see among firms like Facebook is one of the reasons that consumers today have relatively poor control over their information.

The nature of competition in markets such as these, including whether and how to promote more of it, is a perennial and difficult topic. The network effects inherent in markets like these suggest that promoting competition may in fact not improve consumer outcomes, for instance. Competition could push firms to less consumer-friendly privacy positions if that allows better monetization and competitive advantages. And the simple fact that Facebook has lost 10% of its value following the Cambridge Analytica news suggests that there are real market constraints on how Facebook operates.

But placing those issues to the side for now, the situation with Cambridge Analytica offers an important cautionary tale about one of the perennial proposals for how to promote competition between social media platforms: “opening up the social graph.” The basic idea of these proposals is to make it easier for users of these platforms to migrate between platforms or to use the features of different platforms through data portability and interoperability. Specific proposals have taken various forms over the years, but generally they would require firms like Facebook to either make users’ data exportable in a standardized form so that users could easily migrate it to other platforms or to adopt a standardized API that would allow other platforms to interoperate with data stored on the Facebook platform.

In other words, proposals to “open the social graph” are proposals to make it easier to export massive volumes of Facebook user data to third parties at efficient scale.

If there is one lesson from the past decade that is more trenchant than that delineation privacy rights is difficult it is that data security is even harder.

These last two points do not sum together well. The easier that Facebook makes it for its users’ data to be exported at scale, the easier Facebook makes it for its users’ data to be exfiltrated at scale. Despite its myriad problems, Cambridge Analytica at least was operating within a contractual framework with Facebook – it was a known party. Creating external API for exporting Facebook data makes it easier for unknown third-parties to anonymously obtain user information. Indeed, even if the API only works to allow trusted third parties to to obtain such information, the problem of keeping that data secured against subsequent exfiltration multiplies with each third party that is allowed access to that data.

One response to Soylent Analytica: The Graph is too Damn Open

  1. 
    J. S. Greenfield 23 March 2018 at 9:42 pm

    I largely agree with many of the thoughts expressed here, but I notably disagree with one of the conclusions. By making the leap to condemning proposals to make the social graph exportable, you conflate the idea of exporting friend relationships with the idea of exporting all user information about those friends. But there is absolutely no reason this need be the case. Exporting the social graph could potentially be as simple as exporting a list of friends, and the most basic identification/communication data, such as email address and/or phone number. Such an approach (under the control of users electing to export their friend relationships to a new platform) could have the potential to enable new platforms to more easily reach critical mass (just as some platforms successfully used social graph exporting from established platforms to do exactly that, in the past) without significantly impinging upon reasonable privacy expectations.

    This is simply not the all-or-nothing proposition that you seem to be implicitly presuming by your argument.