“Data is the new oil,” said Jaron Lanier in a recent op-ed for The New York Times. Lanier’s use of this metaphor is only the latest instance of what has become the dumbest meme in tech policy. As the digital economy becomes more prominent in our lives, it is not unreasonable to seek to understand one of its most important inputs. But this analogy to the physical economy is fundamentally flawed. Worse, introducing regulations premised upon faulty assumptions like this will likely do far more harm than good. Here are seven reasons why “data is the new oil” misses the mark:

1. Oil is rivalrous; data is non-rivalrous

If someone uses a barrel of oil, it can’t be consumed again. But, as Alan McQuinn, a senior policy analyst at the Information Technology and Innovation Foundation, noted, “when consumers ‘pay with data’ to access a website, they still have the same amount of data after the transaction as before. As a result, users have an infinite resource available to them to access free online services.” Imposing restrictions on data collection makes this infinite resource finite. 

2. Oil is excludable; data is non-excludable

Oil is highly excludable because, as a physical commodity, it can be stored in ways that prevent use by non-authorized parties. However, as my colleagues pointed out in a recent comment to the FTC: “While databases may be proprietary, the underlying data usually is not.” They go on to argue that this can lead to under-investment in data collection:

[C]ompanies that have acquired a valuable piece of data will struggle both to prevent their rivals from obtaining the same data as well as to derive competitive advantage from the data. For these reasons, it also means that firms may well be more reluctant to invest in data generation than is socially optimal. In fact, to the extent this is true there is arguably more risk of companies under-investing in data generation than of firms over-investing in order to create data troves with which to monopolize a market. This contrasts with oil, where complete excludability is the norm.

3. Oil is fungible; data is non-fungible

Oil is a commodity, so, by definition, one barrel of oil of a given grade is equivalent to any other barrel of that grade. Data, on the other hand, is heterogeneous. Each person’s data is unique and may consist of a practically unlimited number of different attributes that can be collected into a profile. This means that oil will follow the law of one price, while a dataset’s value will be highly contingent on its particular properties and commercialization potential.

4. Oil has positive marginal costs; data has zero marginal costs

There is a significant expense to producing and distributing an additional barrel of oil (as low as $5.49 per barrel in Saudi Arabia; as high as $21.66 in the U.K.). Data is merely encoded information (bits of 1s and 0s), so gathering, storing, and transferring it is nearly costless (though, to be clear, setting up systems for collecting and processing can be a large fixed cost). Under perfect competition, the market clearing price is equal to the marginal cost of production (hence why data is traded for free services and oil still requires cold, hard cash).

5. Oil is a search good; data is an experience good

Oil is a search good, meaning its value can be assessed prior to purchasing. By contrast, data tends to be an experience good because companies don’t know how much a new dataset is worth until it has been combined with pre-existing datasets and deployed using algorithms (from which value is derived). This is one reason why purpose limitation rules can have unintended consequences. If firms are unable to predict what data they will need in order to develop new products, then restricting what data they’re allowed to collect is per se anti-innovation.

6. Oil has constant returns to scale; data has rapidly diminishing returns

As an energy input into a mechanical process, oil has relatively constant returns to scale (e.g., when oil is used as the fuel source to power a machine). When data is used as an input for an algorithm, it shows rapidly diminishing returns, as the charts collected in a presentation by Google’s Hal Varian demonstrate. The initial training data is hugely valuable for increasing an algorithm’s accuracy. But as you increase the dataset by a fixed amount each time, the improvements steadily decline (because new data is only helpful in so far as it’s differentiated from the existing dataset).
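The shape of this argument can be illustrated with a toy calculation. The sketch below is not drawn from Varian's charts; it assumes, purely for illustration, that an algorithm's error behaves like the standard error of a sample mean, which shrinks in proportion to 1/sqrt(n). The batch size, the `sigma` value, and the function name are all hypothetical.

```python
import math


def standard_error(n, sigma=1.0):
    """Standard error of a sample mean with n observations (illustrative proxy for model error)."""
    return sigma / math.sqrt(n)


# Grow the dataset by the same fixed batch each time (hypothetical batch size).
batch = 1000
sizes = [batch * k for k in range(1, 6)]
errors = [standard_error(n) for n in sizes]

# Improvement (reduction in error) contributed by each additional batch.
gains = [errors[i] - errors[i + 1] for i in range(len(errors) - 1)]

for size, gain in zip(sizes[1:], gains):
    print(f"dataset size {size}: error reduced by {gain:.5f}")
```

Under this assumption every extra batch still helps, but each one helps less than the batch before it, which is the pattern the paragraph describes.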

7. Oil is valuable; data is worthless

The features detailed above — rivalrousness, fungibility, marginal cost, returns to scale — all lead to perhaps the most important distinction between oil and data: The average barrel of oil is valuable (currently $56.49) and the average dataset is worthless (on the open market). As Will Rinehart showed, putting a price on data is a difficult task. But when data brokers and other intermediaries in the digital economy do try to value data, the prices are almost uniformly low. The Financial Times had the most detailed numbers on what personal data is sold for in the market:

  • “General information about a person, such as their age, gender and location is worth a mere $0.0005 per person, or $0.50 per 1,000 people.”
  • “A person who is shopping for a car, a financial product or a vacation is more valuable to companies eager to pitch those goods. Auto buyers, for instance, are worth about $0.0021 a pop, or $2.11 per 1,000 people.”
  • “Knowing that a woman is expecting a baby and is in her second trimester of pregnancy, for instance, sends the price tag for that information about her to $0.11.”
  • “For $0.26 per person, buyers can access lists of people with specific health conditions or taking certain prescriptions.”
  • “The company estimates that the value of a relatively high Klout score adds up to more than $3 in word-of-mouth marketing value.”
  • “[T]he sum total for most individuals often is less than a dollar.”

Data is a specific asset, meaning it has “a significantly higher value within a particular transacting relationship than outside the relationship.” We only think data is so valuable because tech companies are so valuable. In reality, it is the combination of high-skilled labor, large capital expenditures, and cutting-edge technologies (e.g., machine learning) that makes those companies so valuable. Yes, data is an important component of these production functions. But to claim that data is responsible for all the value created by these businesses, as Lanier does in his NYT op-ed, is farcical (and reminiscent of the labor theory of value). 

Conclusion

People who analogize data to oil or gold may merely be trying to convey that data is as valuable in the 21st century as those commodities were in the 20th century (though, as argued, a dubious proposition). If the comparison stopped there, it would be relatively harmless. But there is a real risk that policymakers might take the analogy literally and regulate data in the same way they regulate commodities. As this article shows, data has many unique properties that are simply incompatible with 20th-century modes of regulation.

A better — though imperfect — analogy, as author Bernard Marr suggests, would be renewable energy. The sources of renewable energy are all around us — solar, wind, hydroelectric — and there is more available than we could ever use. We just need the right incentives and technology to capture it. The same is true for data. We leave our digital fingerprints everywhere — we just need to dust for them.

Source: New York Magazine

When she rolled out her plan to break up Big Tech, Elizabeth Warren paid for ads (like the one shown above) claiming that “Facebook and Google account for 70% of all internet traffic.” This statistic has since been repeated in various forms by Rolling Stone, Vox, National Review, and Washingtonian. In my last post, I fact checked this claim and found it wanting.

Warren’s data

As supporting evidence, Warren cited a Newsweek article from 2017, which in turn cited a blog post from an open-source freelancer, who was aggregating data from a 2015 blog post published by Parse.ly, a web analytics company, which said: “Today, Facebook remains a top referring site to the publishers in Parse.ly’s network, claiming 39 percent of referral traffic versus Google’s share of 34 percent.” At the time, Parse.ly had “around 400 publisher domains” in its network. To put it lightly, this is not what it means to “account for” or “control” or “directly influence” 70 percent of all internet traffic, as Warren and others have claimed.

Internet traffic measured in bytes

In an effort to contextualize how extreme Warren’s claim was, in my last post I used a common measure of internet traffic — total volume in bytes — to show that Google and Facebook account for less than 20 percent of global internet traffic. Some Warren defenders have correctly pointed out that measuring internet traffic in bytes will weight the results toward data-heavy services, such as video streaming. It’s not obvious a priori, however, whether this would bias the results in favor of Facebook and Google or against them, given that users stream lots of video using those companies’ sites and apps (hello, YouTube).

Internet traffic measured by time spent by users

As I said in my post, there are multiple ways to measure total internet traffic, and no one of them is likely to offer a perfect measure. So, to get a fuller picture, we could also look at how users are spending their time on the internet. While there is no single source for global internet time use statistics, we can combine a few to reach an estimate (NB: this analysis includes time spent in apps as well as on the web). 

According to the Global Digital report by Hootsuite and We Are Social, in 2018 there were 4.021 billion active internet users, and the worldwide average for time spent using the internet was 6 hours and 42 minutes per day. That means there were 1,616 billion internet user-minutes per day.

Data from Apptopia shows that, in the three months from May through July 2018, users spent 300 billion hours in Facebook-owned apps and 118 billion hours in Google-owned apps. In other words, all Facebook-owned apps consume, on average, 197 billion user-minutes per day and all Google-owned apps consume, on average, 78 billion user-minutes per day. And according to SimilarWeb data for the three months from June to August 2019, web users spent 11 billion user-minutes per day visiting Facebook domains (facebook.com, whatsapp.com, instagram.com, messenger.com) and 52 billion user-minutes per day visiting Google domains, including google.com (and all subdomains) and youtube.com.

If you add up all app and web user-minutes for Google and Facebook, the total is 338 billion user minutes per day. A staggering number. But as a share of all internet traffic (in this case measured in terms of time spent)? Google- and Facebook-owned sites and apps account for about 21 percent of user-minutes.
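The arithmetic behind that estimate can be reproduced with a short script. This is a sketch using only the figures quoted above; the one assumption is the length of a three-month period, taken here as 365/4 days, which is the value that reproduces the rounded per-day numbers in the text. Variable and function names are illustrative.

```python
MINUTES_PER_HOUR = 60
DAYS_PER_QUARTER = 365 / 4  # assumption: average length of a three-month period

# Global total: 4.021 billion users at 6 hours 42 minutes per day.
users_billion = 4.021
minutes_per_user_per_day = 6 * MINUTES_PER_HOUR + 42
total_minutes = users_billion * minutes_per_user_per_day  # billions of user-minutes per day


def quarterly_hours_to_daily_minutes(hours_billion):
    """Convert billions of hours per quarter into billions of minutes per day."""
    return hours_billion * MINUTES_PER_HOUR / DAYS_PER_QUARTER


facebook_apps = quarterly_hours_to_daily_minutes(300)  # Apptopia figure
google_apps = quarterly_hours_to_daily_minutes(118)    # Apptopia figure
facebook_web = 11  # SimilarWeb figure, billions of user-minutes per day
google_web = 52    # SimilarWeb figure, billions of user-minutes per day

combined = facebook_apps + google_apps + facebook_web + google_web
share = combined / total_minutes

print(f"total: {total_minutes:.0f} billion user-minutes per day")
print(f"Google + Facebook: {combined:.0f} billion user-minutes per day")
print(f"share: {share:.0%}")
```

Note that the app and web figures come from different sources and different time windows, so the result is an estimate rather than a precise measurement.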

Internet traffic measured by “connections”

In my last post, I cited a Sandvine study that measured total internet traffic by volume of upstream and downstream bytes. The same report also includes numbers for what Sandvine calls “connections,” which is defined as “the number of conversations occurring for an application.” Sandvine notes that while “some applications use a single connection for all traffic, others use many connections to transfer data or video to the end user.” For example, a video stream on Netflix uses a single connection, while every item on a webpage, such as loading images, may require a distinct connection.

Cam Cullen, Sandvine’s VP of marketing, also implored readers to “never forget Google connections include YouTube, Search, and DoubleClick — all of which are very noisy applications and universally consumed,” which would bias this statistic toward inflating Google’s share. With these caveats in mind, Sandvine’s data shows that Google is responsible for 30 percent of these connections, while Facebook is responsible for under 8 percent of connections. Note that Netflix’s share is less than 1 percent, which implies this statistic is not biased toward data-heavy services. Again, the numbers for Google and Facebook are a far cry from what Warren and others are claiming.

Source: Sandvine

Internet traffic measured by sources

I’m not sure whether either of these measures is preferable to what I offered in my original post, but each is at least a plausible measure of internet traffic — and all of them fall well short of Warren’s claimed 70 percent. What I do know is that the preferred metric offered by the people most critical of my post — external referrals to online publishers (content sites) — is decidedly not a plausible measure of internet traffic.

In defense of Warren, Jason Kint, the CEO of a trade association for digital content publishers, wrote, “I just checked actual benchmark data across our members (most publishers) and 67% of their external traffic comes through Google or Facebook.” Rand Fishkin cites his own analysis of data from Jumpshot showing that 66.0 percent of external referral visits were sent by Google and 5.1 percent were sent by Facebook.

In another response to my piece, former digital advertising executive, Dina Srinivasan, said, “[Percentage] of referrals is relevant because it is pointing out that two companies control a large [percentage] of business that comes through their door.” 

In my opinion, equating “external referrals to publishers” with “internet traffic” is unacceptable for at least two reasons.

First, the internet is much broader than traditional content publishers — it encompasses everything from email and Yelp to TikTok, Amazon, and Netflix. The relevant market is consumer attention and, in that sense, every internet supplier is bidding for scarce time. In a recent investor letter, Netflix said, “We compete with (and lose to) ‘Fortnite’ more than HBO,” adding: “There are thousands of competitors in this highly fragmented market vying to entertain consumers and low barriers to entry for those great experiences.” Previously, CEO Reed Hastings had only half-jokingly said, “We’re competing with sleep on the margin.” In this debate over internet traffic, the opposing side fails to grasp the scope of the internet market. It is unsurprising, then, that the one metric that does best at capturing attention — time spent — is about the same as bytes.

Second, and perhaps more important, even if we limit our analysis to publisher traffic, the external referral statistic these critics cite completely (and conveniently?) omits direct and internal traffic — traffic that represents the majority of publisher traffic. In fact, according to Parse.ly’s most recent data, which now includes more than 3,000 “high-traffic sites,” only 35 percent of total traffic comes from search and social referrers (as the graph below shows). Of course, Google and Facebook drive the majority of search and social referrals. But given that most users visit webpages without being referred at all, Google and Facebook are responsible for less than a third of total traffic.

Source: Parse.ly

It is simply incorrect to say, as Srinivasan does, that external referrals offer a useful measurement of internet traffic because they capture a “large [percentage] of business that comes through [publishers’] door.” Well, “large” is relative, but the implication that these external referrals from Facebook and Google explain Warren’s 70%-of-internet-traffic claim is both factually incorrect and horribly misleading — especially in an antitrust context.

It is factually incorrect because, at most, Google and Facebook are responsible for a third of the traffic on these sites; it is misleading because if our concern is ensuring that users can reach content sites without passing through Google or Facebook, the evidence is clear that they can and do — at least twice as often as they follow links from Google or Facebook to do so.
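The "at least twice as often" figure follows directly from the one-third upper bound. A minimal sketch of that step, with a hypothetical function name and one hypothetical smaller share added only for comparison:

```python
from fractions import Fraction


def other_visits_per_referred_visit(referred_share):
    """Visits arriving by any other route for each visit referred by Google or Facebook."""
    return (1 - referred_share) / referred_share


# Upper bound from the text: Google and Facebook refer at most a third of traffic.
upper_bound = Fraction(1, 3)
ratio_at_bound = other_visits_per_referred_visit(upper_bound)

# Hypothetical lower share, for illustration only (not a figure from the data).
hypothetical_share = Fraction(1, 4)
ratio_hypothetical = other_visits_per_referred_visit(hypothetical_share)

print(ratio_at_bound)      # ratio when the share is exactly one third
print(ratio_hypothetical)  # the ratio grows as the referred share falls
```

Because the ratio rises as the referred share falls, any share below one third implies that users reach these sites by other routes more than twice as often as through Google or Facebook.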

Conclusion

As my colleague Gus Hurwitz said, Warren is making a very specific and very alarming claim: 

There may be ‘softer’ versions of [Warren’s claim] that are reasonably correct (e.g., digital ad revenue, visibility into traffic). But for 99% of people hearing (and reporting on) these claims, they hear the hard version of the claim: Google and Facebook control 70% of what you do online. That claim is wrong, alarmist, misinformation, intended to foment fear, uncertainty, and doubt — to bootstrap the argument that ‘everything is terrible, worse, really!, and I’m here to save you.’ This is classic propaganda.

Google and Facebook do account for a 59 percent (and declining) share of US digital advertising. But that’s not what Warren said (nor would anyone try to claim with a straight face that “volume of advertising” was the same thing as “internet traffic”). And if our concern is with competition, it’s hard to look at the advertising market and conclude that it’s got a competition problem. Prices are falling like crazy (down 42 percent in the last decade), and volume is only increasing. If you add in offline advertising (which, whatever you think about market definition here, certainly competes with online advertising at the very least on some dimensions) Google and Facebook are responsible for only about 32 percent.

In her comments criticizing my article, Dina Srinivasan mentioned another of these “softer” versions:

Also, each time a publisher page loads, what [percentage] then queries Google or Facebook servers during the page loads? About 98+% of every page load. That stat is not even in Warren or your analysis. That is 1000% relevant.

It’s true that Google and Facebook have visibility into a great deal of all internet traffic (beyond their own) through a variety of products and services: browsers, content delivery networks (CDNs), web beacons, cloud computing, VPNs, data brokers, single sign-on (SSO), and web analytics services. But seeing internet traffic is not the same thing as “account[ing] for” — or controlling or even directly influencing — internet traffic. The former is a very different claim from the latter, and one with considerably more attenuated competitive relevance (if any). It certainly wouldn’t be a sufficient basis for advocating that Google and Facebook be broken up — which is probably why, although arguably accurate, it’s not the statistic upon which Warren based her proposal to do so.

In March of this year, Elizabeth Warren announced her proposal to break up Big Tech in a blog post on Medium. She tried to paint the tech giants as dominant players crushing their smaller competitors and strangling the open internet. This line in particular stood out: “More than 70% of all Internet traffic goes through sites owned or operated by Google or Facebook.”

This statistic immediately struck me as outlandish, but I knew I would need to do some digging to fact check it. After seeing the claim repeated in a recent profile of the Open Markets Institute — “Google and Facebook control websites that receive 70 percent of all internet traffic” — I decided to track down the original source for this surprising finding. 

Warren’s blog post links to a November 2017 Newsweek article — “Who Controls the Internet? Facebook and Google Dominance Could Cause the ‘Death of the Web’” — written by Anthony Cuthbertson. The piece is even more alarmist than Warren’s blog post: “Facebook and Google now have direct influence over nearly three quarters of all internet traffic, prompting warnings that the end of a free and open web is imminent.”

The Newsweek article, in turn, cites an October 2017 blog post by André Staltz, an open source freelancer, on his personal website titled “The Web began dying in 2014, here’s how”. His takeaway is equally dire: “It looks like nothing changed since 2014, but GOOG and FB now have direct influence over 70%+ of internet traffic.” Staltz claims the blog post took “months of research to write”, but the headline statistic is merely aggregated from a December 2015 blog post by Parse.ly, a web analytics and content optimization software company.

Source: André Staltz

The Parse.ly article — “Facebook Continues to Beat Google in Sending Traffic to Top Publishers” — is about external referrals (i.e., outside links) to publisher sites (not total internet traffic) and says the “data set used for this study included around 400 publisher domains.” This is not even a random sample much less a comprehensive measure of total internet traffic. Here’s how they summarize their results: “Today, Facebook remains a top referring site to the publishers in Parse.ly’s network, claiming 39 percent of referral traffic versus Google’s share of 34 percent.” 

Source: Parse.ly

So, using the sources provided by the respective authors, the claim from Elizabeth Warren that “more than 70% of all Internet traffic goes through sites owned or operated by Google or Facebook” can be more accurately rewritten as “more than 70 percent of external links to 400 publishers come from sites owned or operated by Google and Facebook.” When framed that way, it’s much less conclusive (and much less scary).

But what’s the real statistic for total internet traffic? This is a surprisingly difficult question to answer, because there is no single way to measure it: Are we talking about share of users, or user-minutes, or bits, or total visits, or unique visits, or referrals? According to Wikipedia, “Common measurements of traffic are total volume, in units of multiples of the byte, or as transmission rates in bytes per certain time units.”

One of the more comprehensive efforts to answer this question is undertaken annually by Sandvine. The networking equipment company uses its vast installed footprint of equipment across the internet to generate statistics on connections, upstream traffic, downstream traffic, and total internet traffic (summarized in the table below). This dataset covers both browser-based and app-based internet traffic, which is crucial for capturing the full picture of internet user behavior.

Source: Sandvine

Looking at two categories of traffic analyzed by Sandvine — downstream traffic and overall traffic — gives the lie to the narrative pushed by Warren and others. As you can see in the chart below, HTTP media streaming — a category for smaller streaming services that Sandvine has not yet tracked individually — represented 12.8% of global downstream traffic and Netflix accounted for 12.6%. According to Sandvine, “the aggregate volume of the long tail is actually greater than the largest of the short-tail providers.” So much for the open internet being smothered by the tech giants.

Source: Sandvine

As for Google and Facebook? The report found that Google-operated sites receive 12.00 percent of total internet traffic while Facebook-controlled sites receive 7.79 percent. In other words, less than 20 percent of all Internet traffic goes through sites owned or operated by Google or Facebook. While this statistic may be less eye-popping than the one trumpeted by Warren and other antitrust activists, it does have the virtue of being true.

Source: Sandvine

And if David finds out the data beneath his profile, you’ll start to be able to connect the dots in various ways with Facebook and Cambridge Analytica and Trump and Brexit and all these loosely-connected entities. Because you get to see inside the beast, you get to see inside the system.

This excerpt from the beginning of Netflix’s The Great Hack shows the goal of the documentary: to provide one easy explanation for Brexit and the election of Trump, two of the most surprising electoral outcomes in recent history.

Unfortunately, in attempting to tell a simple narrative, the documentary obscures more than it reveals about what actually happened in the Facebook-Cambridge Analytica data scandal. In the process, the film wildly overstates the significance of the scandal in either the 2016 US presidential election or the 2016 UK referendum on leaving the EU.

In this article, I will review the background of the case and show seven things the documentary gets wrong about the Facebook-Cambridge Analytica data scandal.

Background

In 2013, researchers published a paper showing that you could predict some personality traits — openness and extraversion — from an individual’s Facebook Likes. Cambridge Analytica wanted to use Facebook data to create a “psychographic” profile — i.e., personality type — of each voter and then micro-target them with political messages tailored to their personality type, ultimately with the hope of persuading them to vote for Cambridge Analytica’s client (or at least to not vote for the opposing candidate).

In this case, the psychographic profile is the person’s Big Five (or OCEAN) personality traits, which research has shown are relatively stable throughout our lives:

  1. Openness to new experiences
  2. Conscientiousness
  3. Extroversion
  4. Agreeableness
  5. Neuroticism

But how to get the Facebook data to create these profiles? A researcher at Cambridge University, Alex Kogan, created an app called thisismydigitallife, a short quiz for determining your personality type. Between 250,000 and 270,000 people were paid a small amount of money to take this quiz. 

Those who took the quiz shared some of their own Facebook data as well as their friends’ data (so long as the friends’ privacy settings allowed third-party app developers to access their data). 

This process captured data on “at least 30 million identifiable U.S. consumers”, according to the FTC. For context, even if we assume all 30 million were registered voters, that means the data could be used to create profiles for less than 20 percent of the relevant population. And though some may disagree with Facebook’s policy for sharing user data with third-party developers, collecting data in this manner was in compliance with Facebook’s terms of service at the time.

What crossed the line was what happened next. Kogan then sold that data to Cambridge Analytica, without the consent of the affected Facebook users and in express violation of Facebook’s prohibition on selling Facebook data between third and fourth parties. 

Upon learning of the sale, Facebook directed Alex Kogan and Cambridge Analytica to delete the data. But the social media company failed to notify users that their data had been misused or confirm via an independent audit that the data was actually deleted.

1. Cambridge Analytica was selling snake oil (no, you are not easily manipulated)

There’s a line in The Great Hack that sums up the opinion of the filmmakers and the subjects in their story: “There’s 2.1 billion people, each with their own reality. And once everybody has their own reality, it’s relatively easy to manipulate them.” According to the latest research from political science, this is completely bogus (and it’s the same marketing puffery that Cambridge Analytica would pitch to prospective clients).

The best evidence in this area comes from Joshua Kalla and David E. Broockman in a 2018 study published by American Political Science Review:

We argue that the best estimate of the effects of campaign contact and advertising on Americans’ candidates choices in general elections is zero. First, a systematic meta-analysis of 40 field experiments estimates an average effect of zero in general elections. Second, we present nine original field experiments that increase the statistical evidence in the literature about the persuasive effects of personal contact 10-fold. These experiments’ average effect is also zero.

In other words, across 49 high-quality field experiments (a meta-analysis of 40 existing experiments plus nine original ones), the authors found that campaign contact and advertising in US general elections have an average persuasive effect of zero. (However, there is evidence “campaigns are able to have meaningful persuasive effects in primary and ballot measure campaigns, when partisan cues are not present.”)

But the relevant conclusion for the Cambridge Analytica scandal remains the same: in highly visible elections with a polarized electorate, it simply isn’t that easy to persuade voters to change their minds.

2. Micro-targeting political messages is overrated — people prefer general messages on shared beliefs

But maybe Cambridge Analytica’s micro-targeting strategy would result in above-average effects? The literature provides reason for skepticism here as well. Another paper by Eitan D. Hersh and Brian F. Schaffner in The Journal of Politics found that voters “rarely prefer targeted pandering to general messages” and “seem to prefer being solicited based on broad principles and collective beliefs.” It’s political tribalism all the way down. 

A field experiment with 56,000 Wisconsin voters in the 2008 US presidential election found that “persuasive appeals possibly reduced candidate support and almost certainly did not increase it,” suggesting that “contact by a political campaign can engender a backlash.”

3. Big Five personality traits are not very useful for predicting political orientation

Or maybe there’s something special about targeting political messages based on a person’s Big Five personality traits? Again, there is little reason to believe this is the case. As Kris-Stella Trump mentions in an article for The Washington Post:

The ‘Big 5’ personality traits … only predict about 5 percent of the variation in individuals’ political orientations. Even accurate personality data would only add very little useful information to a data set that includes people’s partisanship — which is what most campaigns already work with.

The best evidence we have on the importance of personality traits on decision-making comes from the marketing literature (n.b., it’s likely easier to influence consumer decisions than political decisions in today’s increasingly polarized electorate). Here too the evidence is weak:

In this successful study, researchers targeted ads, based on personality, to more than 1.5 million people; the result was about 100 additional purchases of beauty products than had they advertised without targeting.

More to the point, the Facebook data obtained by Cambridge Analytica couldn’t even accomplish the simple task of matching Facebook Likes to the Big Five personality traits. Here’s Cambridge University researcher Alex Kogan in Michael Lewis’s podcast episode about the scandal: 

We started asking the question of like, well, how often are we right? And so there’s five personality dimensions? And we said like, okay, for what percentage of people do we get all five personality categories correct? We found it was like 1%.

Eitan Hersh, an associate professor of political science at Tufts University, summed it up best: “Every claim about psychographics etc made by or about [Cambridge Analytica] is BS.”

4. If Cambridge Analytica’s “weapons-grade communications techniques” were so powerful, then Ted Cruz would be president

The Great Hack:

Ted Cruz went from the lowest rated candidate in the primaries to being the last man standing before Trump got the nomination… Everyone said Ted Cruz had this amazing ground game, and now we know who came up with all of it. Joining me now, Alexander Nix, CEO of Cambridge Analytica, the company behind it all.

Reporting by Nicholas Confessore and Danny Hakim at The New York Times directly contradicts this framing on Cambridge Analytica’s role in the 2016 Republican presidential primary:

Cambridge’s psychographic models proved unreliable in the Cruz presidential campaign, according to Rick Tyler, a former Cruz aide, and another consultant involved in the campaign. In one early test, more than half the Oklahoma voters whom Cambridge had identified as Cruz supporters actually favored other candidates.

Most significantly, the Cruz campaign stopped using Cambridge Analytica’s services in February 2016 due to disappointing results, as Kenneth P. Vogel and Darren Samuelsohn reported in Politico in June of that year:

Cruz’s data operation, which was seen as the class of the GOP primary field, was disappointed in Cambridge Analytica’s services and stopped using them before the Nevada GOP caucuses in late February, according to a former staffer for the Texas Republican.

“There’s this idea that there’s a magic sauce of personality targeting that can overcome any issue, and the fact is that’s just not the case,” said the former staffer, adding that Cambridge “doesn’t have a level of understanding or experience that allows them to target American voters.”

Vogel later tweeted that most firms hired Cambridge Analytica “because it was seen as a prerequisite for receiving $$$ from the MERCERS.” So it seems campaigns hired Cambridge Analytica not for its “weapons-grade communications techniques” but for the firm’s connections to billionaire Robert Mercer.

5. The Trump campaign phased out Cambridge Analytica data in favor of RNC data for the general election

Just as the Cruz campaign became disillusioned after working with Cambridge Analytica during the primary, so too did the Trump campaign during the general election, as Major Garrett reported for CBS News:

The crucial decision was made in late September or early October when Mr. Trump’s son-in-law Jared Kushner and Brad Parscale, Mr. Trump’s digital guru on the 2016 campaign, decided to utilize just the RNC data for the general election and used nothing from that point from Cambridge Analytica or any other data vendor. The Trump campaign had tested the RNC data, and it proved to be vastly more accurate than Cambridge Analytica’s, and when it was clear the RNC would be a willing partner, Mr. Trump’s campaign was able to rely solely on the RNC.

And of the little work Cambridge Analytica did complete for the Trump campaign, none involved “psychographics,” The New York Times reported:

Mr. Bannon at one point agreed to expand the company’s role, according to the aides, authorizing Cambridge to oversee a $5 million purchase of television ads. But after some of them appeared on cable channels in Washington, D.C. — hardly an election battleground — Cambridge’s involvement in television targeting ended.

Trump aides … said Cambridge had played a relatively modest role, providing personnel who worked alongside other analytics vendors on some early digital advertising and using conventional micro-targeting techniques. Later in the campaign, Cambridge also helped set up Mr. Trump’s polling operation and build turnout models used to guide the candidate’s spending and travel schedule. None of those efforts involved psychographics.

6. There is no evidence that Facebook data was used in the Brexit referendum

Last year, the UK’s data protection authority fined Facebook £500,000 — the maximum penalty allowed under the law — for violations related to the Cambridge Analytica data scandal. The fine was astonishing considering that the investigation of Cambridge Analytica’s licensed data derived from Facebook “found no evidence that UK citizens were among them,” according to the BBC. This detail demolishes the second central claim of The Great Hack, that data fraudulently acquired from Facebook users enabled Cambridge Analytica to manipulate the British people into voting for Brexit. On this basis, Facebook is currently appealing the fine.

7. The Great Hack wasn’t a “hack” at all

The title of the film is an odd choice given the facts of the case, as detailed in the background section of this article. A “hack” is generally understood as an unauthorized breach of a computer system or network by a malicious actor. People think of a genius black hat programmer who overcomes a company’s cybersecurity defenses to profit off stolen data. Alex Kogan, the Cambridge University researcher who acquired the Facebook data for Cambridge Analytica, was nothing of the sort. 

As Gus Hurwitz noted in an article last year, Kogan entered into a contract with Facebook and asked users for their permission to acquire their data by using the thisismydigitallife personality app. Arguably, if there was a breach of trust, it was when the app users chose to share their friends’ data, too. The editorial choice to call this a “hack” instead of “data collection” or “data scraping” is of a piece with the rest of the film; when given a choice between accuracy and sensationalism, the directors generally chose the latter.

Why does this narrative persist despite the facts of the case?

The takeaway from the documentary is that Cambridge Analytica hacked Facebook and subsequently undermined two democratic processes: the Brexit referendum and the 2016 US presidential election. The reason this narrative has stuck in the public consciousness is that it serves everyone’s self-interest (except, of course, Facebook’s).

It lets voters off the hook for what seem, to many, to be drastic mistakes (i.e., electing a reality TV star president and undoing the European project). If we were all manipulated into making the “wrong” decision, then the consequences can’t be our fault! 

This narrative also serves Cambridge Analytica, to a point. For a time, the political consultant liked being able to tell prospective clients that it was the mastermind behind two stunning political upsets. Lastly, journalists like the story because they compete with Facebook in the advertising market and view the tech giant as an existential threat.

There is no evidence for the film’s implicit assumption that, but for Cambridge Analytica’s use of Facebook data to target voters, Trump wouldn’t have been elected and the UK wouldn’t have voted to leave the EU. Despite its tone and ominous presentation style, The Great Hack fails to muster any support for its extreme claims. The truth is much more mundane: the Facebook-Cambridge Analytica data scandal was neither a “hack” nor “great” in historical importance.

The documentary ends with a question:

But the hardest part in all of this is that these wreckage sites and crippling divisions begin with the manipulation of one individual. Then another. And another. So, I can’t help but ask myself: Can I be manipulated? Can you?

No — but the directors of The Great Hack tried their best to do so.

[This post is the seventh in an ongoing symposium on “Should We Break Up Big Tech?” that features analysis and opinion from various perspectives.]

[This post is authored by Alec Stapp, Research Fellow at the International Center for Law & Economics]

Should we break up Microsoft? 

In all the talk of breaking up “Big Tech,” no one seems to mention the biggest tech company of them all. Microsoft’s market cap is currently higher than those of Apple, Google, Amazon, and Facebook. If big is bad, then, at the moment, Microsoft is the worst.

Apart from size, antitrust activists also claim that the structure and behavior of the Big Four (Facebook, Google, Apple, and Amazon) are why they deserve to be broken up. But they never include Microsoft, which is curious given that most of their critiques also apply to the largest tech giant:

  1. Microsoft is big (current market cap exceeds $1 trillion)
  2. Microsoft is dominant in narrowly-defined markets (e.g., desktop operating systems)
  3. Microsoft is simultaneously operating and competing on a platform (i.e., the Microsoft Store)
  4. Microsoft is a conglomerate capable of leveraging dominance from one market into another (e.g., Windows, Office 365, Azure)
  5. Microsoft has its own “kill zone” for startups (196 acquisitions since 1994)
  6. Microsoft operates a search engine that preferences its own content over third-party content (i.e., Bing)
  7. Microsoft operates a platform that moderates user-generated content (i.e., LinkedIn)

To be clear, this is not to say that an antitrust case against Microsoft is as strong as the case against the others. Rather, it is to say that the cases against the Big Four on these dimensions are as weak as the case against Microsoft, as I will show below.

Big is bad

Tim Wu published a book last year arguing for more vigorous antitrust enforcement — including against Big Tech — called “The Curse of Bigness.” As you can tell by the title, he argues, in essence, for a return to the bygone era of “big is bad” presumptions. In his book, Wu mentions “Microsoft” 29 times, but only in the context of its 1990s antitrust case. On the other hand, Wu has explicitly called for antitrust investigations of Amazon, Facebook, and Google. It’s unclear why big should be considered bad when it comes to the latter group but not when it comes to Microsoft. Maybe bigness isn’t actually a curse, after all.

As the saying goes in antitrust, “Big is not bad; big behaving badly is bad.” This aphorism arose to counter erroneous reasoning during the era of structure-conduct-performance when big was presumed to mean bad. Thanks to an improved theoretical and empirical understanding of the nature of the competitive process, there is now a consensus that firms can grow large either via superior efficiency or by engaging in anticompetitive behavior. Size alone does not tell us how a firm grew big — so it is not a relevant metric.

Dominance in narrowly-defined markets

Critics of Google say it has a monopoly on search and critics of Facebook say it has a monopoly on social networking. Microsoft is similarly dominant in at least a few narrowly-defined markets, including desktop operating systems (Windows has a 78% market share globally): 

Source: StatCounter

Microsoft is also dominant in the “professional networking platform” market after its acquisition of LinkedIn in 2016. And the legacy tech giant is still the clear leader in the “paid productivity software” market. (Microsoft’s Office 365 revenue is roughly 10x Google’s G Suite revenue).

The problem here is obvious. These are overly-narrow market definitions for conducting an antitrust analysis. Is it true that Facebook’s platforms are the only service that can connect you with your friends? Should we really restrict the productivity market to “paid”-only options (as the EU similarly did in its Android decision) when there are so many free options available? These questions are laughable. Proper market definition requires considering whether a hypothetical monopolist could profitably impose a small but significant and non-transitory increase in price (SSNIP). If not (which is likely the case in the narrow markets above), then we should employ a broader market definition in each case.
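One common way to put the SSNIP question into numbers is a break-even "critical loss" calculation. The post itself does not use that method; it is swapped in here purely as an illustration. The sketch below assumes a 5% price increase, and the margin and lost-sales figures are hypothetical placeholders, not estimates for any real market.

```python
def critical_loss(price_increase: float, margin: float) -> float:
    """Break-even share of unit sales a hypothetical monopolist can lose
    before a price increase stops being profitable: X / (X + M)."""
    return price_increase / (price_increase + margin)


def ssnip_is_profitable(price_increase: float, margin: float, predicted_loss: float) -> bool:
    """The increase pays off only if predicted lost sales stay below the break-even level."""
    return predicted_loss < critical_loss(price_increase, margin)


# Hypothetical inputs, for illustration only.
SSNIP = 0.05    # a 5% price increase
MARGIN = 0.45   # assumed pre-increase margin as a share of price

# Candidate markets ordered from narrowest to broadest, each paired with an
# assumed share of sales lost to substitutes outside that candidate market.
candidate_markets = [
    ("paid productivity software", 0.30),
    ("all productivity software, paid and free", 0.04),
]


def narrowest_valid_market(candidates):
    """Return the first (narrowest) candidate in which the SSNIP would be profitable."""
    for name, predicted_loss in candidates:
        if ssnip_is_profitable(SSNIP, MARGIN, predicted_loss):
            return name
    return None


print(narrowest_valid_market(candidate_markets))
```

With these made-up inputs, the narrow "paid" market fails the test because too many customers would switch to free alternatives, so the analysis moves on to the broader candidate.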

Simultaneously operating and competing on a platform

Elizabeth Warren likes to say that if you own a platform, then you shouldn’t both be an umpire and have a team in the game. Let’s put aside the problems with that flawed analogy for now. What she means is that you shouldn’t both run the platform and sell products, services, or apps on that platform (because it’s inherently unfair to the other sellers). 

Warren’s solution to this “problem” would be to create a regulated class of businesses called “platform utilities” which are “companies with an annual global revenue of $25 billion or more and that offer to the public an online marketplace, an exchange, or a platform for connecting third parties.” Microsoft’s revenue last quarter was $32.5 billion, so it easily meets the first threshold. And Windows obviously qualifies as “a platform for connecting third parties.”

Just as in mobile operating systems, desktop operating systems are compatible with third-party applications. These third-party apps can be free (e.g., iTunes) or paid (e.g., Adobe Photoshop). Of course, Microsoft also makes apps for Windows (e.g., Word, PowerPoint, Excel, etc.). But the more you think about the technical details, the blurrier the line between the operating system and applications becomes. Is the browser an add-on to the OS or a part of it (as Microsoft Edge appears to be)? The most deeply-embedded applications in an OS are simply called “features.”

Even though Warren hasn’t explicitly mentioned that her plan would cover Microsoft, it almost certainly would. Previously, she left Apple out of the Medium post announcing her policy, only to later tell a journalist that the iPhone maker would also be prohibited from producing its own apps. But what Warren fails to include in her announcement that she would break up Apple is that trying to police the line between a first-party platform and third-party applications would be a nightmare for companies and regulators, likely leading to less innovation and higher prices for consumers (as they attempt to rebuild their previous bundles).

Leveraging dominance from one market into another

The core critique in Lina Khan’s “Amazon’s Antitrust Paradox” is that the very structure of Amazon itself is what leads to its anticompetitive behavior. Khan argues (in spite of the data) that Amazon uses profits in some lines of business to subsidize predatory pricing in other lines of businesses. Furthermore, she claims that Amazon uses data from its Amazon Web Services unit to spy on competitors and snuff them out before they become a threat.

Of course, this is similar to the theory of harm in Microsoft’s 1990s antitrust case, that the desktop giant was leveraging its monopoly from the operating system market into the browser market. Why don’t we hear the same concern today about Microsoft? Like both Amazon and Google, you could uncharitably describe Microsoft as extending its tentacles into as many sectors of the economy as possible. Here are some of the markets in which Microsoft competes (and note how the Big Four also compete in many of these same markets):

What these potential antitrust harms leave out are the clear consumer benefits from bundling and vertical integration. Microsoft’s relationships with customers in one market might make it the most efficient vendor in related — but separate — markets. It is unsurprising, for example, that Windows customers would also frequently be Office customers. Furthermore, the zero marginal cost nature of software makes it an ideal product for bundling, which redounds to the benefit of consumers.

The “kill zone” for startups

In a recent article for The New York Times, Tim Wu and Stuart A. Thompson criticize Facebook and Google for the number of acquisitions they have made. They point out that “Google has acquired at least 270 companies over nearly two decades” and “Facebook has acquired at least 92 companies since 2007”, arguing that allowing such a large number of acquisitions to occur is conclusive evidence of regulatory failure.

Microsoft has made 196 acquisitions since 1994, but they receive no mention in the NYT article (or in most of the discussion around supposed “kill zones”). But the acquisitions by Microsoft or Facebook or Google are, in general, not problematic. They provide a crucial channel for liquidity in the venture capital and startup communities (the other channel being IPOs). According to the latest data from Orrick and Crunchbase, between 2010 and 2018, there were 21,844 acquisitions of tech startups for a total deal value of $1.193 trillion.

By comparison, according to data compiled by Jay R. Ritter, a professor at the University of Florida, there were 331 tech IPOs for a total market capitalization of $649.6 billion over the same period. Making it harder for a startup to be acquired would not result in more venture capital investment (and therefore not in more IPOs), according to recent research by Gordon M. Phillips and Alexei Zhdanov. The researchers show that “the passage of a pro-takeover law in a country is associated with more subsequent VC deals in that country, while the enactment of a business combination antitakeover law in the U.S. has a negative effect on subsequent VC investment.”
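To make the relative size of the two exit channels concrete, here is a small sketch using only the acquisition and IPO totals quoted above. Note the caveat that deal value and IPO market capitalization are different measures, so the comparison is rough rather than like-for-like.

```python
# Figures quoted above for 2010-2018:
# acquisitions from Orrick and Crunchbase, IPOs from Jay R. Ritter.
ACQUISITIONS = 21_844
ACQUISITION_VALUE = 1.193e12   # total deal value, USD
IPOS = 331
IPO_VALUE = 649.6e9            # total market capitalization, USD


def average(total: float, count: int) -> float:
    """Average value per exit."""
    return total / count


avg_acquisition = average(ACQUISITION_VALUE, ACQUISITIONS)
avg_ipo = average(IPO_VALUE, IPOS)
acquisitions_per_ipo = ACQUISITIONS / IPOS
acquisition_share = ACQUISITION_VALUE / (ACQUISITION_VALUE + IPO_VALUE)

print(f"Average acquisition: ${avg_acquisition / 1e6:.1f} million")
print(f"Average IPO:         ${avg_ipo / 1e9:.2f} billion")
print(f"Acquisitions per IPO: {acquisitions_per_ipo:.0f}")
print(f"Acquisition share of combined value: {acquisition_share:.0%}")
```

On these figures, acquisitions are far more numerous and far smaller on average than IPOs, which is consistent with the point that they are the main liquidity channel for most startups.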

As investor and serial entrepreneur Leonard Speiser said recently, “If the DOJ starts going after tech companies for making acquisitions, venture investors will be much less likely to invest in new startups, thereby reducing competition in a far more harmful way.” 

Search engine bias

Google is often accused of biasing its search results to favor its own products and services. The argument goes that if we broke them up, a thousand search engines would bloom and competition among them would lead to less-biased search results. While it is a very difficult — if not impossible — empirical question to determine what a “neutral” search engine would return, one attempt by Josh Wright found that “own-content bias is actually an infrequent phenomenon, and Google references its own content more favorably than other search engines far less frequently than does Bing.” 

The report goes on to note that “Google references own content in its first results position when no other engine does in just 6.7% of queries; Bing does so over twice as often (14.3%).” Arguably, users of a particular search engine might be more interested in seeing content from that company because they have a preexisting relationship. But regardless of how we interpret these results, it’s clear this is not a frequent phenomenon.

So why is Microsoft being left out of the antitrust debate now?

One potential reason Google, Facebook, and Amazon have been singled out for criticism of practices that seem common in the tech industry (and are often pro-consumer) is the prevailing business model in the journalism industry. Google and Facebook are by far the largest competitors in the digital advertising market, and Amazon is expected to be the third-largest player by next year, according to eMarketer. As Ramsi Woodcock pointed out, news publications are also competing for advertising dollars, the type of conflict of interest that usually would warrant disclosure if, say, a journalist held stock in a company they were covering.

Or perhaps Microsoft has successfully avoided receiving the same level of antitrust scrutiny as the Big Four because it neither is primarily consumer-facing like Apple or Amazon nor operates a platform with a significant amount of political speech via user-generated content (UGC) like Facebook or Google (YouTube). Yes, Microsoft moderates content on LinkedIn, but the public does not get outraged when deplatforming merely prevents someone from spamming their colleagues with requests “to add you to my professional network.”

Microsoft’s core areas are in the enterprise market, which allows it to sidestep the current debates about the supposed censorship of conservatives or unfair platform competition. To be clear, consumer-facing companies or platforms with user-generated content do not uniquely merit antitrust scrutiny. On the contrary, the benefits to consumers from these platforms are manifest. If this theory about why Microsoft has escaped scrutiny is correct, it means the public discussion thus far about Big Tech and antitrust has been driven by perception, not substance.


Last year, real estate developer Alastair Mactaggart spent nearly $3.5 million to put a privacy law on the ballot in California’s November election. He then negotiated a deal with state lawmakers to withdraw the ballot initiative if they passed their own privacy bill. That law — the California Consumer Privacy Act (CCPA) — was enacted after only seven days of drafting and amending. CCPA will go into effect six months from today.

According to Mactaggart, it all began when he spoke with a Google engineer and was shocked to learn how much personal data the company collected. This revelation motivated him to find out exactly how much of his data Google had. Perplexingly, instead of using Google’s freely available transparency tools, Mactaggart decided to spend millions to pressure the state legislature into passing new privacy regulation.

The law has six consumer rights, including the right to know; the right of data portability; the right to deletion; the right to opt-out of data sales; the right to not be discriminated against as a user; and a private right of action for data breaches.

So, what are the law’s prospects when it goes into effect next year? Here are ten reasons why CCPA is going to be a dumpster fire.

1. CCPA compliance costs will be astronomical

“TrustArc commissioned a survey of the readiness of 250 firms serving California from a range of industries and company size in February 2019. It reports that 71 percent of the respondents expect to spend at least six figures in CCPA-related privacy compliance expenses in 2019 — and 19 percent expect to spend over $1 million. Notably, if CCPA were in effect today, 86 percent of firms would not be ready. An estimated half a million firms are liable under the CCPA, most of which are small- to medium-sized businesses. If all eligible firms paid only $100,000, the upfront cost would already be $50 billion. This is in addition to lost advertising revenue, which could total as much as $60 billion annually.” (AEI / Roslyn Layton)
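The $50 billion figure in the quote is simple multiplication, sketched below. The firm count and per-firm cost are the quote's own estimates; the function name is just an illustrative label.

```python
def upfront_compliance_cost(firms: int, cost_per_firm: int) -> int:
    """Total upfront cost if every liable firm spends the same amount."""
    return firms * cost_per_firm


LIABLE_FIRMS = 500_000      # "an estimated half a million firms"
COST_PER_FIRM = 100_000     # "if all eligible firms paid only $100,000"

total = upfront_compliance_cost(LIABLE_FIRMS, COST_PER_FIRM)
print(f"${total / 1e9:.0f} billion")
```

Because the survey reports that some firms expect to spend over $1 million, the quote treats $100,000 per firm as a conservative floor rather than a point estimate.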

2. CCPA will be good for Facebook and Google (and bad for small ad networks)

“It’s as if the privacy activists labored to manufacture a fearsome cannon with which to subdue giants like Facebook and Google, loaded it with a scattershot set of legal restrictions, aimed it at the entire ads ecosystem, and fired it with much commotion. When the smoke cleared, the astonished activists found they’d hit only their small opponents, leaving the giants unharmed. Meanwhile, a grinning Facebook stared back at the activists and their mighty cannon, the weapon that they had slyly helped to design.” (Wired / Antonio García Martínez)

“Facebook and Google ultimately are not constrained as much by regulation as by users. The first-party relationship with users that allows these companies relative freedom under privacy laws comes with the burden of keeping those users engaged and returning to the app, despite privacy concerns.” (Wired / Antonio García Martínez)

3. CCPA will enable free-riding by users who opt out of data sharing

“[B]y restricting companies from limiting services or increasing prices for consumers who opt-out of sharing personal data, CCPA enables free riders—individuals that opt out but still expect the same services and price—and undercuts access to free content and services. Someone must pay for free services, and if individuals opt out of their end of the bargain—by allowing companies to use their data—they make others pay more, either directly or indirectly with lower quality services. CCPA tries to compensate for the drastic reduction in the effectiveness of online advertising, an important source of income for digital media companies, by forcing businesses to offer services even though they cannot effectively generate revenue from users.” (ITIF / Daniel Castro and Alan McQuinn)

4. CCPA is potentially unconstitutional as-written

“[T]he law potentially applies to any business throughout the globe that has/gets personal information about California residents the moment the business takes the first dollar from a California resident. Furthermore, the law applies to some corporate affiliates (parent, subsidiary, or commonly owned companies) of California businesses, even if those affiliates have no other ties to California. The law’s purported application to businesses not physically located in California raises potentially significant dormant Commerce Clause and other Constitutional problems.” (Eric Goldman)

5. GDPR compliance programs cannot be recycled for CCPA

“[C]ompanies cannot just expand the coverage of their EU GDPR compliance measures to residents of California. For example, the California Consumer Privacy Act:

  • Prescribes disclosures, communication channels (including toll-free phone numbers) and other concrete measures that are not required to comply with the EU GDPR.
  • Contains a broader definition of “personal data” and also covers information pertaining to households and devices.
  • Establishes broad rights for California residents to direct deletion of data, with differing exceptions than those available under GDPR.
  • Establishes broad rights to access personal data without certain exceptions available under GDPR (e.g., disclosures that would implicate the privacy interests of third parties).
  • Imposes more rigid restrictions on data sharing for commercial purposes.”

(IAPP / Lothar Determann)

6. CCPA will be a burden on small- and medium-sized businesses

“The law applies to businesses operating in California if they generate an annual gross revenue of $25 million or more, if they annually receive or share personal information of 50,000 California residents or more, or if they derive at least 50 percent of their annual revenue by “selling the personal information” of California residents. In effect, this means that businesses with websites that receive traffic from an average of 137 unique Californian IP addresses per day could be subject to the new rules.” (ITIF / Daniel Castro and Alan McQuinn)

CCPA “will apply to more than 500,000 U.S. companies, the vast majority of which are small- to medium-sized enterprises.” (IAPP / Rita Heimes and Sam Pfeifle)
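The "137 unique Californian IP addresses per day" figure quoted above appears to come from spreading the 50,000-resident threshold evenly over a year. The sketch below reproduces that arithmetic under the simplifying assumption (implicit in the quote, not stated in the statute) that each unique IP address counts as one distinct California resident and that no visitor repeats across days.

```python
import math


def daily_visitors_to_trigger(annual_threshold: int = 50_000, days: int = 365) -> int:
    """Smallest average number of new unique visitors per day that reaches
    the annual personal-information threshold within one year."""
    return math.ceil(annual_threshold / days)


print(daily_visitors_to_trigger())
```

If visitors repeat from day to day, a site would need more daily traffic than this to cross the threshold, so the figure is best read as a lower bound.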

7. CCPA’s definition of “personal information” is extremely over-inclusive

“CCPA likely includes gender information in the “personal information” definition because it is “capable of being associated with” a particular consumer when combined with other datasets. We can extend this logic to pretty much every type or class of data, all of which become re-identifiable when combined with enough other datasets. Thus, all data related to individuals (consumers or employees) in a business’ possession probably qualifies as “personal information.”” (Eric Goldman)

“The definition of “personal information” includes “household” information, which is particularly problematic. A “household” includes the consumer and other co-habitants, which means that a person’s “personal information” oxymoronically includes information about other people. These people’s interests may diverge, such as with separating spouses, multiple generations under the same roof, and roommates. Thus, giving a consumer rights to access, delete, or port “household” information affects other people’s information, which may violate their expectations and create major security and privacy risks.” (Eric Goldman)

8. CCPA penalties might become a source for revenue generation

“According to the new Cal. Civ. Code §1798.150, companies that become victims of data theft or other data security breaches can be ordered in civil class action lawsuits to pay statutory damages between $100 to $750 per California resident and incident, or actual damages, whichever is greater, and any other relief a court deems proper, subject to an option of the California Attorney General’s Office to prosecute the company instead of allowing civil suits to be brought against it.” (IAPP / Lothar Determann)

“According to the new Cal. Civ. Code §1798.155, companies can be ordered in a civil action brought by the California Attorney General’s Office to pay penalties of up to $7,500 per intentional violation of any provision of the California Consumer Privacy Act, or, for unintentional violations, if the company fails to cure the unintentional violation within 30 days of notice, $2,500 per violation under Section 17206 of the California Business and Professions Code. Twenty percent of such penalties collected by the State of California shall be allocated to a new “Consumer Privacy Fund” to fund enforcement.” (IAPP / Lothar Determann)

“[T]he Attorney General, through its support of SB 561, is seeking to remove this provision, known as a “30-day cure,” arguing that it would be able to secure more civil penalties and thus increase enforcement. Specifically, the Attorney General has said it needs to raise $57.5 million in civil penalties to cover the cost of CCPA enforcement.” (ITIF / Daniel Castro and Alan McQuinn)
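To give a sense of scale for the penalty amounts quoted above, here is a rough sketch. The per-resident and per-violation amounts and the $57.5 million target come from the quotes; the 10,000-resident breach is a hypothetical example, and the calculation ignores the "actual damages, whichever is greater" alternative.

```python
import math

STATUTORY_MIN, STATUTORY_MAX = 100, 750   # USD per resident per incident (§1798.150)
INTENTIONAL_PENALTY = 7_500               # USD per intentional violation (§1798.155)
UNINTENTIONAL_PENALTY = 2_500             # USD per uncured unintentional violation
ENFORCEMENT_TARGET = 57_500_000           # USD the Attorney General says it needs to raise


def statutory_damages_range(residents: int, incidents: int = 1) -> tuple[int, int]:
    """Lowest and highest statutory damages for a breach affecting `residents` people."""
    exposure = residents * incidents
    return STATUTORY_MIN * exposure, STATUTORY_MAX * exposure


def violations_needed(target: int, penalty_per_violation: int) -> int:
    """Number of penalized violations required to collect at least `target`."""
    return math.ceil(target / penalty_per_violation)


# Hypothetical breach affecting 10,000 California residents.
print(statutory_damages_range(10_000))
print(violations_needed(ENFORCEMENT_TARGET, INTENTIONAL_PENALTY))
print(violations_needed(ENFORCEMENT_TARGET, UNINTENTIONAL_PENALTY))
```

Even at the maximum per-violation penalty, meeting the stated target would take thousands of penalized violations, which is the sense in which penalties could become a revenue source.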

9. CCPA is inconsistent with existing privacy laws

“California has led the United States and often the world in codifying privacy protections, enacting the first laws requiring notification of data security breaches (2002) and website privacy policies (2004). In the operative section of the new law, however, the California Consumer Privacy Act’s drafters did not address any overlap or inconsistencies between the new law and any of California’s existing privacy laws, perhaps due to the rushed legislative process, perhaps due to limitations on the ability to negotiate with the proponents of the Initiative. Instead, the new Cal. Civ. Code §1798.175 prescribes that in case of any conflicts with California laws, the law that affords the greatest privacy protections shall control.” (IAPP / Lothar Determann)

10. CCPA will need to be amended, creating uncertainty for businesses

As of now, a dozen bills amending CCPA have passed the California Assembly and continue to wind their way through the legislative process. California lawmakers have until September 13th to make any final changes to the law before it goes into effect. In the meantime, businesses have to begin compliance preparations under a cloud of uncertainty about what the law says today, or what it might say in the future.

Thomas Wollmann has a new paper — “Stealth Consolidation: Evidence from an Amendment to the Hart-Scott-Rodino Act” — in American Economic Review: Insights this month. Greg Ip included this research in an article for the WSJ in which he claims that “competition has declined and corporate concentration risen through acquisitions often too small to draw the scrutiny of antitrust watchdogs.” In other words, “stealth consolidation”.

Wollmann’s study uses a difference-in-differences approach to examine the effect on merger activity of the 2001 amendment to the Hart-Scott-Rodino (HSR) Antitrust Improvements Act of 1976 (15 U.S.C. 18a). The amendment abruptly increased the pre-merger notification threshold from $15 million to $50 million in deal size. Strictly on those terms, the paper shows that raising the pre-merger notification threshold increased merger activity.

However, claims about “stealth consolidation” are controversial because they connote nefarious intentions and anticompetitive effects. As Wollmann admits in the paper, due to data limitations, he is unable to show that the new mergers are in fact anticompetitive or that the social costs of these mergers exceed the social benefits. Therefore, more research is needed to determine the optimal threshold for pre-merger notification rules, and claiming that harmful “stealth consolidation” is occurring is currently unwarranted.

Background: The “Unscrambling the Egg” Problem

In general, it is more difficult to unwind a consummated anticompetitive merger than it is to block a prospective anticompetitive merger. As Wollmann notes, for example, “El Paso Natural Gas Co. acquired its only potential rival in a market” and “the government’s challenge lasted 17 years and involved seven trips to the Supreme Court.”

Rolling back an anticompetitive merger is so difficult that it came to be known as “unscrambling the egg.” As William J. Baer, a former director of the Bureau of Competition at the FTC, described it, “there were strong incentives for speedily and surreptitiously consummating suspect mergers and then protracting the ensuing litigation” prior to the implementation of a pre-merger notification rule. These so-called “midnight mergers” were intended to avoid drawing antitrust scrutiny.

In response to this problem, Congress passed the Hart–Scott–Rodino Antitrust Improvements Act of 1976, which required companies to notify antitrust authorities of impending mergers if they exceeded certain size thresholds.

2001 Hart–Scott–Rodino Amendment

In 2001, Congress amended the HSR Act and effectively raised the threshold for premerger notification from $15 million in acquired firm assets to $50 million. This sudden and dramatic change created an opportunity to use a difference-in-differences technique to study the relationship between filing an HSR notification and merger activity.
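For readers unfamiliar with the technique, the basic two-group, two-period difference-in-differences estimate can be sketched as follows. This is a minimal illustration, not Wollmann's actual specification, and the merger counts are hypothetical numbers chosen only to show the mechanics. The estimate is meaningful only if both groups would have followed parallel trends absent the amendment.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post) -> float:
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical annual merger counts, for illustration only.
# "Treated" = newly-exempt deals ($15M to $50M); "control" = never-exempt deals (>$50M).
newly_exempt_pre = [100, 110, 105]
newly_exempt_post = [150, 160, 155]
never_exempt_pre = [200, 210, 205]
never_exempt_post = [220, 230, 225]

effect = diff_in_diff(newly_exempt_pre, newly_exempt_post,
                      never_exempt_pre, never_exempt_post)
print(effect)
```

Subtracting the control group's change removes whatever moved all merger activity over the period, leaving the portion attributable to the threshold change under the parallel-trends assumption.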

According to Wollmann, here’s what notifications look like for never-exempt mergers (>$50M):

And here’s what notifications for newly-exempt ($15M < X < $50M) mergers look like:

So what does that mean for merger investigations? Here is the number of investigations into never-exempt mergers:

We see a pretty consistent relationship between number of mergers and number of investigations. More mergers means more investigations.  

How about for newly-exempt mergers?

Here, investigations go to zero while merger activity remains relatively stable. In other words, it appears that some mergers that would have been investigated had they required an HSR notification were not investigated.

Wollmann then uses four-digit SIC code industries to sort mergers into horizontal and non-horizontal categories. Here are never-exempt mergers:

He finds that almost all of the increase in merger activity (relative to the counterfactual in which the notification threshold were unchanged) is driven by horizontal mergers. And here are newly-exempt mergers:

Policy Implications & Limitations

The charts show a stark change in investigations and merger activity. The difference-in-differences methodology is solid and the author addresses some potential confounding variables (such as presidential elections). However, the paper leaves the broader implications for public policy unaddressed.

Furthermore, given the limits of the data in this analysis, it’s not possible for this approach to explain competitive effects in the relevant antitrust markets, for three reasons:

Four-digit SIC code industries are not antitrust markets

Wollmann chose to classify mergers “as horizontal or non-horizontal based on whether or not the target and acquirer operate in the same four-digit SIC code industry, which is common convention.” But as Werden & Froeb (2018) note, four-digit SIC code industries are orders of magnitude too large in most cases to be useful for antitrust analysis:

The evidence from cartel cases focused on indictments from 1970–80. Because the Justice Department prosecuted many local cartels, for 52 of the 80 indictments examined, the Commerce Quotient was less than 0.01, i.e., the SIC 4-digit industry was at least 100 times the apparent scope of the affected market. Of the 80 indictments, 19 involved SIC 4-digit industries that had been thought to comport well with markets, so these were the most instructive. For 16 of the 19, the SIC 4-digit industry was at least 10 times the apparent scope of the affected market (i.e., the Commerce Quotient was less than 0.1).

Antitrust authorities do not rely on SIC 4-digit industry codes and instead establish a market definition based on the facts of each case. It is not possible to infer competitive effects from census data as Wollmann attempts to do.
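
To make the Commerce Quotient concrete, here is a rough sketch. It assumes, as the quoted passage implies, that the quotient is the affected market's commerce divided by the commerce of the whole SIC 4-digit industry. The dollar figures and the function name are hypothetical.

```python
def commerce_quotient(market_commerce, industry_commerce):
    """Ratio of the affected market's commerce to the SIC industry's commerce.

    Assumed definition, inferred from the quoted passage.
    """
    if industry_commerce <= 0:
        raise ValueError("industry_commerce must be positive")
    return market_commerce / industry_commerce


# Hypothetical: a local cartel affecting $5M of commerce inside a $1B industry.
market = 5_000_000
industry = 1_000_000_000

cq = commerce_quotient(market, industry)
print(cq < 0.01)          # True: the quotient falls below the 0.01 cutoff
print(industry / market)  # 200.0: the industry is 200 times the market
```

A merger classified as "horizontal" at the SIC level in an industry like this one could easily involve two firms that never compete in the same antitrust market.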

The data cannot distinguish between anticompetitive mergers and procompetitive mergers

As Wollmann himself notes, the results tell us nothing about the relative costs and benefits of the new HSR policy:

Even so, these findings do not on their own advocate for one policy over another. To do so requires equating industry consolidation to a specific amount of economic harm and then comparing the resulting figure to the benefits derived from raising thresholds, which could be large. Even if the agencies ignore the reduced regulatory burden on firms, introducing exemptions can free up agency resources to pursue other cases (or reduce public spending). These and related issues require careful consideration but simply fall outside the scope of the present work.

For instance, firms could be reallocating merger activity to targets below the new threshold to avoid erroneous enforcement or they could be increasing merger activity for small targets due to reduced regulatory costs and uncertainty.

The study is likely underpowered for effects on blocked mergers

While the paper provides convincing evidence that investigations of newly-exempt mergers decreased dramatically following the change in the notification threshold, there is no equally convincing evidence of an effect on blocked mergers. As Wollmann points out, blocked mergers were exceedingly rare both before and after the Amendment (emphasis added):

Over 57,000 mergers comprise the sample, which spans eighteen years. The mean number of mergers each year is 3,180. The DOJ and FTC receive 31,464 notifications over this period, or 1,748 per year. Also, as stated above, blocked mergers are very infrequent: there are on average 13 per year pre-Amendment and 9 per-year post-Amendment.

Since blocked mergers are such a small percentage of total mergers both before and after the Amendment, we likely cannot tell from the data whether actual enforcement action changed significantly due to the change in notification threshold.
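
A quick back-of-the-envelope check shows how small these shares are. It uses only the figures in the quote above and assumes, as a simplification, that the full-sample mean of 3,180 mergers per year applies to both periods.

```python
mergers_per_year = 3180  # sample mean from the quoted passage
blocked_pre = 13         # average blocked mergers per year, pre-Amendment
blocked_post = 9         # average blocked mergers per year, post-Amendment

# Simplifying assumption: the same annual merger count in both periods.
share_pre = blocked_pre / mergers_per_year
share_post = blocked_post / mergers_per_year

print(f"{share_pre:.2%}")   # 0.41%
print(f"{share_post:.2%}")  # 0.28%
```

Under that assumption, fewer than one in two hundred mergers was blocked in either period.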

Greg Ip’s write-up for the WSJ includes some relevant charts for this issue. Ironically for a piece about the problems of lax merger review, the accompanying graphs show merger enforcement actions slightly increasing at both the FTC and the DOJ since 2001:

Source: WSJ

Overall, Wollmann’s paper does an effective job showing how changes in premerger notification rules can affect merger activity. However, due to data limitations, we cannot conclude anything about competitive effects or enforcement intensity from this study.

Source: KC Green

GDPR is officially one year old. How have the first 12 months gone? As you can see from the mix of data and anecdotes below, it appears that compliance costs have been astronomical; individual “data rights” have led to unintended consequences; “privacy protection” seems to have undermined market competition; and there have been large unseen — but not unmeasurable! — costs in forgone startup investment. So, all-in-all, about what we expected.

GDPR cases and fines

Here is the latest data on cases and fines released by the European Data Protection Board:

  • €55,955,871 in fines
    • €50 million of which was a single fine on Google
  • 281,088 total cases
    • 144,376 complaints
    • 89,271 data breach notifications
    • 47,441 other
  • 37.0% ongoing
  • 62.9% closed
  • 0.1% appealed
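
As a sanity check on these figures, the short sketch below confirms that the case categories sum to the reported total and shows how much of the fine total comes from the single Google fine. It uses only the numbers listed above.

```python
total_fines = 55_955_871  # euros
google_fine = 50_000_000  # euros
cases = {
    "complaints": 144_376,
    "data breach notifications": 89_271,
    "other": 47_441,
}

print(sum(cases.values()))                 # 281088
print(f"{google_fine / total_fines:.1%}")  # 89.4%
print(total_fines - google_fine)           # 5955871
```

In other words, all other fines combined came to just under €6 million in the first year.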

Unintended consequences of new data privacy rights

GDPR can be thought of as a privacy “bill of rights.” Many of these new rights have come with unintended consequences. If your account gets hacked, the hacker can use the right of access to get all of your data. The right to be forgotten is in conflict with the public’s right to know a bad actor’s history (and many bad actors are using that right to memory-hole their misdeeds). The right to data portability creates another attack vector for hackers to exploit. And the right to opt out of data collection creates a free-rider problem where users who opt in subsidize the privacy of those who opt out.

Article 15: Right of access

  • “Amazon sent 1,700 Alexa voice recordings to the wrong user following data request” [The Verge / Nick Statt]
  • “Today I discovered an unfortunate consequence of GDPR: once someone hacks into your account, they can request—and potentially access—all of your data. Whoever hacked into my Spotify account got all of my streaming, song, etc. history simply by requesting it.” [Jean Yang]

Article 17: Right to be forgotten

  • “Since 2016, newspapers in Belgium and Italy have removed articles from their archives under [GDPR]. Google was also ordered last year to stop listing some search results, including information from 2014 about a Dutch doctor who The Guardian reported was suspended for poor care of a patient.” [NYT / Adam Satariano]
  • “French scam artist Michael Francois Bujaldon is using the GDPR to attempt to remove traces of his United States District Court case from the internet. He has already succeeded in compelling PacerMonitor to remove his case.” [PlainSite]
  • “In the last 5 days, we’ve had requests under GDPR to delete three separate articles … all about US lawsuits concerning scams committed by Europeans. That ‘right to be forgotten’ is working out just great, huh guys?” [Mike Masnick]

Article 20: Right to data portability

  • Data portability increases the attack surface for bad actors to exploit. In a sense, the Cambridge Analytica scandal was a case of too much data portability.
  • “The problem with data portability is that it goes both ways: if you can take your data out of Facebook to other applications, you can do the same thing in the other direction. The question, then, is which entity is likely to have the greater center of gravity with regards to data: Facebook, with its social network, or practically anything else?” [Stratechery / Ben Thompson]
  • “Presumably data portability would be imposed on Facebook’s competitors and potential competitors as well. That would mean all future competing firms would have to slot their products into a Facebook-compatible template. Let’s say that 17 years from now someone has a virtual reality social network innovation: does it have to be “exportable” into Facebook and other competitors? It’s hard to think of any better way to stifle innovation.” [Marginal Revolution / Tyler Cowen]

Article 21: Right to opt out of data processing

  • “[B]y restricting companies from limiting services or increasing prices for consumers who opt-out of sharing personal data, these frameworks enable free riders—individuals that opt out but still expect the same services and price—and undercut access to free content and services.” [ITIF / Alan McQuinn and Daniel Castro]

Compliance costs are astronomical

  • Prior to GDPR going into effect, “PwC surveyed 200 companies with more than 500 employees and found that 68% planned on spending between $1 and $10 million to meet the regulation’s requirements. Another 9% planned to spend more than $10 million. With over 19,000 U.S. firms of this size, total GDPR compliance costs for this group could reach $150 billion.” [Fortune / Daniel Castro and Michael McLaughlin]
  • “[T]he International Association of Privacy Professionals (IAPP) estimates 500,000 European organizations have registered data protection officers (DPOs) within the first year of the General Data Protection Regulation (GDPR). According to a recent IAPP salary survey, the average DPO’s salary in Europe is $88,000.” [IAPP]
  • As of March 20, 2019, 1,129 US news sites are still unavailable in the EU due to GDPR. [Joseph O’Connor]
  • Microsoft had 1,600 engineers working on GDPR compliance. [Microsoft]
  • During a Senate hearing, Keith Enright, Google’s chief privacy officer, estimated that the company spent “hundreds of years of human time” to comply with the new privacy rules. [Quartz / Ashley Rodriguez]
    • However, French authorities ultimately decided Google’s compliance efforts were insufficient: “France fines Google nearly $57 million for first major violation of new European privacy regime” [Washington Post / Tony Romm]
  • “About 220,000 name tags will be removed in Vienna by the end of [2018], the city’s housing authority said. Officials fear that they could otherwise be fined up to $23 million, or about $1,150 per name.” [Washington Post / Rick Noack]
    UPDATE: Wolfie Christl pointed out on Twitter that the order to remove name tags was rescinded after only 11,000 name tags were removed due to public backlash and what Housing Councilor Kathrin Gaal said were “different legal opinions on the subject.”
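
To see the rough scale of the first two estimates above, here is a back-of-the-envelope sketch. It is one possible reconstruction, not necessarily how the cited authors did the math. It assumes the PwC survey shares apply to all 19,000 U.S. firms, holds the top tier at its $10 million floor, and assumes each of the estimated 500,000 organizations pays one DPO the average salary.

```python
firms = 19_000
mid_tier = firms * 68 // 100  # 12920 firms planning to spend $1M to $10M
top_tier = firms * 9 // 100   # 1710 firms planning to spend more than $10M

# Assumption: top-tier firms are held at their $10M floor in both scenarios.
low = mid_tier * 1_000_000 + top_tier * 10_000_000
high = mid_tier * 10_000_000 + top_tier * 10_000_000
print(low, high)  # 30020000000 146300000000

# Assumption: one DPO per organization at the average salary.
dpo_salary_bill = 500_000 * 88_000
print(dpo_salary_bill)  # 44000000000
```

The high scenario lands near the $150 billion figure quoted from Fortune, and the DPO salary bill alone would be on the order of $44 billion per year under these assumptions.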

Tradeoff between privacy regulations and market competition

“On the big guys increasing market share? I don’t believe [the law] will have such a consequence.” Věra Jourová, the European Commissioner for Justice, Consumers and Gender Equality [WSJ / Sam Schechner and Nick Kostov]

“Mentioned GDPR to the head of a European media company. ‘Gift to Google and Facebook, enormous regulatory own-goal.’” [Benedict Evans]

Source: WSJ
  • “Hundreds of companies compete to place ads on webpages or collect data on their users, led by Google, Facebook and their subsidiaries. The European Union’s General Data Protection Regulation, which took effect in May, imposes stiff requirements on such firms and the websites who use them. After the rule took effect in May, Google’s tracking software appeared on slightly more websites, Facebook’s on 7% fewer, while the smallest companies suffered a 32% drop, according to Ghostery, which develops privacy-enhancing web technology.” [WSJ / Greg Ip]
  • Havas SA, one of the world’s largest buyers of ads, says it observed a low double-digit percentage increase in advertisers’ spending through DBM on Google’s own ad exchange on the first day the law went into effect, according to Hossein Houssaini, Havas’s global head of programmatic solutions. On the selling side, companies that help publishers sell ad inventory have seen declines in bids coming through their platforms from Google. Paris-based Smart says it has seen a roughly 50% drop. [WSJ / Nick Kostov and Sam Schechner]
  • “The consequence was that just hours after the law’s enforcement, numerous independent ad exchanges and other vendors watched their ad demand volumes drop between 20 and 40 percent. But with agencies free to still buy demand on Google’s marketplace, demand on AdX spiked. The fact that Google’s compliance strategy has ended up hurting its competitors and redirecting higher demand back to its own marketplace, where it can guarantee it has user consent, has unsettled publishers and ad tech vendors.” [Digiday / Jessica Davies]

Unseen costs of forgone investment & research

  • Startups: One study estimated that venture capital invested in EU startups fell by as much as 50 percent due to GDPR implementation: “Specifically, our findings suggest a $3.38 million decrease in the aggregate dollars raised by EU ventures per state per crude industry category per week, a 17.6% reduction in the number of weekly venture deals, and a 39.6% decrease in the amount raised in an average deal following the rollout of GDPR … We use our results to provide a back-of-the-envelope calculation of a range of job losses that may be incurred by these ventures, which we estimate to be between 3,604 to 29,819 jobs.” [NBER / Jian Jia, Ginger Zhe Jin, and Liad Wagman]
  • Mergers and acquisitions: “55% of respondents said they had worked on deals that fell apart because of concerns about a target company’s data protection policies and compliance with GDPR” [WSJ / Nina Trentmann]
  • Scientific research: “[B]iomedical researchers fear that the EU’s new General Data Protection Regulation (GDPR) will make it harder to share information across borders or outside their original research context.” [Politico / Sarah Wheaton]

GDPR graveyard

Small and medium-sized businesses (SMBs) have left the EU market in droves (or shut down entirely). Here is a partial list:

Blockchain & P2P Services

  • CoinTouch, peer-to-peer cryptocurrency exchange
  • FamilyTreeDNA, free and public genetic tools
    • Mitosearch
    • Ysearch
  • Monal, XMPP chat app
  • Parity, know-your-customer service for initial coin offerings (ICOs)
  • Seznam, social network for students
  • StreetLend, tool sharing platform for neighbors

Marketing

  • Drawbridge, cross-device identity service
  • Klout, social reputation service by Lithium
  • Unroll.me, inbox management app
  • Verve, mobile programmatic advertising

Video Games

Other