Any Way You Measure It, Warren Is Wrong to Claim “Facebook and Google Account for 70% of All Internet Traffic”

Alec Stapp, 1 October 2019
Source: New York Magazine

When she rolled out her plan to break up Big Tech, Elizabeth Warren paid for ads (like the one shown above) claiming that “Facebook and Google account for 70% of all internet traffic.” This statistic has since been repeated in various forms by Rolling Stone, Vox, National Review, and Washingtonian. In my last post, I fact-checked this claim and found it wanting.

Warren’s data

As supporting evidence, Warren cited a 2017 Newsweek article, which in turn cited a blog post by an open-source freelancer. That post aggregated data from a 2015 blog post published by Parse.ly, a web analytics company, which said: “Today, Facebook remains a top referring site to the publishers in Parse.ly’s network, claiming 39 percent of referral traffic versus Google’s share of 34 percent.” At the time, Parse.ly had “around 400 publisher domains” in its network. To put it mildly, a combined 73 percent share of referral traffic to around 400 publisher domains is not what it means to “account for” or “control” or “directly influence” 70 percent of all internet traffic, as Warren and others have claimed.

Internet traffic measured in bytes

In an effort to contextualize how extreme Warren’s claim was, in my last post I used a common measure of internet traffic — total volume in bytes — to show that Google and Facebook account for less than 20 percent of global internet traffic. Some Warren defenders have correctly pointed out that measuring internet traffic in bytes will weight the results toward data-heavy services, such as video streaming. It’s not obvious a priori, however, whether this would bias the results in favor of Facebook and Google or against them, given that users stream lots of video using those companies’ sites and apps (hello, YouTube).

Internet traffic measured by time spent by users

As I said in my post, there are multiple ways to measure total internet traffic, and no one of them is likely to offer a perfect measure. So, to get a fuller picture, we could also look at how users are spending their time on the internet. While there is no single source for global internet time use statistics, we can combine a few to reach an estimate (NB: this analysis includes time spent in apps as well as on the web). 

According to the Global Digital report by Hootsuite and We Are Social, in 2018 there were 4.021 billion active internet users, and the worldwide average for time spent using the internet was 6 hours and 42 minutes per day. That means there were 1,616 billion internet user-minutes per day.

Data from Apptopia shows that, in the three months from May through July 2018, users spent 300 billion hours in Facebook-owned apps and 118 billion hours in Google-owned apps. In other words, all Facebook-owned apps consume, on average, 197 billion user-minutes per day and all Google-owned apps consume, on average, 78 billion user-minutes per day. And according to SimilarWeb data for the three months from June to August 2019, web users spent 11 billion user-minutes per day visiting Facebook domains (facebook.com, whatsapp.com, instagram.com, messenger.com) and 52 billion user-minutes per day visiting Google domains, including google.com (and all subdomains) and youtube.com.

If you add up all app and web user-minutes for Google and Facebook, the total is 338 billion user-minutes per day. A staggering number. But as a share of all internet traffic (in this case measured in terms of time spent)? Google- and Facebook-owned sites and apps account for about 21 percent of user-minutes.
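The arithmetic behind that 21 percent figure can be reproduced with a short sketch. Every input below is a number quoted in the text. The one assumption is the length of the three-month Apptopia window, taken here as 365/4 days; the source data does not state it, but that value reproduces the rounded per-day figures given above. The variable and function names are illustrative only.

```python
# Back-of-the-envelope check of the time-spent share.
# Inputs are the figures quoted in the post; all totals are in billions.

USERS_BILLION = 4.021                    # active internet users in 2018
MINUTES_PER_USER_PER_DAY = 6 * 60 + 42   # 6 hours 42 minutes per day

# ASSUMPTION: a three-month window is treated as an average quarter.
DAYS_PER_QUARTER = 365 / 4


def quarterly_hours_to_daily_minutes(billion_hours):
    """Convert billions of hours per quarter to billions of minutes per day."""
    return billion_hours * 60 / DAYS_PER_QUARTER


# Total internet time, in billions of user-minutes per day.
total_minutes = USERS_BILLION * MINUTES_PER_USER_PER_DAY

# App usage (Apptopia, May through July 2018).
facebook_apps = quarterly_hours_to_daily_minutes(300)
google_apps = quarterly_hours_to_daily_minutes(118)

# Web usage (SimilarWeb, June to August 2019), already in user-minutes per day.
facebook_web = 11
google_web = 52

combined = facebook_apps + google_apps + facebook_web + google_web
share = combined / total_minutes

print(f"Total: {total_minutes:.0f} billion user-minutes per day")
print(f"Google + Facebook: {combined:.0f} billion user-minutes per day")
print(f"Share: {share:.1%}")
```

Note that the sketch mixes 2018 app data with 2019 web data, just as the estimate in the text does, so the result is best read as an order-of-magnitude check rather than a precise measurement.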

Internet traffic measured by “connections”

In my last post, I cited a Sandvine study that measured total internet traffic by volume of upstream and downstream bytes. The same report also includes numbers for what Sandvine calls “connections,” which it defines as “the number of conversations occurring for an application.” Sandvine notes that while “some applications use a single connection for all traffic, others use many connections to transfer data or video to the end user.” For example, a video stream on Netflix uses a single connection, while each item on a webpage, such as an image, may require a distinct connection.

Cam Cullen, Sandvine’s VP of marketing, also implored readers to “never forget Google connections include YouTube, Search, and DoubleClick — all of which are very noisy applications and universally consumed,” which would bias this statistic toward inflating Google’s share. With these caveats in mind, Sandvine’s data shows that Google is responsible for 30 percent of these connections, while Facebook is responsible for under 8 percent of connections. Note that Netflix’s share is less than 1 percent, which implies this statistic is not biased toward data-heavy services. Again, the numbers for Google and Facebook are a far cry from what Warren and others are claiming.

Source: Sandvine

Internet traffic measured by sources

I’m not sure whether either of these measures is preferable to what I offered in my original post, but each is at least a plausible measure of internet traffic, and all of them fall well short of Warren’s claimed 70 percent. What I do know is that the preferred metric offered by the people most critical of my post, external referrals to online publishers (content sites), is decidedly not a plausible measure of internet traffic.

In defense of Warren, Jason Kint, the CEO of a trade association for digital content publishers, wrote, “I just checked actual benchmark data across our members (most publishers) and 67% of their external traffic comes through Google or Facebook.” Rand Fishkin cites his own analysis of data from Jumpshot showing that 66.0 percent of external referral visits were sent by Google and 5.1 percent were sent by Facebook.

In another response to my piece, former digital advertising executive Dina Srinivasan said, “[Percentage] of referrals is relevant because it is pointing out that two companies control a large [percentage] of business that comes through their door.”

In my opinion, equating “external referrals to publishers” with “internet traffic” is unacceptable for at least two reasons.

First, the internet is much broader than traditional content publishers: it encompasses everything from email and Yelp to TikTok, Amazon, and Netflix. The relevant market is consumer attention and, in that sense, every internet supplier is bidding for scarce time. In a recent investor letter, Netflix said, “We compete with (and lose to) ‘Fortnite’ more than HBO,” adding: “There are thousands of competitors in this highly fragmented market vying to entertain consumers and low barriers to entry for those great experiences.” Previously, CEO Reed Hastings had only half-jokingly said, “We’re competing with sleep on the margin.” In this debate over internet traffic, the opposing side fails to grasp the scope of the internet market. It is unsurprising, then, that the one metric that does best at capturing attention, time spent, gives Google and Facebook about the same share (roughly 21 percent) as the measure based on bytes (less than 20 percent).

Second, and perhaps more important, even if we limit our analysis to publisher traffic, the external referral statistic these critics cite completely (and conveniently?) omits direct and internal traffic, which together represent the majority of publisher traffic. In fact, according to Parse.ly’s most recent data, which now includes more than 3,000 “high-traffic sites,” only 35 percent of total traffic comes from search and social referrers (as the graph below shows). Of course, Google and Facebook drive the majority of search and social referrals. But given that most users visit webpages without being referred at all, Google and Facebook are responsible for less than a third of total traffic.

Source: Parse.ly

It is simply incorrect to say, as Srinivasan does, that external referrals offer a useful measurement of internet traffic because they capture a “large [percentage] of business that comes through [publishers’] door.” Well, “large” is relative, but the implication that these external referrals from Facebook and Google explain Warren’s 70%-of-internet-traffic claim is both factually incorrect and horribly misleading, especially in an antitrust context.

It is factually incorrect because, at most, Google and Facebook are responsible for a third of the traffic on these sites; it is misleading because if our concern is ensuring that users can reach content sites without passing through Google or Facebook, the evidence is clear that they can and do — at least twice as often as they follow links from Google or Facebook to do so.

Conclusion

As my colleague Gus Hurwitz said, Warren is making a very specific and very alarming claim: 

There may be ‘softer’ versions of [Warren’s claim] that are reasonably correct (e.g., digital ad revenue, visibility into traffic). But for 99% of people hearing (and reporting on) these claims, they hear the hard version of the claim: Google and Facebook control 70% of what you do online. That claim is wrong, alarmist, misinformation, intended to foment fear, uncertainty, and doubt — to bootstrap the argument that ‘everything is terrible, worse, really!, and I’m here to save you.’ This is classic propaganda.

Google and Facebook do account for a 59 percent (and declining) share of US digital advertising. But that’s not what Warren said (nor would anyone try to claim with a straight face that “volume of advertising” was the same thing as “internet traffic”). And if our concern is with competition, it’s hard to look at the advertising market and conclude that it’s got a competition problem. Prices are falling like crazy (down 42 percent in the last decade), and volume is only increasing. If you add in offline advertising (which, whatever you think about market definition here, certainly competes with online advertising at the very least on some dimensions) Google and Facebook are responsible for only about 32 percent.

In her comments criticizing my article, Dina Srinivasan mentioned another of these “softer” versions:

Also, each time a publisher page loads, what [percentage] then queries Google or Facebook servers during the page loads? About 98+% of every page load. That stat is not even in Warren or your analysis. That is 1000% relevant.

It’s true that Google and Facebook have visibility into a great deal of all internet traffic (beyond their own) through a variety of products and services: browsers, content delivery networks (CDNs), web beacons, cloud computing, VPNs, data brokers, single sign-on (SSO), and web analytics services. But seeing internet traffic is not the same thing as “account[ing] for” (or controlling, or even directly influencing) internet traffic. The former is a very different claim from the latter, and one with considerably more attenuated competitive relevance (if any). It certainly wouldn’t be a sufficient basis for advocating that Google and Facebook be broken up, which is probably why, although arguably accurate, it’s not the statistic upon which Warren based her proposal to do so.
