Archives For big data

[TOTM: The following is part of a blog series by TOTM guests and authors on the law, economics, and policy of the ongoing COVID-19 pandemic. The entire series of posts is available here.

This post is authored by Christine S. Wilson (Commissioner of the U.S. Federal Trade Commission).[1] The views expressed here are the author’s and do not necessarily reflect those of the Federal Trade Commission or any other Commissioner.]  

I type these words while subject to a stay-at-home order issued by West Virginia Governor James C. Justice II. “To preserve public health and safety, and to ensure the healthcare system in West Virginia is capable of serving all citizens in need,” I am permitted to leave my home only for a limited and precisely enumerated set of reasons. Billions of citizens around the globe are now operating under similar shelter-in-place directives as governments grapple with how to stem the tide of infection, illness and death inflicted by the global Covid-19 pandemic. Indeed, the first response of many governments has been to impose severe limitations on physical movement to contain the spread of the novel coronavirus. The second response contemplated by many, and the one on which this blog post focuses, involves the extensive collection and analysis of data in connection with people’s movements and health. Some governments are using that data to conduct sophisticated contact tracing, while others are using the power of the state to enforce orders for quarantines and against gatherings.

The desire to use modern technology on a broad scale for the sake of public safety is not unique to this moment. Technology is intended to improve the quality of our lives, in part by enabling us to help ourselves and one another. For example, cell towers broadcast wireless emergency alerts to all mobile devices in the area to warn us of extreme weather and other threats to safety in our vicinity. One well-known type of broadcast is the Amber Alert, which enables community members to assist in recovering an abducted child by providing descriptions of the abductor, the abductee and the abductor’s vehicle. Citizens who spot individuals and vehicles that meet these descriptions can then provide leads to law enforcement authorities. A private nonprofit organization, the National Center for Missing and Exploited Children, coordinates with state and local public safety officials to send out Amber Alerts through privately owned wireless carriers.

The robust civil society and free market in the U.S. make partnerships between the private sector and government agencies commonplace. But some of these arrangements involve a much more extensive sharing of Americans’ personal information with law enforcement than the emergency alert system does.

For example, Amazon’s home security product Ring advertises itself not only as a way to see when a package has been left at your door, but also as a way to make communities safer by turning over video footage to local police departments. In 2018, the company’s pilot program in Newark, New Jersey, donated more than 500 devices to homeowners to install at their homes in two neighborhoods, with a big caveat: Ring recipients were encouraged to share video with police. According to Ring, home burglaries in those neighborhoods fell by more than 50% from April through July 2018 relative to the same time period a year earlier.

Yet members of Congress and privacy experts have raised concerns about these partnerships, which now number in the hundreds. After receiving Amazon’s response to his inquiry, Senator Edward Markey highlighted Ring’s failure to prevent police from sharing video footage with third parties and from keeping the video permanently, and Ring’s lack of precautions to ensure that users collect footage only of adults and of users’ own property. The House of Representatives Subcommittee on Economic and Consumer Policy continues to investigate Ring’s police partnerships and data policies. The Electronic Frontier Foundation has called Ring “a perfect storm of privacy threats,” while the UK surveillance camera commissioner has warned against “a very real power to understand, to surveil you in a way you’ve never been surveilled before.”

Ring demonstrates clearly that it is not new for potential breaches of privacy to be encouraged in the name of public safety; police departments urge citizens to use Ring and share the videos with police to fight crime. But emerging developments indicate that, in the fight against Covid-19, we can expect to see more and more private companies placed in the difficult position of becoming complicit in government overreach.

At least mobile phone users can opt out of receiving Amber Alerts, and residents can refuse to put Ring surveillance systems on their property. The Covid-19 pandemic has made some other technological intrusions effectively impossible to refuse. For example, online proctoring, in which a proctor monitors students over webcams to ensure they do not cheat on exams taken at home, was once an option that students could accept if they did not want to take an exam where and when it could be proctored face to face. With public schools and universities across the U.S. closed for the rest of the semester, students who refuse to give private online proctors access to their webcams – and, consequently, the ability to view their surroundings – cannot take exams at all.

Existing technology and data practices already have made the Federal Trade Commission sensitive to potential consumer privacy and data security abuses. For decades, this independent, bipartisan agency has been enforcing companies’ privacy policies through its authority to police unfair and deceptive trade practices. It brought its first privacy and data security cases nearly 20 years ago, while I was Chief of Staff to then-Chairman Timothy J. Muris. The FTC took on Eli Lilly for disclosing the e-mail addresses of 669 subscribers to its Prozac reminder service – many of whom were government officials, and at a time of greater stigma for mental health issues – and Microsoft for (among other things) falsely claiming that its Passport website sign-in service did not collect any personally identifiable information other than that described in its privacy policy.

The privacy and data security practices of healthcare and software companies are likely to impact billions of people during the current coronavirus pandemic. The U.S. already has many laws on the books that are relevant to practices in these areas. One notable example is the Health Insurance Portability and Accountability Act, which set national standards for the protection of individually identifiable health information by health plans, health care clearinghouses and health care providers who accept non-cash payments. While the FTC does not enforce HIPAA, it does enforce the Health Breach Notification Rule, as well as the provisions in the FTC Act used to challenge the privacy missteps of Eli Lilly and many other companies.

But technological developments have created gaps in HIPAA enforcement. For example, HIPAA applies to doctors’ offices, hospitals and insurance companies, but it may not apply to wearables, smartphone apps or websites. Yet sensitive medical information is now commonly stored in places other than health care practitioners’ offices. Your phone and watch now collect information about your blood sugar, exercise habits, fertility and heart health.

Observers have pointed to these emerging gaps in coverage as evidence of the growing need for federal privacy legislation. I, too, have called on the U.S. Congress to enact comprehensive federal privacy legislation – not only to address these emerging gaps, but also for two other reasons. First, consumers need clarity regarding the types of data collected from them, and how those data are used and shared. I believe consumers can make informed decisions about which goods and services to patronize when they have the information they need to evaluate the costs and benefits of using those goods. Second, businesses need predictability and certainty regarding the rules of the road, given the emerging patchwork of regimes both at home and abroad.

Rules of the road regarding privacy practices will prove particularly instructive during this global pandemic, as governments lean on the private sector for data on the grounds that the collection and analysis of data can help avert (or at least diminish to some extent) a public health catastrophe. With legal lines in place, companies would be better equipped to determine when they are being asked to cross the line for the public good, and whether they should require a subpoena or inform customers before turning over data. It is regrettable that Congress has been unable to enact federal privacy legislation to guide this discussion.

Understandably, Congress does not have privacy at the top of its agenda at the moment, as the U.S. faces a public health crisis. As I write, more than 579,000 Americans have been diagnosed with Covid-19, and more than 22,000 have perished. Sadly, those numbers will only increase. And the U.S. is not alone in confronting this crisis: governments globally have confronted more than 1.77 million cases and more than 111,000 deaths. For a short time, health and safety issues may take precedence over privacy protections. But some of the initiatives to combat the coronavirus pandemic are worrisome. We are learning more every day about how governments are responding in a rapidly developing situation; what I describe in the next section constitutes merely the tip of the iceberg. These initiatives are worth highlighting here, as are potential safeguards for privacy and civil liberties that societies around the world would be wise to embrace.

Some observers view public/private partnerships based on an extensive use of technology and data as key to fighting the spread of Covid-19. For example, Professor Jane Bambauer calls for contact tracing and alerts “to be done in an automated way with the help of mobile service providers’ geolocation data.” She argues that privacy is merely “an instrumental right” that “is meant to achieve certain social goals in fairness, safety and autonomy. It is not an end in itself.” Given the “more vital” interests in health and the liberty to leave one’s house, Bambauer sees “a moral imperative” for the private sector “to ignore even express lack of consent” by an individual to the sharing of information about him.

This proposition troubles me because the extensive data sharing that has been proposed in some countries, and that is already occurring in many others, is not mundane. In the name of advertising and product improvements, private companies have been hoovering up personal data for years. What this pandemic lays bare, though, is that while this trove of information was collected under the guise of cataloguing your coffee preferences and transportation habits, it can be reprocessed in an instant to restrict your movements, impinge on your freedom of association, and silence your freedom of speech. Bambauer is calling for detailed information about an individual’s every movement to be shared with the government when, in the United States under normal circumstances, a warrant would be required to access this information.

Indeed, with our mobile devices acting as the “invisible policeman” described by Justice William O. Douglas in Berger v. New York, we may face “a bald invasion of privacy, far worse than the general warrants prohibited by the Fourth Amendment.” Backward-looking searches and data hoards pose new questions of what constitutes a “reasonable” search. The stakes are high – both here and abroad, citizens are being asked to allow warrantless searches by the government on an astronomical scale, all in the name of public health.  

Abroad

The first country to confront the coronavirus was China. The World Health Organization has touted the measures taken by China as “the only measures that are currently proven to interrupt or minimize transmission chains in humans.” Among these measures are the “rigorous tracking and quarantine of close contacts,” as well as “the use of big data and artificial intelligence (AI) to strengthen contact tracing and the management of priority populations.” An ambassador for China has said his government “optimized the protocol of case discovery and management in multiple ways like backtracking the cell phone positioning.” Much as the Communist Party’s control over China enabled it to suppress early reports of a novel coronavirus, this regime vigorously ensured its people’s compliance with the “stark” containment measures described by the World Health Organization.

Before the Covid-19 pandemic, Hong Kong already had been testing the use of “smart wristbands” to track the movements of prisoners. The Special Administrative Region now monitors people quarantined inside their homes by requiring them to wear wristbands that send information to the quarantined individuals’ smartphones and alert the Department of Health and Police if people leave their homes, break their wristbands or disconnect them from their smartphones. When first announced in early February, the wristbands were required only for people who had been to Wuhan in the past 14 days, but the program rapidly expanded to encompass every person entering Hong Kong. The government denied any privacy concerns about the electronic wristbands, saying the Privacy Commissioner for Personal Data had been consulted about the technology and agreed it could be used to ensure that quarantined individuals remain at home.

Elsewhere in Asia, Taiwan’s Chunghwa Telecom has developed a system that the local CDC calls an “electronic fence.” Specifically, the government obtains the SIM card identifiers for the mobile devices of quarantined individuals and passes those identifiers to mobile network operators, which use phone signals to their cell towers to alert public health and law enforcement agencies when the phone of a quarantined individual leaves a certain geographic range. In response to privacy concerns, the National Communications Commission said the system was authorized by special laws to prevent the coronavirus, and that it “does not violate personal data or privacy protection.” In Singapore, travelers and others issued Stay-Home Notices to remain in their residence 24 hours a day for 14 days must respond within an hour if contacted by government agencies by phone, text message or WhatsApp. And to assist with contact tracing, the government has encouraged everyone in the country to download TraceTogether, an app that uses Bluetooth to identify other nearby phones with the app and tracks when phones are in close proximity.

Israel’s Ministry of Health has launched an app for mobile devices called HaMagen (the shield) to prevent the spread of coronavirus by identifying contacts between diagnosed patients and people who came into contact with them in the 14 days prior to diagnosis. In March, the prime minister’s cabinet initially bypassed the legislative body to approve emergency regulations for obtaining without a warrant the cellphone location data and additional personal information of those diagnosed with or suspected of coronavirus infection. The government will send text messages to people who came into contact with potentially infected individuals, and will monitor the potentially infected person’s compliance with quarantine. The Ministry of Health will not hold this information; instead, it can make data requests to the police and Shin Bet, the Israel Security Agency. The police will enforce quarantine measures and Shin Bet will track down those who came into contact with the potentially infected.

Multiple Eastern European nations with constitutional protections for citizens’ rights of movement and privacy have superseded them by declaring a state of emergency. For example, in Hungary the declaration of a “state of danger” has enabled Prime Minister Viktor Orbán’s government to engage in “extraordinary emergency measures” without parliamentary consent.  His ministers have cited the possibility that coronavirus will prevent a gathering of a sufficient quorum of members of Parliament as making it necessary for the government to be able to act in the absence of legislative approval.

Member States of the European Union must protect personal data pursuant to the General Data Protection Regulation, and communications data, such as mobile location, pursuant to the ePrivacy Directive. The chair of the European Data Protection Board has observed that the ePrivacy Directive enables Member States to introduce legislative measures to safeguard public security. But if those measures allow for the processing of non-anonymized location data from mobile devices, individuals must have safeguards such as a right to a judicial remedy. “Invasive measures, such as the ‘tracking’ of individuals (i.e. processing of historical non-anonymized location data) could be considered proportional under exceptional circumstances and depending on the concrete modalities of the processing.” The EDPB has announced it will prioritize guidance on these issues.

EU Member States are already implementing such public security measures. For example, the government of Poland has by statute required everyone under a quarantine order due to suspected infection to download the “Home Quarantine” smartphone app. Those who do not install and use the app are subject to a fine. The app verifies users’ compliance with quarantine through selfies and GPS data. Users’ personal data will be administered by the Minister of Digitization, who has appointed a data protection officer. Each user’s identification, name, telephone number, quarantine location and quarantine end date can be shared with police and other government agencies. After two weeks, if the user does not report symptoms of Covid-19, the account will be deactivated — but the data will be stored for six years. The Ministry of Digitization claims that it must store the data for six years in case users pursue claims against the government. However, local privacy expert and Panoptykon Foundation cofounder Katarzyna Szymielewicz has questioned this rationale.

Even other countries that are part of the Anglo-American legal tradition are ramping up their use of data and working with the private sector to do so. The UK’s National Health Service is developing a data store that will include online/call center data from NHS Digital and Covid-19 test result data from the public health agency. While the NHS is working with private partner organizations and companies including Microsoft, Palantir Technologies, Amazon Web Services and Google, it has promised to keep all the data under its control, and to require those partners to destroy or return the data “once the public health emergency situation has ended.” The NHS also has committed to meet the requirements of data protection legislation by ensuring that individuals cannot be re-identified from the data in the data store.

Notably, each of the companies partnering with the NHS at one time or another has been subjected to scrutiny for its privacy practices. Some observers have noted that tech companies, which have been roundly criticized for a variety of reasons in recent years, may seek to use this pandemic for “reputation laundering.” As one observer cautioned: “Reputations matter, and there’s no reason the government or citizens should cast bad reputations aside when choosing who to work with or what to share” during this public health crisis.

At home

In the U.S., the federal government last enforced large-scale isolation and quarantine measures during the influenza (“Spanish Flu”) pandemic a century ago. But the Centers for Disease Control and Prevention track diseases on a daily basis by receiving case notifications from every state. The states mandate that healthcare providers and laboratories report certain diseases to the local public health authorities using personal identifiers. In other words, if you test positive for coronavirus, the government will know. Every state has laws authorizing quarantine and isolation, usually through the state’s health authority, while the CDC has authority through the federal Public Health Service Act and a series of presidential executive orders to exercise quarantine and isolation powers for specific diseases, including severe acute respiratory syndromes (a category into which the novel coronavirus falls).

Now local governments are issuing orders that empower law enforcement to fine and jail Americans for failing to practice social distancing. State and local governments have begun arresting and charging people who violate orders against congregating in groups. Rhode Island is requiring every non-resident who enters the state to be quarantined for two weeks, with police checks at the state’s transportation hubs and borders.

How governments discover violations of quarantine and social distancing orders will raise privacy concerns. Police have long been able to enforce based on direct observation of violations. But if law enforcement authorities identify violations of such orders based on data collection rather than direct observation, the Fourth Amendment may be implicated. In Jones and Carpenter, the Supreme Court has limited the warrantless tracking of Americans through GPS devices placed on their cars and through cellphone data. But building on the longstanding practice of contact tracing in fighting infectious diseases such as tuberculosis, GPS data has proven helpful in fighting the spread of Covid-19. This same data, though, also could be used to piece together evidence of violations of stay-at-home orders. As Chief Justice John Roberts wrote in Carpenter, “With access to [cell-site location information], the government can now travel back in time to retrace a person’s whereabouts… Whoever the suspect turns out to be, he has effectively been tailed every moment of every day for five years.”

The Fourth Amendment protects American citizens from government action, but the “reasonable expectation of privacy” test applied in Fourth Amendment cases connects the arenas of government action and commercial data collection. As Professor Paul Ohm of the Georgetown University Law Center notes, “the dramatic expansion of technologically-fueled corporate surveillance of our private lives automatically expands police surveillance too, thanks to the way the Supreme Court has construed the reasonable expectation of privacy test and the third-party doctrine.”

For example, the COVID-19 Mobility Data Network – infectious disease epidemiologists working with Facebook, Camber Systems and Cuebiq – uses mobile device data to inform state and local governments about whether social distancing orders are effective. The tech companies give the researchers aggregated data sets; the researchers give daily situation reports to departments of health, but say they do not share the underlying data sets with governments. The researchers have justified this model based on users of the private companies’ apps having consented to the collection and sharing of data.

However, the assumption that consumers have given informed consent to the collection of their data (particularly for the purpose of monitoring their compliance with social isolation measures during a pandemic) is undermined by studies showing the average consumer does not understand all the different types of data that are collected and how their information is analyzed and shared with third parties – including governments. Technology and telecommunications companies have neither asked me to opt into tracking for public health nor made clear how they are partnering with federal, state and local governments. This practice highlights that data will be divulged in ways consumers cannot imagine – because no one anticipated a pandemic when agreeing to a company’s privacy policy. This information asymmetry is part of why we need federal privacy legislation.

On Friday afternoon, Apple and Google announced their opt-in Covid-19 contact tracing technology. The owners of the two most common mobile phone operating systems in the U.S. said that in May they would release application programming interfaces that enable interoperability between iOS and Android devices using official contact tracing apps from public health authorities. At an unspecified date, Bluetooth-based contact tracing will be built directly into the operating systems. “Privacy, transparency, and consent are of utmost importance in this effort,” the companies said in their press release.  

At this early stage, we do not yet know exactly how the proposed Google/Apple contact tracing system will operate. It sounds similar to Singapore’s TraceTogether, which is already available in the iOS and Android mobile app stores (it has a 3.3 out of 5 average rating in the former and a 4.0 out of 5 in the latter). TraceTogether is also described as a voluntary, Bluetooth-based system that avoids GPS location data, does not upload information without the user’s consent, and uses changing, encrypted identifiers to maintain user anonymity. Perhaps the most striking difference, at least to a non-technical observer, is that TraceTogether was developed and is run by the Singaporean government, which has been a point of concern for some observers. The U.S. version – like finding abducted children through Amber Alerts and fighting crime via Amazon Ring – will be a partnership between the public and private sectors.     
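To make the idea of "changing, encrypted identifiers" concrete, here is a minimal Python sketch. It assumes a hypothetical scheme in which each phone derives a fresh broadcast identifier for every 15-minute window from a secret key that stays on the device. This is not the TraceTogether protocol or the Apple/Google design, whose details are not described here; the function name, key size and rotation interval are all illustrative assumptions.

```python
import hashlib
import hmac
import os

# Assumed rotation window; not taken from any real contact tracing app.
INTERVAL_SECONDS = 15 * 60


def rotating_id(device_key: bytes, unix_time: int) -> str:
    """Return the identifier a phone would broadcast during the window containing unix_time.

    The identifier is an HMAC of the window number under a device-local secret,
    so an observer without the key cannot link identifiers from different windows.
    """
    interval = unix_time // INTERVAL_SECONDS
    digest = hmac.new(device_key, interval.to_bytes(8, "big"), hashlib.sha256).digest()
    return digest[:16].hex()  # truncate to 16 bytes for a compact broadcast value


if __name__ == "__main__":
    key = os.urandom(32)  # hypothetical secret that never leaves the device
    now = 1_586_700_000
    print(rotating_id(key, now))                     # identifier for the current window
    print(rotating_id(key, now + INTERVAL_SECONDS))  # a different identifier one window later
```

The privacy property in a design like this rests on where the key lives and who can request it, which is a policy question as much as a technical one.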

Recommendations

The global pandemic we now face is driving data usage in ways not contemplated by consumers. Entities in the private and public sector are confronting new and complex choices about data collection, usage and sharing. Organizations with Chief Privacy Officers, Chief Information Security Officers, and other personnel tasked with managing privacy programs are, relatively speaking, well-equipped to address these issues. Despite the extraordinary circumstances, senior management should continue to rely on the expertise and sound counsel of their CPOs and CISOs, who should continue to make decisions based on their established privacy and data security programs. Although developments are unfolding at warp speed, it is important – arguably now, more than ever – to be intentional about privacy decisions.

For organizations that lack experience with privacy and data security programs (and individuals tasked with oversight for these areas), now is a great time to pause, do some research and exercise care. It is essential to think about the longer-term ramifications of choices made about data collection, use and sharing during the pandemic. The FTC offers easily accessible resources, including Protecting Personal Information: A Guide for Business, Start with Security: A Guide for Business, and Stick with Security: A Business Blog Series. While the Gramm-Leach-Bliley Act (GLB) applies only to financial institutions, the FTC’s GLB compliance blog outlines some data security best practices that apply more broadly. The National Institute of Standards and Technology (NIST) also offers security and privacy resources, including a privacy framework to help organizations identify and manage privacy risks. Private organizations such as the Center for Information Policy Leadership, the International Association of Privacy Professionals and the App Association also offer helpful resources, as do trade associations. While it may seem like a suboptimal time to take a step back and focus on these strategic issues, remember that privacy and data security missteps can cause irrevocable harm. Counterintuitively, now is actually the best time to be intentional about choices in these areas.

Best practices like accountability, risk assessment and risk management will be key to navigating today’s challenges. Companies should take the time to assess and document the new and/or expanded risks from the data collection, use and sharing of personal information. It is appropriate for these risk assessments to incorporate potential benefits and harms not only to the individual and the company, but also to society as a whole. Upfront assessments can help companies establish controls and incentives to facilitate responsible behavior, as well as help organizations demonstrate that they are fully aware of the impact of their choices (risk assessment) and in control of their impact on people and programs (risk mitigation). Written assessments can also facilitate transparency with stakeholders, raise awareness internally about policy choices and assist companies with ongoing monitoring and enforcement. Moreover, these assessments will facilitate a return to “normal” data practices when the crisis has passed.

In a similar vein, companies must engage in comprehensive vendor management with respect to the entities that are proposing to use and analyze their data. In addition to vetting proposed data recipients thoroughly, companies must be selective concerning the categories of information shared. The benefits of the proposed research must be balanced against individual protections, and companies should share only those data necessary to achieve the stated goals. To the extent feasible, data should be shared in de-identified and aggregated formats and data recipients should be subject to contractual obligations prohibiting them from re-identification. Moreover, companies must have policies in place to ensure compliance with research contracts, including data deletion obligations and prohibitions on data re-identification, where appropriate. Finally, companies must implement mechanisms to monitor third party compliance with contractual obligations.
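As one illustration of what sharing data "in de-identified and aggregated formats" can look like in practice, the Python sketch below counts distinct users per region and withholds any region with too few users to report safely. The record layout, the function name and the threshold of 10 are assumptions made for this example, not a standard drawn from the FTC or any of the programs discussed above. Aggregation with a minimum group size reduces the risk of re-identification but does not eliminate it.

```python
from collections import defaultdict

# Assumed suppression threshold, chosen for illustration only.
MIN_GROUP_SIZE = 10


def aggregate_by_region(records, min_group_size=MIN_GROUP_SIZE):
    """Turn (user_id, region) records into per-region counts of distinct users.

    User identifiers are dropped from the output, and regions with fewer than
    min_group_size distinct users are suppressed entirely.
    """
    users_by_region = defaultdict(set)
    for user_id, region in records:
        users_by_region[region].add(user_id)
    return {
        region: len(users)
        for region, users in users_by_region.items()
        if len(users) >= min_group_size
    }


if __name__ == "__main__":
    sample = [(f"user{i}", "Region A") for i in range(12)] + [("user99", "Region B")]
    print(aggregate_by_region(sample))  # Region B is withheld because it has a single user
```

A data recipient would receive only the dictionary of counts; the contractual prohibitions on re-identification described above would still be needed alongside a technical control like this one.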

Similar principles of necessity and proportionality should guide governments as they make demands or requests for information from the private sector. Governments must recognize the weight with which they speak during this crisis and carefully balance data collection and usage with civil liberties. In addition, governments have special obligations to ensure that any data collection done by them or at their behest is driven by the science of Covid-19; to be transparent with citizens about the use of data; and to provide due process for those who wish to challenge limitations on their rights. Finally, government actors should apply good data hygiene, including regularly reassessing the breadth of their data collection initiatives and incorporating data retention and deletion policies.

In theory, government’s role could be reduced as market-driven responses emerge. For example, assuming the existence of universally accessible daily coronavirus testing with accurate results even during the incubation period, Hal Singer’s proposal for self-certification of non-infection among private actors is intriguing. Thom Lambert identified the inability to know who is infected as a “lemon problem;” Singer seeks a way for strangers to verify each other’s “quality” in the form of non-infection.

Whatever solutions we may accept in a pandemic, it is imperative to monitor the coronavirus situation as it improves, to know when to lift the more dire measures. Former Food and Drug Administration Commissioner Scott Gottlieb and other observers have called for maintaining surveillance because of concerns about a resurgence of the virus later this year. For any measures that conflict with Americans’ constitutional rights to privacy and freedom of movement, there should be metrics set in advance for the conditions that will indicate when such measures are no longer justified. In the absence of pre-determined metrics, governments may feel the same temptation as Hungary’s prime minister to keep renewing a “state of danger” that overrides citizens’ rights. As Slovak lawmaker Tomas Valasek has said, “It doesn’t just take the despots and the illiberals of this world, like Orbán, to wreak damage.” But privacy is not merely instrumental to other interests, and we do not have to sacrifice our right to it indefinitely in exchange for safety.

I recognize that halting the spread of the virus will require extensive and sustained effort, and I credit many governments with good intentions in attempting to save the lives of their citizens. But I refuse to accept that we must sacrifice privacy to reopen the economy. It seems a false choice to say that I must sacrifice my Constitutional rights to privacy, freedom of association and free exercise of religion for another’s freedom of movement. Society should demand that equity, fairness and autonomy be respected in data uses, even in a pandemic. To quote Valasek again: “We need to make sure that we don’t go a single inch further than absolutely necessary in curtailing civil liberties in the name of fighting for public health.” History has taught us repeatedly that sweeping security powers granted to governments during an emergency persist long after the crisis has abated. To resist the gathering momentum toward this outcome, I will continue to emphasize the FTC’s learning on appropriate data collection and use. But my remit as an FTC Commissioner is even broader – when I was sworn in on Sept. 26, 2018, I took an oath to “support and defend the Constitution of the United States” – and so I shall.


[1] Many thanks to my Attorney Advisors Pallavi Guniganti and Nina Frant for their invaluable assistance in preparing this article.

Why Data Is Not the New Oil

Alec Stapp —  8 October 2019

“Data is the new oil,” said Jaron Lanier in a recent op-ed for The New York Times. Lanier’s use of this metaphor is only the latest instance of what has become the dumbest meme in tech policy. As the digital economy becomes more prominent in our lives, it is not unreasonable to seek to understand one of its most important inputs. But this analogy to the physical economy is fundamentally flawed. Worse, introducing regulations premised upon faulty assumptions like this will likely do far more harm than good. Here are seven reasons why “data is the new oil” misses the mark:

1. Oil is rivalrous; data is non-rivalrous

If someone uses a barrel of oil, it can’t be consumed again. But, as Alan McQuinn, a senior policy analyst at the Information Technology and Innovation Foundation, noted, “when consumers ‘pay with data’ to access a website, they still have the same amount of data after the transaction as before. As a result, users have an infinite resource available to them to access free online services.” Imposing restrictions on data collection makes this infinite resource finite. 

2. Oil is excludable; data is non-excludable

Oil is highly excludable because, as a physical commodity, it can be stored in ways that prevent use by non-authorized parties. However, as my colleagues pointed out in a recent comment to the FTC: “While databases may be proprietary, the underlying data usually is not.” They go on to argue that this can lead to under-investment in data collection:

[C]ompanies that have acquired a valuable piece of data will struggle both to prevent their rivals from obtaining the same data as well as to derive competitive advantage from the data. For these reasons, it also means that firms may well be more reluctant to invest in data generation than is socially optimal. In fact, to the extent this is true there is arguably more risk of companies under-investing in data generation than of firms over-investing in order to create data troves with which to monopolize a market. This contrasts with oil, where complete excludability is the norm.

3. Oil is fungible; data is non-fungible

Oil is a commodity, so, by definition, one barrel of oil of a given grade is equivalent to any other barrel of that grade. Data, on the other hand, is heterogeneous. Each person’s data is unique and may consist of a practically unlimited number of different attributes that can be collected into a profile. This means that oil will follow the law of one price, while a dataset’s value will be highly contingent on its particular properties and commercialization potential.

4. Oil has positive marginal costs; data has zero marginal costs

There is a significant expense to producing and distributing an additional barrel of oil (as low as $5.49 per barrel in Saudi Arabia; as high as $21.66 in the U.K.). Data is merely encoded information (bits of 1s and 0s), so gathering, storing, and transferring it is nearly costless (though, to be clear, setting up systems for collecting and processing can be a large fixed cost). Under perfect competition, the market clearing price is equal to the marginal cost of production (hence why data is traded for free services and oil still requires cold, hard cash).

5. Oil is a search good; data is an experience good

Oil is a search good, meaning its value can be assessed prior to purchasing. By contrast, data tends to be an experience good because companies don’t know how much a new dataset is worth until it has been combined with pre-existing datasets and deployed using algorithms (from which value is derived). This is one reason why purpose limitation rules can have unintended consequences. If firms are unable to predict what data they will need in order to develop new products, then restricting what data they’re allowed to collect is per se anti-innovation.

6. Oil has constant returns to scale; data has rapidly diminishing returns

As an energy input into a mechanical process, oil has relatively constant returns to scale (e.g., when oil is used as the fuel source to power a machine). When data is used as an input for an algorithm, it shows rapidly diminishing returns, as the charts collected in a presentation by Google’s Hal Varian demonstrate. The initial training data is hugely valuable for increasing an algorithm’s accuracy. But as you increase the dataset by a fixed amount each time, the improvements steadily decline (because new data is only helpful in so far as it’s differentiated from the existing dataset).
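To make the shape of that curve concrete, here is a minimal sketch in Python. It assumes a purely hypothetical learning curve in which accuracy approaches a ceiling as a power law in dataset size; the function names and every parameter value are illustrative assumptions, not figures from Varian's presentation.

```python
def accuracy(n, ceiling=0.95, scale=0.5, exponent=0.5):
    """Hypothetical learning curve: accuracy rises toward `ceiling` as n grows.

    All parameter values are assumptions chosen for illustration only.
    """
    return ceiling - scale * n ** (-exponent)


def marginal_gains(start, step, rounds):
    """Accuracy gained from each successive fixed-size addition of data."""
    sizes = [start + i * step for i in range(rounds + 1)]
    return [accuracy(b) - accuracy(a) for a, b in zip(sizes, sizes[1:])]


# Add 1,000 records at a time to an initial dataset of 1,000 records.
gains = marginal_gains(start=1_000, step=1_000, rounds=5)
for i, gain in enumerate(gains, start=1):
    print(f"increment {i}: accuracy improves by {gain:.5f}")
```

Under these assumptions every increment still helps, but each one helps less than the one before it, which is the pattern the paragraph above describes.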

7. Oil is valuable; data is worthless

The features detailed above — rivalrousness, fungibility, marginal cost, returns to scale — all lead to perhaps the most important distinction between oil and data: The average barrel of oil is valuable (currently $56.49) and the average dataset is worthless (on the open market). As Will Rinehart showed, putting a price on data is a difficult task. But when data brokers and other intermediaries in the digital economy do try to value data, the prices are almost uniformly low. The Financial Times had the most detailed numbers on what personal data is sold for in the market:

  • “General information about a person, such as their age, gender and location is worth a mere $0.0005 per person, or $0.50 per 1,000 people.”
  • “A person who is shopping for a car, a financial product or a vacation is more valuable to companies eager to pitch those goods. Auto buyers, for instance, are worth about $0.0021 a pop, or $2.11 per 1,000 people.”
  • “Knowing that a woman is expecting a baby and is in her second trimester of pregnancy, for instance, sends the price tag for that information about her to $0.11.”
  • “For $0.26 per person, buyers can access lists of people with specific health conditions or taking certain prescriptions.”
  • “The company estimates that the value of a relatively high Klout score adds up to more than $3 in word-of-mouth marketing value.”
  • [T]he sum total for most individuals often is less than a dollar.
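The arithmetic behind these figures is easy to check. The sketch below, in Python, takes the four per-person prices quoted above, scales each to a price per 1,000 people, and adds them up for a hypothetical individual who happens to fall into all four segments. That individual is an assumption made for illustration, and the Klout figure is left out because it is an estimate of marketing value rather than a sale price.

```python
from decimal import Decimal

# Per-person prices as quoted above from the Financial Times.
PRICE_PER_PERSON = {
    "general demographics": Decimal("0.0005"),
    "auto buyer": Decimal("0.0021"),
    "second-trimester pregnancy": Decimal("0.11"),
    "health condition or prescription": Decimal("0.26"),
}


def price_per_thousand(per_person):
    """Scale a per-person price to a price per 1,000 people."""
    return per_person * 1000


# Hypothetical person who appears in all four segments (an assumption).
total_per_person = sum(PRICE_PER_PERSON.values(), Decimal("0"))

for label, price in PRICE_PER_PERSON.items():
    print(f"{label}: ${price} per person, "
          f"${price_per_thousand(price):.2f} per 1,000")
print(f"all four segments combined: ${total_per_person} per person")
```

Even for someone in every one of these segments, the combined price stays well under a dollar. The quoted $2.11 per 1,000 auto buyers suggests that the $0.0021 per-person figure is itself rounded.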

Data is a specific asset, meaning it has “a significantly higher value within a particular transacting relationship than outside the relationship.” We only think data is so valuable because tech companies are so valuable. In reality, it is the combination of high-skilled labor, large capital expenditures, and cutting-edge technologies (e.g., machine learning) that makes those companies so valuable. Yes, data is an important component of these production functions. But to claim that data is responsible for all the value created by these businesses, as Lanier does in his NYT op-ed, is farcical (and reminiscent of the labor theory of value). 

Conclusion

People who analogize data to oil or gold may merely be trying to convey that data is as valuable in the 21st century as those commodities were in the 20th century (though, as argued, a dubious proposition). If the comparison stopped there, it would be relatively harmless. But there is a real risk that policymakers might take the analogy literally and regulate data in the same way they regulate commodities. As this article shows, data has many unique properties that are simply incompatible with 20th-century modes of regulation.

A better — though imperfect — analogy, as author Bernard Marr suggests, would be renewable energy. The sources of renewable energy are all around us — solar, wind, hydroelectric — and there is more available than we could ever use. We just need the right incentives and technology to capture it. The same is true for data. We leave our digital fingerprints everywhere — we just need to dust for them.

Michael Sykuta is Associate Professor, Agricultural and Applied Economics, and Director, Contracting Organizations Research Institute at the University of Missouri.

The US agriculture sector has been experiencing consolidation at all levels for decades, even as the global ag economy has been growing and becoming more diverse. Much of this consolidation has been driven by technological changes that created economies of scale, both at the farm level and beyond.

Likewise, the role of technology has changed the face of agriculture, particularly in the past 20 years since the commercial introduction of the first genetically modified (GMO) crops. However, biotechnology itself comprises only a portion of the technology change. The development of global positioning systems (GPS) and GPS-enabled equipment has created new opportunities for precision agriculture, whether for the application of crop inputs, crop management, or yield monitoring. The development of unmanned and autonomous vehicles and remote sensing technologies, particularly unmanned aerial vehicles (i.e., UAVs, or “drones”), has created new opportunities for field scouting, crop monitoring, and real-time field management. And currently, the development of Big Data analytics is promising to combine all of the different types of data associated with agricultural production in ways intended to improve the application of all the various technologies and to guide production decisions.

Now, with the pending mergers of several major agricultural input and life sciences companies, regulators are faced with a challenge: How to evaluate the competitive effects of such mergers in the face of such a complex and dynamic technology environment—particularly when these technologies are not independent of one another? What is the relevant market for considering competitive effects and what are the implications for technology development? And how does the nature of the technology itself implicate the economic efficiencies underlying these mergers?

Before going too far, it is important to note that while the three cases currently under review (i.e., ChemChina/Syngenta, Dow/DuPont, and Bayer/Monsanto) are frequently lumped together in discussions, the three present rather different competitive cases—particularly within the US. For instance, ChemChina’s acquisition of Syngenta will not, in itself, meaningfully change market concentration. However, financial backing from ChemChina may allow Syngenta to buy up the discards from other deals, such as the parts of DuPont that the EU Commission is requiring to be divested or the seed assets Bayer is reportedly looking to sell to preempt regulatory concerns, as well as other smaller competitors.

Dow-DuPont is perhaps the most head-to-head of the three mergers in terms of R&D and product lines. Both firms are in the top five in the US for pesticide manufacturing and for seeds. However, the Dow-DuPont merger is about much more than combining agricultural businesses. The Dow-DuPont deal specifically aims to create and spin off three different companies specializing in agriculture, material science, and specialty products. Although agriculture may be the business line in which the companies most overlap, it represents just over 21% of the combined businesses’ annual revenues.

Bayer-Monsanto is yet a different sort of pairing. While both companies are among the top five in US pesticide manufacturing (with combined sales less than Syngenta and about equal to Dow without DuPont), Bayer is a relatively minor player in the seed industry. Likewise, Monsanto is focused almost exclusively on crop production and digital farming technologies, offering little overlap to Bayer’s human health or animal nutrition businesses.

Despite the differences in these deals, they tend to be lumped together and discussed almost exclusively in the context of pesticide manufacturing or crop protection more generally. In so doing, the discussion misses some important aspects of these deals that may mitigate traditional competitive concerns within the pesticide industry.

Mergers as the Key to Unlocking Innovation and Value

First, as the Dow-DuPont merger suggests, mergers may be the least-cost way of (re)organizing assets in ways that maximize value. This is especially true for R&D-intensive industries where intellectual property and innovation are at the core of competitive advantage. Absent the protection of common ownership, neither party would have an incentive to fully disclose the nature of its IP and innovation pipeline. In this case, merging interests increases the efficiency of information sharing so that managers can effectively evaluate and reorganize assets in ways that maximize innovation and return on investment.

Dow and DuPont each have a wide range of areas of application. Both groups of managers recognize that each of their business lines would be stronger as focused, independent entities; but also recognize that the individual elements of their portfolios would be stronger if combined with those of the other company. While the EU Commission argues that Dow-DuPont would reduce the incentive to innovate in the pesticide industry—a dubious claim in itself—the commission seems to ignore the potential increases in efficiency, innovation and ability to serve customer interests across all three of the proposed new businesses. At a minimum, gains in those industries should be weighed against any alleged losses in the agriculture industry.

This is not the first such agricultural and life sciences “reorganization through merger”. The current manifestation of Monsanto is the spin-off of a previous merger between Monsanto and Pharmacia & Upjohn in 2000 that created today’s Pharmacia. At the time of the Pharmacia transaction, Monsanto had portfolios in agricultural products, chemicals, and pharmaceuticals. After reorganizing assets within Pharmacia, three business lines were created: agricultural products (the current Monsanto), pharmaceuticals (now Pharmacia, a subsidiary of Pfizer), and chemicals (now Solutia, a subsidiary of Eastman Chemical Co.). Merging interests allowed Monsanto and Pharmacia & Upjohn to create more focused business lines that were better positioned to pursue innovations and serve customers in their respective industries.

In essence, Dow-DuPont is following the same playbook. Although such intentions have not been announced, Bayer’s broad product portfolio suggests a similar long-term play with Monsanto is likely.

Interconnected Technologies, Innovation, and the Margins of Competition

As noted above, regulatory scrutiny of these three mergers focuses on them in the context of pesticide or agricultural chemical manufacturing. However, innovation in the ag chemicals industry is intricately interwoven with developments in other areas of agricultural technology that have rather different competition and innovation dynamics. The current technological wave in agriculture involves the use of Big Data to create value using the myriad data now available through GPS-enabled precision farming equipment. Monsanto and DuPont, through its Pioneer subsidiary, are both players in this developing space, sometimes referred to as “digital farming”.

Digital farming services are intended to assist farmers’ production decision making and increase farm productivity. Using GPS-coded field maps that include assessments of soil conditions, combined with climate data for the particular field, farm input companies can recommend the types and rates of application for soil conditioning pre-harvest, seed types for planting, and crop protection products during the growing season. Yield monitors at harvest provide outcomes data for feedback to refine and improve the algorithms that are used in subsequent growing seasons.

The integration of digital farming services with seed and chemical manufacturing offers obvious economic benefits for farmers and competitive benefits for service providers. Input manufacturers have incentive to conduct data analytics that individual farmers do not. Farmers have limited analytic resources and relatively small returns to investing in such resources, while input manufacturers have broad market potential for their analytic services. Moreover, by combining data from a broad cross-section of farms, digital farming service companies have access to the data necessary to identify generalizable correlations between farm plot characteristics, input use, and yield rates.

But the value of the information developed through these analytics is not unidirectional in its application and value creation. While input manufacturers may be able to help improve farmers’ operations given the current stock of products, feedback about crop traits and performance also enhances R&D for new product development by identifying potential product attributes with greater market potential. By combining product portfolios, agricultural companies can not only increase the value of their data-driven services for farmers, but more efficiently target R&D resources to their highest potential use.

The synergy between input manufacturing and digital farming notwithstanding, seed and chemical input companies are not the only players in the digital farming space. Equipment manufacturer John Deere was an early entrant in exploiting the information value of data collected by sensors on its equipment. Other remote sensing technology companies have incentive to develop data analytic tools to create value for their data-generating products. Even downstream companies, like ADM, have expressed interest in investing in digital farming assets that might provide new revenue streams with their farmer-suppliers as well as facilitate more efficient specialty crop and identity-preserved commodity-based value chains.

The development of digital farming is still in its early stages and is far from a sure bet for any particular player. Even Monsanto has pulled back from its initial foray into prescriptive digital farming (called FieldScripts). These competitive forces will affect the dynamics of competition at all stages of farm production, including seed and chemicals. Failure to account for those dynamics, and the potential competitive benefits input manufacturers may provide, could lead regulators to overestimate any concerns of competitive harm from the proposed mergers.

Conclusion

Farmers are concerned about the effects of these big-name tie-ups. Farmers may be rightly concerned, but for the wrong reasons. Ultimately, the role of the farmer continues to be diminished in the agricultural value chain. As precision agriculture tools and Big Data analytics reduce the value of idiosyncratic or tacit knowledge at the farm level, the managerial human capital of farmers becomes relatively less important in terms of value-added. It would be unwise to confuse farmers’ concerns regarding the competitive effects of the kinds of mergers we’re seeing now with the actual drivers of change in the agricultural value chain.

The CPI Antitrust Chronicle published Geoffrey Manne’s and my recent paper, The Problems and Perils of Bootstrapping Privacy and Data into an Antitrust Framework, as part of a symposium on Big Data in the May 2015 issue. All of the papers are worth reading and pondering, but of course ours is the best ;).

In it, we analyze two of the most prominent theories of antitrust harm arising from data collection: privacy as a factor of non-price competition, and price discrimination facilitated by data collection. We also analyze whether data is serving as a barrier to entry and effectively preventing competition. We argue that, in the current marketplace, there are no plausible harms to competition arising from either non-price effects or price discrimination due to data collection online and that there is no data barrier to entry preventing effective competition.

The questions of how to regulate privacy, and what role competition authorities should play in that, are only likely to increase in importance as the Internet marketplace continues to grow and evolve. The European Commission and the FTC have been called on by scholars and advocates to take greater consideration of privacy concerns during merger review and have even been encouraged to bring monopolization claims based upon data dominance. These calls should be rejected unless these theories can satisfy the rigorous economic review of antitrust law. In our humble opinion, they cannot do so at this time.

Excerpts:

PRIVACY AS AN ELEMENT OF NON-PRICE COMPETITION

The Horizontal Merger Guidelines have long recognized that anticompetitive effects may “be manifested in non-price terms and conditions that adversely affect customers.” But this notion, while largely unobjectionable in the abstract, still presents significant problems in actual application.

First, product quality effects can be extremely difficult to distinguish from price effects. Quality-adjusted price is usually the touchstone by which antitrust regulators assess prices for competitive effects analysis. Disentangling (allegedly) anticompetitive quality effects from simultaneous (neutral or pro-competitive) price effects is an imprecise exercise, at best. For this reason, proving a product-quality case alone is very difficult and requires connecting the degradation of a particular element of product quality to a net gain in advantage for the monopolist.

Second, invariably product quality can be measured on more than one dimension. For instance, product quality could include both function and aesthetics: A watch’s quality lies in both its ability to tell time as well as how nice it looks on your wrist. A non-price effects analysis involving product quality across multiple dimensions becomes exceedingly difficult if there is a tradeoff in consumer welfare between the dimensions. Thus, for example, a smaller watch battery may improve its aesthetics, but also reduce its reliability. Any such analysis would necessarily involve a complex and imprecise comparison of the relative magnitudes of harm/benefit to consumers who prefer one type of quality to another.

PRICE DISCRIMINATION AS A PRIVACY HARM

If non-price effects cannot be relied upon to establish competitive injury (as explained above), then what can be the basis for incorporating privacy concerns into antitrust? One argument is that major data collectors (e.g., Google and Facebook) facilitate price discrimination.

The argument can be summed up as follows: Price discrimination could be a harm to consumers that antitrust law takes into consideration. Because companies like Google and Facebook are able to collect a great deal of data about their users for analysis, businesses could segment groups based on certain characteristics and offer them different deals. The resulting price discrimination could lead to many consumers paying more than they would in the absence of the data collection. Therefore, the data collection by these major online companies facilitates price discrimination that harms consumer welfare.

This argument misses a large part of the story, however. The flip side is that price discrimination could have benefits to those who receive lower prices from the scheme than they would have in the absence of the data collection, a possibility explored by the recent White House Report on Big Data and Differential Pricing.

While privacy advocates have focused on the possible negative effects of price discrimination to one subset of consumers, they generally ignore the positive effects of businesses being able to expand output by serving previously underserved consumers. It is inconsistent with basic economic logic to suggest that a business relying on metrics would want to serve only those who can pay more by charging them a lower price, while charging those who cannot afford it a larger one. If anything, price discrimination would likely promote more egalitarian outcomes by allowing companies to offer lower prices to poorer segments of the population—segments that can be identified by data collection and analysis.

If this group favored by “personalized pricing” is as big as—or bigger than—the group that pays higher prices, then it is difficult to state that the practice leads to a reduction in consumer welfare, even if this can be divorced from total welfare. Again, the question becomes one of magnitudes that has yet to be considered in detail by privacy advocates.
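A toy model shows why the answer turns on magnitudes. The sketch below, in Python, is a stylized textbook setup and not a calculation from the paper: it assumes two consumer segments with linear demand q = a - p, a seller with zero marginal cost, and illustrative values for the demand intercepts. It compares total consumer surplus under a single profit-maximizing price with surplus under segment-specific prices.

```python
def consumer_surplus(intercept, price):
    """Surplus for linear demand q = intercept - price (zero if priced out)."""
    quantity = max(intercept - price, 0.0)
    return 0.5 * quantity * quantity


def best_uniform_price(intercepts):
    """Grid-search (in cents) the single price that maximizes profit.

    Assumes zero marginal cost, so profit equals revenue.
    """
    best_price, best_profit = 0.0, 0.0
    for cents in range(int(max(intercepts) * 100) + 1):
        price = cents / 100
        profit = sum(price * max(a - price, 0.0) for a in intercepts)
        if profit > best_profit:
            best_price, best_profit = price, profit
    return best_price


def compare(intercepts):
    """Return (surplus under one price, surplus under segment prices)."""
    uniform = best_uniform_price(intercepts)
    cs_uniform = sum(consumer_surplus(a, uniform) for a in intercepts)
    # With linear demand and zero cost, each segment's monopoly price is a / 2.
    cs_segmented = sum(consumer_surplus(a, a / 2) for a in intercepts)
    return cs_uniform, cs_segmented


# Hypothetical intercepts: a very unequal market and a more equal one.
for intercepts in ([10, 2], [10, 8]):
    cs_uniform, cs_segmented = compare(intercepts)
    print(f"{intercepts}: uniform {cs_uniform:.3f}, segmented {cs_segmented:.3f}")
```

In the first case the low-demand segment is priced out entirely under a single price, so segment pricing expands output and raises consumer surplus. In the second case both segments are already served, and segment pricing lowers consumer surplus. Which case better describes a given market is an empirical question.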

DATA BARRIER TO ENTRY

Either of these theories of harm is predicated on the inability or difficulty of competitors to develop alternative products in the marketplace—the so-called “data barrier to entry.” The argument is that upstarts do not have sufficient data to compete with established players like Google and Facebook, which in turn employ their data to both attract online advertisers as well as foreclose their competitors from this crucial source of revenue. There are at least four reasons to be dubious of such arguments:

  1. Data is useful to all industries, not just online companies;
  2. It’s not the amount of data, but how you use it;
  3. Competition online is one click or swipe away; and
  4. Access to data is not exclusive

CONCLUSION

Privacy advocates have thus far failed to make their case. Even in their most plausible forms, the arguments for incorporating privacy and data concerns into antitrust analysis do not survive legal and economic scrutiny. In the absence of strong arguments suggesting likely anticompetitive effects, and in the face of enormous analytical problems (and thus a high risk of error cost), privacy should remain a matter of consumer protection, not of antitrust.

On Wednesday, March 18, our fellow law-and-economics-focused brethren at George Mason’s Law and Economics Center will host a very interesting morning briefing on the intersection of privacy, big data, consumer protection, and antitrust. FTC Commissioner Maureen Ohlhausen will keynote and she will be followed by what looks like it will be a lively panel discussion. If you are in DC you can join in person, but you can also watch online. More details below.
Please join the LEC in person or online for a morning of lively discussion on this topic. FTC Commissioner Maureen K. Ohlhausen will set the stage by discussing her Antitrust Law Journal article, “Competition, Consumer Protection and The Right [Approach] To Privacy.” A panel discussion on big data and antitrust, which includes some of the leading thinkers on the subject, will follow.
Other featured speakers include:

Allen P. Grunes
Founder, The Konkurrenz Group and Data Competition Institute

Andres Lerner
Executive Vice President, Compass Lexecon

Darren S. Tucker
Partner, Morgan Lewis

Nathan Newman
Director, Economic and Technology Strategies LLC

Moderator: James C. Cooper
Director, Research and Policy, Law & Economics Center

A full agenda is available here.