Archives For copyright law

Output of the LG Research AI to the prompt: “a system of copyright for artificial intelligence”

Not only have digital-image generators like Stable Diffusion, DALL-E, and Midjourney—which make use of deep-learning models and other artificial-intelligence (AI) systems—created some incredible (and sometimes creepy – see above) visual art, but they’ve engendered a good deal of controversy, as well. Human artists have banded together as part of a fledgling anti-AI campaign; lawsuits have been filed; and policy experts have been trying to think through how these machine-learning systems interact with various facets of the law.

Debates about the future of AI have particular salience for intellectual-property rights. Copyright is notoriously difficult to protect online, and these systems add an additional wrinkle: it can at least be argued that their outputs can be unique creations. There are also, of course, moral and philosophical objections to those arguments, with many grounded in the supposition that only a human (or something with a brain, like humans) can be creative.

Leaving aside for the moment a potentially pitched battle over the definition of “creation,” we should be able to find consensus that at least some of these systems produce unique outputs and are not merely cutting and pasting other pieces of visual imagery into a new whole. That is, at some level, the machines are engaging in a rudimentary sort of “learning” about how humans arrange colors and lines when generating images of certain subjects. The machines then reconstruct this process and produce a new set of lines and colors that conform to the patterns they found in the human art.

But that isn’t the end of the story. Even if some of these systems’ outputs are unique and noninfringing, the way the machines learn—by ingesting existing artwork—can raise a number of thorny issues. Indeed, these systems are arguably infringing copyright during the learning phase, and such use may not survive a fair-use analysis.

We are still in the early days of thinking through how this new technology maps onto the law. Answers will inevitably come, but for now, there are some very interesting questions about the intellectual-property implications of AI-generated art, which I consider below.

The Points of Collision Between Intellectual Property Law and AI-Generated Art

AI-generated art is not a single thing. It is, rather, a collection of differing processes, each with different implications for the law. For the purposes of this post, I am going to deal with image-generation systems that use “generative adversarial networks” (GANs) and diffusion models. The various implementations of each will differ in some respects, but from what I understand, the ways that these techniques can be used to generate all sorts of media are sufficiently similar that we can begin to sketch out some of their legal implications.

A (very) brief technical description

This is a very high-level overview of how these systems work; for a more detailed (but very readable) description, see here.

A GAN is a type of machine-learning model that consists of two parts: a generator and a discriminator. The generator is trained to create new images that look like they come from a particular dataset, while the discriminator is trained to distinguish the generated images from real images in the dataset. The two parts are trained together in an adversarial manner, with the generator trying to produce images that can fool the discriminator and the discriminator trying to correctly identify the generated images.
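To make the generator/discriminator interplay concrete, here is a minimal sketch of the adversarial loop on one-dimensional numbers rather than images. Everything in it is an illustrative assumption, not how any production image generator is built: the names (Generator, Discriminator, train), the two-parameter "networks," the learning rate, and the choice of a bell curve centered on 4.0 as the "real" data. Toy loops like this can oscillate instead of settling, so it is meant only to show who is trained against whom.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


class Discriminator:
    """Toy stand-in for the discriminator: estimates P(sample is real)."""

    def __init__(self):
        self.w = 0.0
        self.c = 0.0

    def prob_real(self, x):
        return sigmoid(self.w * x + self.c)


class Generator:
    """Toy stand-in for the generator: turns random noise z into a sample."""

    def __init__(self):
        self.a = 1.0
        self.b = 0.0

    def sample(self, z):
        return self.a * z + self.b


def discriminator_step(d, real_x, fake_x, lr):
    """One gradient step on -log D(real) - log(1 - D(fake))."""
    p_real = d.prob_real(real_x)
    p_fake = d.prob_real(fake_x)
    grad_w = (p_real - 1.0) * real_x + p_fake * fake_x
    grad_c = (p_real - 1.0) + p_fake
    d.w -= lr * grad_w
    d.c -= lr * grad_c


def generator_step(g, d, z, lr):
    """One gradient step on -log D(G(z)): try to fool the discriminator."""
    fake_x = g.sample(z)
    p_fake = d.prob_real(fake_x)
    grad_x = (p_fake - 1.0) * d.w  # gradient of the loss w.r.t. the fake sample
    g.a -= lr * grad_x * z
    g.b -= lr * grad_x


def train(steps=2000, lr=0.01, seed=0):
    """Alternate the two steps; the 'real' data is an assumed N(4.0, 0.5)."""
    rng = random.Random(seed)
    g, d = Generator(), Discriminator()
    for _ in range(steps):
        real_x = rng.gauss(4.0, 0.5)
        fake_x = g.sample(rng.gauss(0.0, 1.0))
        discriminator_step(d, real_x, fake_x, lr)
        generator_step(g, d, rng.gauss(0.0, 1.0), lr)
    return g, d
```

Calling train() returns the two trained objects. The generator never sees a "real" sample directly; it only receives feedback through the discriminator's judgment.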

A diffusion model, by contrast, analyzes the distribution of information in an image, as noise is progressively added to it. This kind of algorithm analyzes characteristics of sample images—like the distribution of colors or lines—in order to “understand” what counts as an accurate representation of a subject (i.e., what makes a picture of a cat look like a cat and not like a dog).

For example, in the generation phase, systems like Stable Diffusion start with randomly generated noise, and work backward in “denoising” steps to essentially “see” shapes:

The sampled noise is predicted so that if we subtract it from the image, we get an image that’s closer to the images the model was trained on (not the exact images themselves, but the distribution – the world of pixel arrangements where the sky is usually blue and above the ground, people have two eyes, cats look a certain way – pointy ears and clearly unimpressed).
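The "subtract the predicted noise" idea can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function names and the alpha_bar mixing weight are hypothetical, the "image" is just a short list of numbers, and the noise prediction is handed in directly. In a real diffusion model, that prediction comes from a trained neural network, is only approximate, and is applied over many small steps.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Forward process: blend the clean signal with Gaussian noise.

    alpha_bar in (0, 1] is the share of the original signal that is kept.
    Returns the noised signal and the noise that was mixed in.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    keep = math.sqrt(alpha_bar)
    mix = math.sqrt(1.0 - alpha_bar)
    xt = [keep * x + mix * e for x, e in zip(x0, eps)]
    return xt, eps


def remove_predicted_noise(xt, predicted_eps, alpha_bar):
    """Reverse direction: subtract the predicted noise and rescale."""
    keep = math.sqrt(alpha_bar)
    mix = math.sqrt(1.0 - alpha_bar)
    return [(x - mix * e) / keep for x, e in zip(xt, predicted_eps)]


# Usage: with a perfect noise prediction, the original values come back.
rng = random.Random(0)
pixels = [0.1, 0.9, 0.5, 0.3]
noised, true_noise = add_noise(pixels, 0.5, rng)
recovered = remove_predicted_noise(noised, true_noise, 0.5)
```

In generation there is no original image to recover. The process starts from pure noise, and the network's predictions steer each step toward the patterns it learned in training.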

It is relevant here that, once networks using these techniques are trained, they do not need to rely on saved copies of the training images in order to generate new images. Of course, it’s possible that some implementations might be designed in a way that does save copies of those images, but for the purposes of this post, I will assume we are talking about systems that save known works only during the training phase. The models that are produced during training are, in essence, instructions to a different piece of software about how to start with a text prompt from a user and a palette of pure noise, and then progressively “discover” signal in that image until some new image emerges.

Input-stage use of intellectual property

OpenAI, the developer of some of the most popular AI tools, is not shy about its use of protected works in the training phase of AI algorithms. In comments to the U.S. Patent and Trademark Office (PTO), the company notes that:

…[m]odern AI systems require large amounts of data. For certain tasks, that data is derived from existing publicly accessible “corpora”… of data that include copyrighted works. By analyzing large corpora (which necessarily involves first making copies of the data to be analyzed), AI systems can learn patterns inherent in human-generated data and then use those patterns to synthesize similar data which yield increasingly compelling novel media in modalities as diverse as text, image, and audio. (emphasis added).

Thus, at the training stage, the most popular forms of machine-learning systems require making copies of existing works. Where the material being used is neither in the public domain nor licensed, an infringement occurs (as Getty Images alleges in a suit it recently filed against Stability AI). Some affirmative defense is therefore needed to excuse the infringement.

Toward this end, OpenAI believes that its algorithmic training should qualify as a fair use. Other major services that use these AI techniques to “learn” from existing media would likely make similar arguments. But, at least in the way that OpenAI has framed the fair-use analysis (that these uses are sufficiently “transformative”), it’s not clear that they should qualify.

The purpose and character of the use

In brief, fair use—found in 17 USC § 107—provides for an affirmative defense against infringement when the use is  “for purposes such as criticism, comment, news reporting, teaching…, scholarship, or research.” When weighing a fair-use defense, a court must balance a number of factors:

  1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
  2. the nature of the copyrighted work;
  3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
  4. the effect of the use upon the potential market for or value of the copyrighted work.

OpenAI’s fair-use claim is rooted in the first factor: the purpose and character of the use. I should note, then, that what follows is solely a consideration of Factor 1, with special attention paid to whether these uses are “transformative.” But it is important to stipulate that fair-use analysis is a multi-factor test and that, even within the first factor, it’s not mandatory that a use be “transformative.” It is entirely possible that a court balancing all of the factors could, indeed, find that OpenAI is engaged in fair use, even if it does not agree that the use is “transformative.”

Whether the use of copyrighted works to train an AI is “transformative” is certainly a novel question, but it is likely answered through an observation that the U.S. Supreme Court made in Campbell v. Acuff-Rose Music:

[W]hat Sony said simply makes common sense: when a commercial use amounts to mere duplication of the entirety of an original, it clearly “supersede[s] the objects,”… of the original and serves as a market replacement for it, making it likely that cognizable market harm to the original will occur… But when, on the contrary, the second use is transformative, market substitution is at least less certain, and market harm may not be so readily inferred.

A key question, then, is whether training an AI on copyrighted works amounts to mere “duplication of the entirety of an original” or is sufficiently “transformative” to support a fair-use finding. OpenAI, as noted above, believes its use is highly transformative. According to its comments:

Training of AI systems is clearly highly transformative. Works in training corpora were meant primarily for human consumption for their standalone entertainment value. The “object of the original creation,” in other words, is direct human consumption of the author’s expression. Intermediate copying of works in training AI systems is, by contrast, “non-expressive”: the copying helps computer programs learn the patterns inherent in human-generated media. The aim of this process—creation of a useful generative AI system—is quite different than the original object of human consumption. The output is different too: nobody looking to read a specific webpage contained in the corpus used to train an AI system can do so by studying the AI system or its outputs. The new purpose and expression are thus both highly transformative.

But the way that OpenAI frames its system works against its interests in this argument. As noted above, and reinforced in the immediately preceding quote, an AI system like DALL-E or Stable Diffusion is actually made of at least two distinct pieces. The first is a piece of software that ingests existing works and creates a file that can serve as instructions to the second piece of software. The second piece of software then takes the output of the first part and can produce independent results. Thus, there is a clear discontinuity in the process, whereby the ultimate work created by the system is disconnected from the creative inputs used to train the software.

Therefore, contrary to what OpenAI asserts, the protected works are indeed ingested into the first part of the system “for their standalone entertainment value.” That is to say, the software is learning what counts as “standalone entertainment value” and therefore, the works must be used in those terms.

Surely, a computer is not sitting on a couch and surfing for its own entertainment. But it is solely for the very “standalone entertainment value” that the first piece of software is being shown copyrighted material. By contrast, parody or “remixing”  uses incorporate the work into some secondary expression that transforms the input. The way these systems work is to learn what makes a piece entertaining and then to discard that piece altogether. Moreover, this use of art qua art most certainly interferes with the existing market insofar as this use is in lieu of reaching a licensing agreement with rightsholders.

The 2nd U.S. Circuit Court of Appeals dealt with an analogous case. In American Geophysical Union v. Texaco, the 2nd Circuit considered whether Texaco’s photocopying of scientific articles produced by the plaintiffs qualified for a fair-use defense. Texaco employed between 400 and 500 research scientists and, as part of supporting their work, maintained subscriptions to a number of scientific journals. It was common practice for Texaco’s scientists to photocopy entire articles and save them in a file.

The plaintiffs sued for copyright infringement. Texaco asserted that photocopying by its scientists for the purposes of furthering scientific research—that is to train the scientists on the content of the journal articles—should count as a fair use, at least in part because it was sufficiently “transformative.” The 2nd Circuit disagreed:

The “transformative use” concept is pertinent to a court’s investigation under the first factor because it assesses the value generated by the secondary use and the means by which such value is generated. To the extent that the secondary use involves merely an untransformed duplication, the value generated by the secondary use is little or nothing more than the value that inheres in the original. Rather than making some contribution of new intellectual value and thereby fostering the advancement of the arts and sciences, an untransformed copy is likely to be used simply for the same intrinsic purpose as the original, thereby providing limited justification for a finding of fair use… (emphasis added).

As in the case at hand, the 2nd Circuit observed that making full copies of the scientific articles was solely for the consumption of the material itself. A rejoinder, of course, is that training these AI systems surely advances scientific research and, thus, does foster the “advancement of the arts and sciences.” But in American Geophysical Union, where the secondary use was explicitly for the creation of new and different scientific outputs, the court still held that making copies of one scientific article in order to learn and produce new scientific innovations did not count as “transformative.”

What this case shows is that one cannot merely state that some social goal will be advanced in the future by permitting an exception to copyright protection today. As the 2nd Circuit put it:

…the dominant purpose of the use is a systematic institutional policy of multiplying the available number of copies of pertinent copyrighted articles by circulating the journals among employed scientists for them to make copies, thereby serving the same purpose for which additional subscriptions are normally sold, or… for which photocopying licenses may be obtained.

The secondary use itself must be transformative and different. Where an AI system ingests copyrighted works, that use is simply not transformative; it is using the works in their original sense in order to train a system to be able to make other original works. As in American Geophysical Union, the AI creators are completely free to seek licenses from rightsholders in order to train their systems.

Finally, there is a sense in which this machine learning might not infringe on copyrights at all. To my knowledge, the technology does not itself exist, but if it were possible for a machine to somehow “see” in the way that humans do—without using stored copies of copyrighted works—merely “learning” from those works, such as we can call it learning, probably would not violate copyright laws.

Do the outputs of these systems violate intellectual property laws?

The outputs of GANs and diffusion models may or may not violate IP laws, but there is nothing inherent in the processes described above to dictate that they must. As noted, the most common AI systems do not save copies of existing works, but merely “instructions” (more or less) on how to create new works that conform to patterns they found by examining existing work. If we assume that a system isn’t violating copyright at the input stage, it’s entirely possible that it can produce completely new pieces of art that have never before existed and do not violate copyright.

They can, however, be made to violate IP rights. For example, trademark violations appear to be one of the most popular uses of these AI systems by end users. To take but one example, a quick search of Google Images for “midjourney iron man” returns a slew of images that almost certainly violate trademarks for the character Iron Man. Similarly, these systems can be instructed to generate art that is not just “in the style” of a particular artist, but that very closely resembles existing pieces. In this sense, the system would be making a copy that theoretically infringes. 

There is a common bug in such systems that leads to outputs that are more likely to violate copyright in this way. Known as “overfitting,” the training leg of these AI systems can be presented with samples that contain too many instances of a particular image. This leads to a data set that contains too much information about the specific image, such that when the AI generates a new image, it is constrained to producing something very close to the original.
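One way to picture the repeated-image problem is to count how often the same file appears in a training set. The sketch below is only an illustration: the function names are hypothetical, and it catches exact byte-for-byte repeats only. Resized or recompressed copies of the same picture would need some form of similarity matching, which is not shown here.

```python
import hashlib
from collections import Counter


def duplicate_counts(images):
    """Return {sha256 digest: count} for every image that appears more than once.

    `images` is an iterable of raw image bytes.
    """
    counts = Counter(hashlib.sha256(img).hexdigest() for img in images)
    return {digest: n for digest, n in counts.items() if n > 1}


def most_repeated_share(images):
    """Fraction of the training set taken up by its single most repeated image."""
    images = list(images)
    if not images:
        return 0.0
    counts = Counter(hashlib.sha256(img).hexdigest() for img in images)
    return max(counts.values()) / len(images)


# Usage with stand-in byte strings in place of real image files.
training_set = [b"cat-photo", b"dog-photo", b"cat-photo", b"cat-photo"]
repeats = duplicate_counts(training_set)
share = most_repeated_share(training_set)
```

A training set in which one image accounts for a large share of the samples is the situation described above, where the trained system is more likely to reproduce something very close to that image.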

An argument can also be made that generating art “in the style of” a famous artist violates moral rights (in jurisdictions where such rights exist).

At least in the copyright space, cases like Sony are going to become crucial. Does the user side of these AI systems have substantial noninfringing uses? If so, the firms that host software for end users could avoid secondary-infringement liability, and the onus would fall on users to avoid violating copyright laws. At the same time, it seems plausible that legislatures could place some obligation on these providers to implement filters to mitigate infringement by end users.

Opportunities for New IP Commercialization with AI

There are a number of ways that AI systems may inexcusably infringe on intellectual-property rights. As a best practice, I would encourage the firms that operate these services to seek licenses from rightsholders. While this would surely be an expense, it also opens new opportunities for both sides to generate revenue.

For example, an AI firm could develop its own version of YouTube’s ContentID that allows creators to opt their work into training. For some well-known artists, this could be negotiated with an upfront licensing fee. On the user side, any artist who has opted in could then be selected as a “style” for the AI to emulate. When users generate an image, a royalty payment to the artist would be created. Creators would also have the option to remove their influence from the system if they so desired.
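The bookkeeping behind such an opt-in scheme could be quite simple. The following is a hypothetical sketch, not a description of ContentID or of any existing service: the class name, the flat per-image royalty, and the method names are all assumptions made for illustration. It does not address the harder problem of actually removing an artist's influence from an already-trained model.

```python
from dataclasses import dataclass, field


@dataclass
class StyleRegistry:
    """Hypothetical ledger of opted-in artists and the royalties owed to them."""

    royalty_cents: int = 5  # assumed flat fee per generated image
    opted_in: set = field(default_factory=set)
    owed_cents: dict = field(default_factory=dict)

    def opt_in(self, artist):
        """Artist agrees to be offered as a selectable style."""
        self.opted_in.add(artist)

    def opt_out(self, artist):
        """Artist withdraws; royalties already accrued remain owed."""
        self.opted_in.discard(artist)

    def generate(self, artist):
        """Record a royalty each time a user generates in the artist's style."""
        if artist not in self.opted_in:
            raise PermissionError(f"{artist} has not opted in")
        self.owed_cents[artist] = self.owed_cents.get(artist, 0) + self.royalty_cents
        return f"image in the style of {artist}"


# Usage: two generations accrue two royalty payments.
registry = StyleRegistry()
registry.opt_in("Example Artist")
registry.generate("Example Artist")
registry.generate("Example Artist")
```

Amounts are tracked in whole cents to avoid rounding drift. An upfront licensing fee for well-known artists could be recorded in the same ledger.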

Undoubtedly, there are other ways to monetize the relationship between creators and the use of their work in AI systems. Ultimately, the firms that run these systems will not be able to simply wish away IP laws. There are going to be opportunities for creators and AI firms to both succeed, and the law should help to generate that result.

Activists who railed against the Stop Online Piracy Act (SOPA) and the PROTECT IP Act (PIPA) a decade ago today celebrate the 10th anniversary of their day of protest, which they credit with sending the bills down to defeat.

Much of the anti-SOPA/PIPA campaign was based on a gauzy notion of “realizing [the] democratizing potential” of the Internet. Which is fine, until it isn’t.

But despite the activists’ temporary legislative victory, the methods of combating digital piracy that SOPA/PIPA contemplated have been employed successfully around the world. It may, indeed, be time for the United States to revisit that approach, as the very real problems the legislation sought to combat haven’t gone away.

From the perspective of rightsholders, the bill’s most important feature was also its most contentious: the ability to enforce judicial “site-blocking orders.” A site-blocking order is a type of remedy sometimes referred to as a no-fault injunction. Under SOPA/PIPA, a court would have been permitted to issue orders that could be used to force a range of firms—from financial providers to ISPs—to cease doing business with or suspend the service of a website that hosted infringing content.

Under current U.S. law, even when a court finds that a site has willfully engaged in infringement, stopping the infringement can be difficult, especially when the parties and their facilities are located outside the country. While Section 512 of the Digital Millennium Copyright Act does allow courts to issue injunctions, there is ambiguity as to whether it allows courts to issue injunctions that obligate online service providers (“OSPs”) not directly party to a case to remove infringing material.

Section 512(j), for instance, provides for issuing injunctions “against a service provider that is not subject to monetary remedies under this section.” The “not subject to monetary remedies under this section” language could be construed to mean that such injunctions may be obtained even against OSPs that have not been found at fault for the underlying infringement. But as Motion Picture Association President Stanford K. McCoy testified in 2020:

In more than twenty years … these provisions of the DMCA have never been deployed, presumably because of uncertainty about whether it is necessary to find fault against the service provider before an injunction could issue, unlike the clear no-fault injunctive remedies available in other countries.

But while no-fault injunctions for copyright infringement have not materialized in the United States, this remedy has been used widely around the world. In fact, more than 40 countries—including Denmark, Finland, France, India, England, and Wales—have enacted or are under some obligation to enact rules allowing for no-fault injunctions that direct ISPs to disable access to websites that predominantly promote copyright infringement. 

In short, precisely the approach to controlling piracy that SOPA/PIPA envisioned has been in force around the world over the last decade. This demonstrates that, if properly tailored, no-fault injunctions are an ideal tool for courts to use in the fight to combat piracy.

If anything, we should be using the anniversary of SOPA/PIPA as an opportunity to reflect on a missed opportunity. Congress should take this opportunity to amend Section 512 to grant U.S. courts authority to issue no-fault injunctions that require OSPs to block access to sites that willfully engage in mass infringement.

Policy discussions about the use of personal data often have “less is more” as a background assumption; that data is overconsumed relative to some hypothetical optimal baseline. This overriding skepticism has been the backdrop for sweeping new privacy regulations, such as the California Consumer Privacy Act (CCPA) and the EU’s General Data Protection Regulation (GDPR).

More recently, as part of the broad pushback against data collection by online firms, some have begun to call for creating property rights in consumers’ personal data or for data to be treated as labor. Prominent backers of the idea include New York City mayoral candidate Andrew Yang and computer scientist Jaron Lanier.

The discussion has escaped the halls of academia and made its way into popular media. During a recent discussion with Tesla founder Elon Musk, comedian and podcast host Joe Rogan argued that Facebook is “one gigantic information-gathering business that’s decided to take all of the data that people didn’t know was valuable and sell it and make f***ing billions of dollars.” Musk appeared to agree.

The animosity exhibited toward data collection might come as a surprise to anyone who has taken Econ 101. Goods ideally end up with those who value them most. A firm finding profitable ways to repurpose unwanted scraps is just the efficient reallocation of resources. This applies as much to personal data as to literal trash.

Unfortunately, in the policy sphere, few are willing to recognize the inherent trade-off between the value of privacy, on the one hand, and the value of various goods and services that rely on consumer data, on the other. Ideally, policymakers would look to markets to find the right balance, which they often can. When the transfer of data is hardwired into an underlying transaction, parties have ample room to bargain.

But this is not always possible. In some cases, transaction costs will prevent parties from bargaining over the use of data. The question is whether such situations are so widespread as to justify the creation of data property rights, with all of the allocative inefficiencies they entail. Critics wrongly assume the solution is both to create data property rights and to allocate them to consumers. But there is no evidence to suggest that, at the margin, heightened user privacy necessarily outweighs the social benefits that new data-reliant goods and services would generate. Recent experience in the worlds of personalized medicine and the fight against COVID-19 helps to illustrate this point.

Data Property Rights and Personalized Medicine

The world is on the cusp of a revolution in personalized medicine. Advances such as the improved identification of biomarkers, CRISPR genome editing, and machine learning could usher in a new wave of treatments to markedly improve health outcomes.

Personalized medicine uses information about a person’s own genes or proteins to prevent, diagnose, or treat disease. Genetic-testing companies like 23andMe or Family Tree DNA, with the large troves of genetic information they collect, could play a significant role in helping the scientific community to further medical progress in this area.

However, despite the obvious potential of personalized medicine, many of its real-world applications are still very much hypothetical. While governments could act in any number of ways to accelerate the movement’s progress, recent policy debates have instead focused more on whether to create a system of property rights covering personal genetic data.

Some raise concerns that it is pharmaceutical companies, not consumers, who will reap the monetary benefits of the personalized medicine revolution, and that advances are achieved at the expense of consumers’ and patients’ privacy. They contend that data property rights would ensure that patients earn their “fair” share of personalized medicine’s future profits.

But it’s worth examining the other side of the coin. There are few things people value more than their health. U.S. governmental agencies place the value of a single life at somewhere between $1 million and $10 million. The commonly used quality-adjusted life year metric offers valuations that range from $50,000 to upward of $300,000 per incremental year of life.

It therefore follows that the trivial sums users of genetic-testing kits might derive from a system of data property rights would likely be dwarfed by the value they would enjoy from improved medical treatments. A strong case can be made that policymakers should prioritize advancing the emergence of new treatments, rather than attempting to ensure that consumers share in the profits generated by those potential advances.

These debates drew increased attention last year, when 23andMe signed a strategic agreement with the pharmaceutical company Almirall to license the rights related to an antibody Almirall had developed. Critics pointed out that 23andMe’s customers, whose data had presumably been used to discover the potential treatment, received no monetary benefits from the deal. Journalist Laura Spinney wrote in The Guardian newspaper:

23andMe, for example, asks its customers to waive all claims to a share of the profits arising from such research. But given those profits could be substantial—as evidenced by the interest of big pharma—shouldn’t the company be paying us for our data, rather than charging us to be tested?

In the deal’s wake, some argued that personal health data should be covered by property rights. A cardiologist quoted in Fortune magazine opined: “I strongly believe that everyone should own their medical data—and they have a right to that.” But this strong belief, however widely shared, ignores important lessons that law and economics has to teach about property rights and the role of contractual freedom.

Why Do We Have Property Rights?

Among the many important features of property rights is that they create “excludability,” the ability of economic agents to prevent third parties from using a given item. In the words of law professor Richard Epstein:

[P]roperty is not an individual conception, but is at root a social conception. The social conception is fairly and accurately portrayed, not by what it is I can do with the thing in question, but by who it is that I am entitled to exclude by virtue of my right. Possession becomes exclusive possession against the rest of the world…

Excludability helps to facilitate the trade of goods, offers incentives to create those goods in the first place, and promotes specialization throughout the economy. In short, property rights create a system of exclusion that supports creating and maintaining valuable goods, services, and ideas.

But property rights are not without drawbacks. Physical or intellectual property can lead to a suboptimal allocation of resources, namely market power (though this effect is often outweighed by increased ex ante incentives to create and innovate). Similarly, property rights can give rise to thickets that significantly increase the cost of amassing complementary pieces of property. Often cited are the historic (but contested) examples of tolling on the Rhine River or the airplane patent thicket of the early 20th century. Finally, strong property rights might also lead to holdout behavior, which can be addressed through top-down tools, like eminent domain, or private mechanisms, like contingent contracts.

In short, though property rights—whether they cover physical or information goods—can offer vast benefits, there are cases where they might be counterproductive. This is probably why, throughout history, property laws have evolved to achieve a reasonable balance between incentives to create goods and to ensure their efficient allocation and use.

Personal Health Data: What Are We Trying to Incentivize?

There are at least three critical questions we should ask about proposals to create property rights over personal health data.

  1. What goods or behaviors would these rights incentivize or disincentivize that are currently over- or undersupplied by the market?
  2. Are goods over- or undersupplied because of insufficient excludability?
  3. Could these rights undermine the efficient use of personal health data?

Much of the current debate centers on data obtained from direct-to-consumer genetic-testing kits. In this context, almost by definition, firms only obtain consumers’ genetic data with their consent. In western democracies, the rights to bodily integrity and to privacy generally make it illegal to administer genetic tests against a consumer or patient’s will. This makes genetic information naturally excludable, so consumers already benefit from what is effectively a property right.

When consumers decide to use a genetic-testing kit, the terms set by the testing firm generally stipulate how their personal data will be used. 23andMe has a detailed policy to this effect, as does Family Tree DNA. In the case of 23andMe, consumers can decide whether their personal information can be used for the purpose of scientific research:

You have the choice to participate in 23andMe Research by providing your consent. … 23andMe Research may study a specific group or population, identify potential areas or targets for therapeutics development, conduct or support the development of drugs, diagnostics or devices to diagnose, predict or treat medical or other health conditions, work with public, private and/or nonprofit entities on genetic research initiatives, or otherwise create, commercialize, and apply this new knowledge to improve health care.

Because this transfer of personal information is hardwired into the provision of genetic-testing services, there is space for contractual bargaining over the allocation of this information. The right to use personal health data will go toward the party that values it most, especially if information asymmetries are weeded out by existing regulations or business practices.

Regardless of data property rights, consumers have a choice: they can purchase genetic-testing services and agree to the provider’s data policy, or they can forgo the services. The service provider cannot obtain the data without entering into an agreement with the consumer. While competition between providers will affect parties’ bargaining positions, and thus the price and terms on which these services are provided, data property rights likely will not.

So, why do consumers transfer control over their genetic data? The main reason is that genetic information is inaccessible and worthless without the addition of genetic-testing services. Consumers must pass through the bottleneck of genetic testing for their genetic data to be revealed and transformed into usable information. It therefore makes sense to transfer the information to the service provider, who is in a much stronger position to draw insights from it. From the consumer’s perspective, the data is not even truly “transferred,” as the consumer had no access to it before the genetic-testing service revealed it. The value of this genetic information is then netted out in the price consumers pay for testing kits.

If personal health data were undersupplied by consumers and patients, testing firms could sweeten the deal and offer them more in return for their data. U.S. copyright law covers original compilations of data, while EU law gives 15 years of exclusive protection to the creators of original databases. Legal protections for trade secrets could also play some role. Thus, firms have some incentives to amass valuable health datasets.

But some critics argue that health data is, in fact, oversupplied. Generally, such arguments assert that agents do not account for the negative privacy externalities suffered by third parties, such as adverse-selection problems in insurance markets. For example, Jay Pil Choi, Doh Shin Jeon, and Byung Cheol Kim argue:

Genetic tests are another example of privacy concerns due to informational externalities. Researchers have found that some subjects’ genetic information can be used to make predictions of others’ genetic disposition among the same racial or ethnic category. … Because of practical concerns about privacy and/or invidious discrimination based on genetic information, the U.S. federal government has prohibited insurance companies and employers from any misuse of information from genetic tests under the Genetic Information Nondiscrimination Act (GINA).

But if these externalities exist (most of the examples cited by scholars are hypothetical), they are likely dwarfed by the tremendous benefits that could flow from the use of personal health data. Put differently, the assertion that “excessive” data collection may create privacy harms should be weighed against the possibility that the same collection may also lead to socially valuable goods and services that produce positive externalities.

In any case, data property rights would do little to limit these potential negative externalities. Consumers and patients are already free to agree to terms that allow or prevent their data from being resold to insurers. It is not clear how data property rights would alter the picture.

Proponents of data property rights often claim they should be associated with some form of collective bargaining. The idea is that consumers might otherwise fail to receive their “fair share” of genetic-testing firms’ revenue. But what critics portray as asymmetric bargaining power might simply be the market signaling that genetic-testing services are in high demand, with room for competitors to enter the market. Shifting rents from genetic-testing services to consumers would undermine this valuable price signal and, ultimately, diminish the quality of the services.

Perhaps more importantly, to the extent that they limit the supply of genetic information—for example, because firms are forced to pay higher prices for data and thus acquire less of it—data property rights might hinder the emergence of new treatments. If genetic data is a key input to develop personalized medicines, adopting policies that, in effect, ration the supply of that data is likely misguided.

Even if policymakers do not directly put their thumb on the scale, data property rights could still harm pharmaceutical innovation. If existing privacy regulations are any guide—notably, the previously mentioned GDPR and CCPA, as well as the federal Health Insurance Portability and Accountability Act (HIPAA)—such rights might increase red tape for pharmaceutical innovators. Privacy regulations routinely limit firms’ ability to put collected data to new and previously unforeseen uses. They also limit parties’ contractual freedom when it comes to gathering consumers’ consent.

At the margin, data property rights would make it more costly for firms to amass socially valuable datasets. This would effectively move the personalized medicine space further away from a world of permissionless innovation, thus slowing down medical progress.

In short, there is little reason to believe health-care data is misallocated. Proposals to reallocate rights to such data based on idiosyncratic distributional preferences threaten to stifle innovation in the name of privacy harms that remain mostly hypothetical.

Data Property Rights and COVID-19

The trade-off between users’ privacy and the efficient use of data also has important implications for the fight against COVID-19. Since the beginning of the pandemic, several promising initiatives have been thwarted by privacy regulations and concerns about the use of personal data. This has potentially prevented policymakers, firms, and consumers from putting information to its optimal social use. High-profile issues have included:

Each of these cases may involve genuine privacy risks. But to the extent that they do, those risks must be balanced against the potential benefits to society. If privacy concerns prevent us from deploying contact tracing or green passes at scale, we should question whether the privacy benefits are worth the cost. The same is true for rules that prohibit amassing more data than is strictly necessary, as is required by data-minimization obligations included in regulations such as the GDPR.

If our initial question was instead whether the benefits of a given data-collection scheme outweighed its potential costs to privacy, incentives could be set such that competition between firms would reduce the amount of data collected—at least, where minimized data collection is, indeed, valuable to users. Yet these considerations are almost completely absent in the COVID-19-related privacy debates, as they are in the broader privacy debate. Against this backdrop, the case for personal data property rights is dubious.

Conclusion

The key question is whether policymakers should make it easier or harder for firms and public bodies to amass large sets of personal data. This requires asking whether personal data is currently under- or over-provided, and whether the additional excludability that would be created by data property rights would offset their detrimental effect on innovation.

Swaths of personal data currently lie untapped. With the proper incentive mechanisms in place, this idle data could be mobilized to develop personalized medicines and to fight the COVID-19 outbreak, among many other valuable uses. By making such data more onerous to acquire, property rights in personal data might stifle the assembly of novel datasets that could be used to build innovative products and services.

On the other hand, when dealing with diffuse and complementary data sources, transaction costs become a real issue and the initial allocation of rights can matter a great deal. In such cases, unlike the genetic-testing kits example, it is not certain that users will be able to bargain with firms, especially where their personal information is exchanged by third parties.

If optimal reallocation is unlikely, should property rights go to the person covered by the data or to the collectors (potentially subject to user opt-outs)? Proponents of data property rights assume the first option is superior. But if the goal is to produce groundbreaking new goods and services, granting rights to data collectors might be a superior solution. Ultimately, this is an empirical question.

As Richard Epstein puts it, the goal is to “minimize the sum of errors that arise from expropriation and undercompensation, where the two are inversely related.” Rather than approach the problem with the preconceived notion that initial rights should go to users, policymakers should ensure that data flows to those economic agents who can best extract information and knowledge from it.

As things stand, there is little to suggest that the trade-offs favor creating data property rights. This is not an argument for requisitioning personal information or preventing parties from transferring data as they see fit, but simply for letting markets function, unfettered by misguided public policies.

Earlier this week Senators Orrin Hatch and Ron Wyden and Representative Paul Ryan introduced bipartisan, bicameral legislation, the Bipartisan Congressional Trade Priorities and Accountability Act of 2015 (otherwise known as Trade Promotion Authority or “fast track” negotiating authority). The bill would enable the Administration to negotiate free trade agreements subject to appropriate Congressional review.

Nothing bridges partisan divides like free trade.

Top presidential economic advisors from both parties support TPA. And the legislation was greeted with enthusiastic support from the business community. Indeed, a letter supporting the bill was signed by 269 of the country’s largest and most significant companies, including Apple, General Electric, Intel, and Microsoft.

Among other things, the legislation includes language calling on trading partners to respect and protect intellectual property. That language in particular was (not surprisingly) widely cheered in a letter to Congress signed by a coalition of sixteen technology, content, manufacturing and pharmaceutical trade associations, representing industries accounting for (according to the letter) “approximately 35 percent of U.S. GDP, more than one quarter of U.S. jobs, and 60 percent of U.S. exports.”

Strong IP protections also enjoy bipartisan support in much of the broader policy community. Indeed, ICLE recently joined sixty-seven think tanks, scholars, advocacy groups and stakeholders on a letter to Congress expressing support for strong IP protections, including in free trade agreements.

Despite this overwhelming support for the bill, the Internet Association (a trade association representing 34 Internet companies including giants like Google and Amazon, but mostly smaller companies like Coinbase and OkCupid) expressed concern with the intellectual property language in TPA legislation, asserting that “[i]t fails to adopt a balanced approach, including the recognition that limitations and exceptions in copyright law are necessary to promote the success of Internet platforms both at home and abroad.”

But the proposed TPA bill does recognize “limitations and exceptions in copyright law,” as the Internet Association is presumably well aware. Among other things, the bill supports “ensuring accelerated and full implementation of the Agreement on Trade-Related Aspects of Intellectual Property Rights,” which specifically mentions exceptions and limitations on copyright, and it advocates “ensuring that the provisions of any trade agreement governing intellectual property rights that is entered into by the United States reflect a standard of protection similar to that found in United States law,” which also recognizes copyright exceptions and limitations.

What the bill doesn’t do — and wisely so — is advocate for the inclusion of mandatory fair use language in U.S. free trade agreements.

Fair use is an exception under U.S. copyright law to the normal rule that one must obtain permission from the copyright owner before exercising any of the exclusive rights in Section 106 of the Copyright Act.

Including such language in TPA would require U.S. negotiators to demand that trading partners enact U.S.-style fair use language. But as ICLE discussed in a recent White Paper, if broad, U.S.-style fair use exceptions are infused into trade agreements they could actually increase piracy and discourage artistic creation and innovation — particularly in nations without a strong legal tradition implementing such provisions.

All trade agreements entered into by the U.S. since 1994 include a mechanism for trading partners to enact copyright exceptions and limitations, including fair use, should they so choose. These copyright exceptions and limitations must conform to a global standard, the so-called “three-step test,” established under the auspices of the 1994 Trade-Related Aspects of Intellectual Property Rights (TRIPS) Agreement, with roots going back to the 1967 amendments to the 1886 Berne Convention.

According to that standard,

Members shall confine limitations or exceptions to exclusive rights to

  1. certain special cases, which
  2. do not conflict with a normal exploitation of the work and
  3. do not unreasonably prejudice the legitimate interests of the right holder.

This three-step test provides a workable standard for balancing copyright protections with other public interests. Most important, it sets flexible (but by no means unlimited) boundaries, so, rather than squeezing every jurisdiction into the same box, it accommodates a wide range of exceptions and limitations to copyright protection, ranging from the U.S.’ fair use approach to the fair dealing exception in other common law countries to the various statutory exceptions adopted in civil law jurisdictions.

Fair use is an inherently common law concept, developed by case-by-case analysis and a system of binding precedent. In the U.S. it has been codified by statute, but only after two centuries of common law development. Even as codified, fair use takes the form of guidance to judicial decision-makers assessing whether any particular use of a copyrighted work merits the exception; it is not a prescriptive statement, and judicial interpretation continues to define and evolve the doctrine.

Most countries in the world, on the other hand, have civil law systems that spell out specific exceptions to copyright protection, that don’t rely on judicial precedent, and that are thus incompatible with the common law, fair use approach. The importance of this legal flexibility can’t be overstated: Only four countries out of the 166 signatories to the Berne Convention have adopted fair use since 1967.

Additionally, from an economic perspective the rationale for fair use would seem to be receding, not expanding, further eroding the justification for its mandatory adoption via free trade agreements.

As digital distribution, the Internet and a host of other technological advances have reduced transaction costs, it’s easier and cheaper for users to license copyrighted content. As a result, the need to rely on fair use to facilitate some socially valuable uses of content that otherwise wouldn’t occur because of prohibitive costs of contracting is diminished. Indeed, it’s even possible that the existence of fair use exceptions may inhibit the development of these sorts of mechanisms for simple, low-cost agreements between owners and users of content – with consequences beyond the material that is subject to the exceptions. While, indeed, some socially valuable uses, like parody, may merit exceptions because of rights holders’ unwillingness, rather than inability, to license, U.S.-style fair use is in no way necessary to facilitate such exceptions. In short, the boundaries of copyright exceptions should be contracting, not expanding.

It’s also worth noting that simple marketplace observations seem to undermine assertions by Internet companies that they can’t thrive without fair use. Google Search, for example, has grown big enough to attract the (misguided) attention of EU antitrust regulators, despite no European country having enacted a U.S.-style fair use law. Indeed, European regulators claim that the company has a 90% share of the market, all without fair use.

Meanwhile, companies like Netflix contend that their ability to cache temporary copies of video content in order to improve streaming quality would be imperiled without fair use. But it’s hard to see how Netflix can negotiate extensive, complex contracts with copyright holders to actually show their content, yet is somehow unable to negotiate an additional clause or two in those contracts to ensure the quality of those performances without fair use.

Properly bounded exceptions and limitations are an important aspect of any copyright regime. But given the mix of legal regimes among current prospective trading partners, as well as other countries with whom the U.S. might at some stage develop new FTAs, it’s highly likely that the introduction of U.S.-style fair use rules would be misinterpreted and misapplied in certain jurisdictions and could result in excessively lax copyright protection, undermining incentives to create and innovate. Of course for the self-described consumer advocates pushing for fair use, this is surely the goal. Further, mandating the inclusion of fair use in trade agreements through TPA legislation would, in essence, force the U.S. to ignore the legal regimes of its trading partners and weaken the protection of copyright in trade agreements, again undermining the incentive to create and innovate.

There is no principled reason, in short, for TPA to mandate adoption of U.S.-style fair use in free trade agreements. Congress should pass TPA legislation as introduced, and resist any rent-seeking attempts to include fair use language.

Interested observers on all sides of the contentious debate over Aereo have focused a great deal on the implications for cloud computing if the Supreme Court rules against Aereo. The Court hears oral argument next week, and the cloud computing issue is sure to make an appearance.

Several parties that filed amicus briefs in the case weighed in on the issue. The Center for Democracy & Technology, for example, filed a brief arguing that a ruling against Aereo would hinder the development of cloud computing. Thirty-six Intellectual Property and Copyright Law Professors also filed a brief arguing this point. On the other hand, the United States, represented by the Solicitor General, devoted a section of its amicus brief in support of copyright owners’ argument that the Court could rule against Aereo without undermining cloud computing.

Our organizations, the International Center for Law and Economics and the Competitive Enterprise Institute, filed an amicus brief in the case in support of the Petitioners (as did many other policy groups, academics, and trade associations). In our brief we applied the consumer welfare framework to the question whether allowing Aereo’s business practice would increase the societal benefits that copyright law seeks to advance. We argued that holding Aereo liable for copyright infringement was well within the letter and spirit of the Copyright Act of 1976. In particular, we argued that Aereo’s model is less a disruptive innovation than a technical work-around taking advantage of the Second Circuit’s overbroad reading of the law in the Cablevision case.

Although our brief didn’t directly address cloud computing writ large, we did articulate a crucial distinction between Aereo and other cloud computing providers. Under our reasoning, the Court could rule against Aereo—as it should—without destroying cloud computing—as it should not.

Background

By way of background, at the center of the legal debate is what it means to “perform [a] copyrighted work publicly.” Aereo argues that because only one individual subscriber is “capable of receiving” each transmission its service delivers, its performances are private, not public. The Copyright Act gives copyright owners the exclusive right to publicly perform their works, but not the right to perform them privately. Therefore, Aereo contends, its service doesn’t infringe upon copyright owners’ exclusive rights.

We disagree. As our brief explains, Aereo’s argument ignores Congress’ decision in the Copyright Act of 1976 to expressly define the transmission of a television broadcast “by means of any device or process” to the public as a public performance, “whether the members of the public capable of receiving the performance … receive it in the same place or in separate places and at the same time or at different times.” Aereo has built an elaborate system for distributing live high-def broadcast television content to subscribers for a monthly fee—without obtaining permission from, or paying royalties to, the copyright owners in the audiovisual works aired by broadcasters.

Although the Copyright Act’s text is less than artful, Congress plainly wrote it so as to encompass businesses that sell consumers access to live television broadcasts, whether using traditional means—such as coaxial cable lines—or some high-tech system that lawmakers couldn’t foresee in 1976.

What does this case mean for cloud computing? To answer this question, it’s worth dividing the discussion into two parts: one addressing cloud providers that don’t sell their users licenses to copyrighted works, and the other addressing cloud providers that do. Dropbox and Mozy fit in the first category; Amazon and iTunes fit in the second.

A Ruling Against Aereo Won’t Destroy Cloud Computing Services like Dropbox

According to the 36 Intellectual Property and Copyright Law Professors, a loss for Aereo would be bad news for cloud storage providers such as Dropbox:

If any service making multiple transmissions of the same underlying copyrighted audiovisual work is publicly performing that work, then the distinction between video-on-demand services and online storage services would vanish, and all such services would henceforth face infringement liability. Thus, if two Dropbox users independently streamed “We, the Juries,” then under Petitioners’ theory, those two transmissions would be aggregated together, making them collectively “to the public.” Under Petitioners’ theory of this case—direct infringement by public performance—that would be game, set, and match against Dropbox.

This sounds like bad news for the cloud. Fortunately, however, Dropbox has little to fear from an Aereo defeat, even if the professors are right to worry about an overbroad public performance right (more on this below). The Digital Millennium Copyright Act (DMCA) grants online service providers, including cloud hosting services such as Dropbox, a safe harbor from copyright infringement liability for unwittingly storing infringing files uploaded by their users. In exchange for this immunity, service providers must comply with the DMCA’s notice and takedown system and adopt a policy to terminate repeat-infringing users, among other duties.

Although 17 U.S.C. § 512(c) refers only to infringement “by reason of … storage” directed by a user, courts have consistently interpreted this language to “encompass[] the access-facilitating processes that automatically occur when a user uploads” a file to a cloud hosting service. Whether YouTube streams an infringing video once or 1,000,000 times, therefore, it retains its DMCA immunity so long as it complies with the safe harbor’s requirements. So even if Aereo loses, and every Dropbox user who streams “We, the Juries” is receiving a public performance, Dropbox will still be safe from copyright infringement liability in the same way as YouTube, Vimeo, DailyMotion, and countless other services are safe today.

An Aereo Defeat Won’t Kill Cloud Computing Services like Amazon and Google

As for cloud computing providers that provide copyrighted content, the legal analysis is admittedly trickier. These providers, such as Google and Amazon, contract with copyright holders to sell their users licenses to copyrighted works. Some providers offer a subscription to streaming content, for which the provider has typically secured public performance licenses from the copyright owners. Cloud providers also sell digital copies of copyrighted works—that is, non-transferable lifetime licenses—for which the provider has generally obtained reproduction and distribution licenses, but not public performance rights.

But, as copyright law guru Devlin Hartline argues, determining if a performance is public or private turns on whether the cloud provider’s “volitional conduct [is] sufficient such that it directly causes the transmission.” When a user streams her own licensed content from a cloud service, it remains a private performance because the cloud service took no willful steps to facilitate the playback of copyrighted material. (The same is true for Dropbox-like services, as well.) Aereo, conversely, “crosse[s] the line from being a passive conduit to being an active participant because it supplies the very content that is available using its service.”

Neither Google’s nor Amazon’s business models much resemble Aereo’s, which entails transmitting content for which the company has secured no copyright licenses—either for itself or for its users. And to the extent that these services do supply the content being transmitted (as Spotify or Google Play All Access do, for example), they secure the appropriate public performance right to do so. Indeed, critics who have focused on cloud computing fail to appreciate how the Copyright Act distinguishes between infringing technologies such as Aereo and lawful uses of the cloud to store, share, and transmit copyrighted works.

For instance, as CDT notes:

[S]everal companies (including Google and Amazon) have launched personal music locker services, allowing individuals to upload their personal music collections “to the cloud” and enabling them to transmit that music back to their own computers, phones, and tablets when, where, and how they find most convenient.

And other critics of broadcasters’ legal position have made similar arguments, claiming that the Court cannot reach a holding that simultaneously bars Aereo while allowing cloud storage:

[I]f Aereo is publicly performing when you store a unique copy of the nightly news online and watch it later, then why aren’t cloud services publicly performing if they host your (lawful) unique mp3 of the latest hit single and stream it to you later?… The problem with this rationale is that it applies with equal force to cloud storage like Dropbox, SkyDrive, iCloud, and Google Drive. If multiple people store their own, unique, lawfully acquired copy of the latest hit single in the cloud, and then play it to themselves over the Internet, that too sounds like the broadcasters’ version of a public performance. The anti-Aereo rationale doesn’t distinguish between Aereo and the cloud.

The Ability to Contract is Key

These arguments miss the important concept of privity. A copyright holder who does not wish to license the exclusive rights in her content cannot be forced to do so (unless the content is subject to a compulsory license). If a copyright holder prefers its users not upload their licensed videos to the cloud and later stream them for personal use, the owner can include such a prohibition in its licenses. This may affect users’ willingness to pay for such encumbered content—but this is private ordering in action, with copyright holders and licensees bargaining over control over copyrighted works, a core purpose of the Copyright Act.

When a copyright holder wishes to license content to a cloud provider or user, the parties can bargain over whether users may stream their content from the cloud. These deals can evolve over time in response to new technology and changing consumer demand. This happens all the time, as in the recent deal between Dish and Disney over the Hopper DVR, wherein Dish agreed that Hopper would automatically excise the commercials accompanying ABC content only after three days elapse after each show airs.

But Aereo forecloses the possibility of such negotiation, making all over-the-air content available online to subscribers absent any agreement with the underlying copyright owners of such programs. Aereo is thus distinct from other cloud services that supply content to their users, as the latter have permission to license their content.

Of course, broadcasters make their programming freely available over the airwaves, without any express agreement with viewers. But this doesn’t mean broadcasters lose their legal right to restrict how third parties distribute and monetize their content. While consumers can record and watch such broadcasts at their leisure, they can’t record programs and then sell the rights to the content, for example, simply replicating the broadcast. The fact that copyright holders have entered into licenses to “cloud-ify” content with dedicated over-the-top apps and Hulu clearly suggests that the over-the-air “license” is limited. And because Aereo refuses to deal with the broadcasters, there’s no possibility of a negotiated agreement between Aereo and the content owners, either. The unique combination of broadcast content and an unlicensed distributor differentiates the situation in Aereo from typical cloud computing.

If broadcasters can’t rely on copyright law to protect them from companies like Aereo that simply repackage over-the-air content, they may well shift all of their content to cable subscriptions instead of giving a free option to consumers. That’s bad news for folks who access free television—regardless of the efficiency of traditional broadcasting, or lack thereof.

The Cablevision Decision Doesn’t Require a Holding for Aereo

Commentators argue that overruling the Second Circuit in Aereo necessarily entails overruling the Second Circuit’s Cablevision holding—and with it that ruling’s fair use protections for DVRs and other cloud computing functionality. We disagree, however. Rather, regardless of whether Cablevision was correctly decided, its application to Aereo is improper.

In Cablevision, the individual cable subscribers to whom Cablevision transmitted copies of plaintiff Cartoon Network’s television programming were already paying for lawful access to it. Cartoon Network voluntarily agreed to license its copyrighted works to Cablevision and, in turn, to each Cablevision subscriber whose cable package included the Cartoon Network channel.

The dispute in Cablevision thus involved a copyright holder and a licensee with a preexisting contractual relationship; the parties simply disagreed on the terms by which Cablevision was permitted to transmit Cartoon Network’s content. But even after the decision, Cartoon Network remained (and remains) free to terminate or renegotiate its licensing agreement with Cablevision.

Again, this dynamic of voluntary exchange mitigates Cablevision’s impact on the market for television programming, as copyright holders and cable companies settle on a new equilibrium. But unlike the cable company in Cablevision, Aereo has neither sought nor received permission from any holders of copyrights in broadcast television programming before retransmitting their works to paying subscribers.

Even if it is correct that Aereo itself isn’t engaging in public performance of copyrighted work, it remains the case that its subscribers haven’t obtained the right to use Aereo’s services, either. But one party or the other must obtain this right or else establish that it’s a fair use.

Fair Use Won’t Save Aereo

The only way legitimately to rule in Aereo’s favor would be to decide that Aereo’s retransmission of broadcast content is a fair use. But as Cablevision’s own amicus brief in Aereo (supporting Aereo) argues, fair use rights don’t cover Aereo’s non-transformative retransmission of broadcast content. Cloud computing providers, on the other hand, offer services that enable distinct functionality independent of the mere retransmission of copyrighted content:

Aereo is functionally identical to a cable system. It captures over-the-air broadcast signals and retransmits them for subscribers to watch. Aereo thus is not meaningfully different from services that have long been required to pay royalties. That fact sharply distinguishes Aereo from cloud technologies like remote-storage services and remote DVRs.

* * *

Aereo is not in the business of transmitting recorded content from individual hard-drive copies to subscribers. Rather, it is in the business of retransmitting broadcast television to subscribers.

* * *

Aereo…is not relying on its separate hard-drive copies merely to justify the lawfulness of its pause, rewind, and record functions. It is relying on those copies to justify the entire television retransmission service. It is doing so even in the many cases where subscribers are not even using the pause, rewind, or record functions but are merely watching television live.

It may be that the DVR-like functions that Aereo provides are protected, but that doesn’t mean that it can retransmit copyrighted content without a license. If, like cable companies, it obtained such a license, it might be able to justify its other functionality (and negotiate license terms with broadcasters that reflect the value of such functionality to each party). But that is a fundamentally different case. Similarly, if users were able to purchase licenses to broadcast content, Aereo’s additional functionality might also be protected (with the license terms between users and broadcasters reflecting its value to each). But, again, that is a fundamentally different case. Cloud computing services don’t create these problems, and thus need not be implicated by a proper reading of the Copyright Act and a ruling against Aereo.

Conclusion

One of the main purposes of copyright law is to secure for content creators the right to market their work. Allowing services like Aereo to operate undermines that ability and, with it, the incentive to create content in the first place. But, as we have shown, there is no reason to think a ruling against Aereo will destroy cloud computing.