Archives For statistics

If you do research involving statistical analysis, you’ve heard of John Ioannidis. If you haven’t heard of him, you will. He’s gone after the fields of medicine, psychology, and economics. He may be coming for your field next.

Ioannidis is after bias in research. He is perhaps best known for a 2005 paper “Why Most Published Research Findings Are False.” A professor at Stanford, he has built a career in the field of meta-research and may be one of the most highly cited researchers alive.

In 2017, he published “The Power of Bias in Economics Research.” He recently talked to Russ Roberts on the EconTalk podcast about his research and what it means for economics.

He focuses on two factors that contribute to bias in economics research: publication bias and low statistical power. These are complicated topics. This post aims to provide a simplified explanation of these issues and why bias and power matter.

What is bias?

We frequently hear the word bias. “Fake news” is biased news. For dinner, I am biased toward steak over chicken. That’s different from statistical bias.

In statistics, bias means that a researcher’s estimate of a variable or effect is different from the “true” value or effect. The “true” probability of getting heads from tossing a fair coin is 50 percent. Let’s say that no matter how many times I toss a particular coin, I find that I’m getting heads about 75 percent of the time. My instrument, the coin, may be biased. I may be the most honest coin flipper, but my experiment has biased results. In other words, biased results do not imply biased research or biased researchers.
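
The coin example can be sketched in a few lines of Python. The function name, seed, and sample sizes below are illustrative, not from the original post; the point is that an honest estimation procedure applied to a biased instrument converges on the wrong value.

```python
import random

def estimate_heads_probability(p_heads, n_tosses, seed=42):
    """Estimate P(heads) by tossing a coin whose true P(heads) is p_heads."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
    return heads / n_tosses

# Fair coin: the estimate converges on the true value, 0.5.
print(estimate_heads_probability(0.5, 100_000))

# Biased coin: the estimation procedure is identical and perfectly honest,
# but the instrument pushes the estimate toward 0.75 -- a biased result
# without a biased researcher.
print(estimate_heads_probability(0.75, 100_000))
```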

Publication bias

Publication bias occurs because peer-reviewed publications tend to favor publishing positive, statistically significant results and to reject insignificant results. Informally, this is known as the “file drawer” problem. Nonsignificant results remain unsubmitted in the researcher’s file drawer or, if submitted, remain in limbo in an editor’s file drawer.

Studies are more likely to be published in peer-reviewed publications if they have statistically significant findings, build on previous published research, and can potentially garner citations for the journal with sensational findings. Studies that don’t have statistically significant findings or don’t build on previous research are less likely to be published.

The importance of “sensational” findings means that ho-hum findings—even if statistically significant—are less likely to be published. For example, research finding that a 10 percent increase in the minimum wage is associated with a one-tenth of 1 percent reduction in employment (i.e., an elasticity of –0.01) would be less likely to be published than a study finding a 3 percent reduction in employment (i.e., an elasticity of –0.3).

“Man bites dog” findings—those that are counterintuitive or contradict previously published research—may be less likely to be published. A study finding an upward-sloping demand curve is likely to be rejected because economists “know” demand curves slope downward.

On the other hand, “man bites dog” findings may also be more likely to be published. Card and Krueger’s 1994 study finding that a minimum wage hike was associated with an increase in employment among low-wage workers was published in the top-tier American Economic Review. Had the study been conducted by lesser-known economists, it’s much less likely it would have been accepted for publication. The results were sensational, judging from the attention the article got from the New York Times, the Wall Street Journal, and even the Clinton administration. Sometimes a man does bite a dog.

Low power

A study with low statistical power has a reduced chance of detecting a true effect.

Consider our criminal legal system. We seek to find criminals guilty, while ensuring the innocent go free. Using the language of statistical testing, the presumption of innocence is our null hypothesis. We set a high threshold for our test: Innocent until proven guilty, beyond a reasonable doubt. We hypothesize innocence and only after overcoming our reasonable doubt do we reject that hypothesis.


An innocent person found guilty is considered a serious error—a “miscarriage of justice.” The presumption of innocence (the null hypothesis) combined with a high burden of proof (beyond a reasonable doubt) is designed to reduce these errors. In statistics, this is known as a “Type I” error, or “false positive.” The probability of a Type I error is called alpha, which is set to some arbitrarily low number, such as 10 percent, 5 percent, or 1 percent.

Failing to convict a truly guilty person is also a serious error, but it is generally agreed to be less serious than a wrongful conviction. Statistically speaking, this is a “Type II” error, or “false negative,” and the probability of making a Type II error is called beta.

By now, it should be clear that there’s a relationship between Type I and Type II errors. If we reduce the chance of a wrongful conviction, we increase the chance of letting some criminals go free. It can be shown mathematically (though not here) that, holding other factors constant, a reduction in the probability of Type I error is associated with an increase in the probability of Type II error.

Consider O.J. Simpson. Simpson was not found guilty in his criminal trial for murder, but was found liable for the deaths of Nicole Simpson and Ron Goldman in a civil trial. One reason for these different outcomes is the higher burden of proof for a criminal conviction (“beyond a reasonable doubt,” alpha = 1 percent) than for a finding of civil liability (“preponderance of evidence,” alpha = 50 percent). If O.J. truly is guilty of the murders, the criminal trial would have been less likely to find guilt than the civil trial would.

In econometrics, we construct the null hypothesis to be the opposite of the relationship we hypothesize. For example, if we hypothesize that an increase in the minimum wage decreases employment, the null hypothesis would be: “A change in the minimum wage has no impact on employment.” If the research involves regression analysis, the null hypothesis would be: “The estimated elasticity of employment with respect to the minimum wage is zero.” If we set the probability of Type I error to 5 percent, then regression results with a p-value of less than 0.05 would be sufficient to reject the null hypothesis of no relationship. If we increase the allowed probability of Type I error, we increase the likelihood of finding a relationship, but we also increase the chance of finding a relationship where none exists.
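
The mechanics can be made concrete with a small simulation (the test, sample size, and seed below are all illustrative, not from the post). It generates many data sets in which the null hypothesis is true by construction and shows that a test run at alpha = 0.05 rejects the null in roughly 5 percent of them: exactly the Type I error rate we chose.

```python
import math
import random

def two_sided_p_value(sample):
    """Two-sided z-test of the null hypothesis that the mean is zero."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    z = mean / math.sqrt(var / n)
    # Normal-approximation p-value: P(|Z| >= |z|) under the null.
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(0)
alpha = 0.05
trials = 2000
false_positives = 0
for _ in range(trials):
    # The null is true by construction: pure noise, no real effect.
    sample = [rng.gauss(0, 1) for _ in range(50)]
    if two_sided_p_value(sample) < alpha:
        false_positives += 1

# The rejection rate lands near alpha = 0.05.
print(false_positives / trials)
```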

Now, we’re getting to power.

Power (equal to 1 – beta) is the probability of detecting an effect that truly exists. In the legal system, it would be the probability that a truly guilty person is found guilty.

By definition, a low power study has a small chance of discovering a relationship that truly exists. Low power studies produce more false negatives than high power studies. If a set of studies has a power of 20 percent and we know there are 100 actual effects, the studies will detect only about 20 of them. In other words, out of 100 truly guilty suspects, a legal system with a power of 20 percent will find only about 20 of them guilty.

Suppose we expect that 25 percent of those accused of a crime are truly guilty. Thus the pre-test odds of guilt are R = 0.25 / 0.75 ≈ 0.33. Assume we set alpha to 0.05 and conclude the accused is guilty if our test statistic yields p < 0.05. Using Ioannidis’ formula for positive predictive value, we find:

  • If the power of the test is 20 percent, the probability that a “guilty” verdict reflects true guilt is 57 percent.
  • If the power of the test is 80 percent, the probability that a “guilty” verdict reflects true guilt is 84 percent.
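
The two bullet figures follow from Ioannidis’ formula for the positive predictive value, PPV = (power × R) / (power × R + alpha), where R is the pre-study odds that the tested effect is real. A minimal Python sketch (the function name is mine):

```python
def positive_predictive_value(power, alpha, R):
    """Ioannidis' PPV: the probability that a 'significant' finding is true,
    given pre-study odds R, significance level alpha, and test power."""
    return (power * R) / (power * R + alpha)

R = 0.25 / 0.75  # odds that the accused is truly guilty

# Low power (20 percent): a guilty verdict reflects true guilt ~57% of the time.
print(round(100 * positive_predictive_value(0.20, 0.05, R)))  # 57

# High power (80 percent): a guilty verdict reflects true guilt ~84% of the time.
print(round(100 * positive_predictive_value(0.80, 0.05, R)))  # 84
```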

In other words, a guilty verdict from a low power test is more likely to be a wrongful conviction than a guilty verdict from a high power test.

In our minimum wage example, a low power study is more likely to find a relationship between a change in the minimum wage and employment when no relationship truly exists. By extension, even when a relationship truly exists, a low power study will tend to find a bigger impact than a high power study. The figure below demonstrates this phenomenon.


Across the 1,424 studies surveyed, the average elasticity with respect to the minimum wage is –0.190 (i.e., a 10 percent increase in the minimum wage would be associated with a 1.9 percent decrease in employment). When adjusted for the studies’ precision, the weighted average elasticity is –0.054. By this simple analysis, the unadjusted average is 3.5 times bigger than the adjusted average. Ioannidis and his coauthors estimate that among the 60 studies with “adequate” power, the weighted average elasticity is –0.011.
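
Precision weighting of this kind is straightforward to compute. The sketch below uses made-up numbers, not the survey data: each study’s elasticity estimate is weighted by its precision, so one large, precise study near zero pulls the average far below the naive mean.

```python
def precision_weighted_mean(estimates, precisions):
    """Weight each study's estimate by its precision (e.g., the inverse of
    its standard error), so noisier studies count for less."""
    return sum(e * p for e, p in zip(estimates, precisions)) / sum(precisions)

# Hypothetical studies: two imprecise ones with large negative elasticities,
# one precise one near zero.
estimates = [-0.40, -0.30, -0.02]
precisions = [5, 10, 200]

naive_mean = sum(estimates) / len(estimates)        # -0.24
weighted = precision_weighted_mean(estimates, precisions)  # ~ -0.042
print(naive_mean, weighted)
```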

(By the way, my own unpublished studies of minimum wage impacts at the state level had an estimated short-run elasticity of –0.03 and “precision” of 122 for Oregon and short-run elasticity of –0.048 and “precision” of 259 for Colorado. These results are in line with the more precise studies in the figure above.)

Is economics bogus?

It’s tempting to walk away from this discussion thinking all of econometrics is bogus. Ioannidis himself responds to this temptation:

Although the discipline has gotten a bad rap, economics can be quite reliable and trustworthy. Where evidence is deemed unreliable, we need more investment in the science of economics, not less.

For policymakers, the reliance on economic evidence is even more important, according to Ioannidis:

[P]oliticians rarely use economic science to make decisions and set new laws. Indeed, it is scary how little science informs political choices on a global scale. Those who decide the world’s economic fate typically have a weak scientific background or none at all.

Ioannidis and his colleagues identify several ways to address the reliability problems in economics and other fields—social psychology is one of the worst. However, these are longer-term solutions.

In the short term, researchers and policymakers should view sensational findings with skepticism, especially if those findings support their own biases. That skepticism should begin with one simple question: “What’s the confidence interval?”


This was previously posted to the Center for the Protection of Intellectual Property Blog on October 4. Given that Congress is rushing headlong into enacting legislation to respond to an alleged crisis over “patent trolls,” it bears reposting, if only to show that Congress is ignoring its own experts in the Government Accountability Office, who officially reported this past August that there’s no basis for this legislative stampede.

As previously reported, there are serious concerns with the studies asserting that a “patent litigation explosion” has been caused by patent licensing companies (so-called non-practicing entities (“NPEs”) or “patent trolls”). These seemingly alarming studies (see here and here) have drawn scholarly criticism for their use of proprietary, secret data collected from companies like RPX and Patent Freedom – companies whose business models are predicated on defending against patent licensing companies. In addition to raising serious questions about self-selection and other biases, the RPX and Patent Freedom data sets remain secret to this day and cannot be examined or verified.  Thus, it is impossible to apply the standard scientific and academic norm that studies make their data available so that results can be confirmed by independent replication.  We have long suggested that it was time to step back from such self-selecting “statistics” based on secret data and from nonobjective rhetoric in the patent policy debates.

At long last, an important and positive step has been taken in this regard. The Government Accountability Office (GAO) has issued a report on patent litigation, entitled “Intellectual Property: Assessing Factors that Affect Patent Infringement Litigation Could Help Improve Patent Quality,” (“the GAO Report”), which was mandated by § 34 of the America Invents Act (AIA). The GAO Report offers an important step in the right direction in beginning a more constructive, fact-based discussion about litigation over patented innovation.

The GAO is an independent, non-partisan agency under Congress.  As stated in its report, it was tasked by the AIA to undertake this study in response to “concerns that patent infringement litigation by NPEs is increasing and that this litigation, in some cases, has imposed high costs on firms that are actually developing and manufacturing products, especially in the software and technology sectors.”  Far from affirming such concerns, the GAO Report concludes that no such NPE litigation problem exists.

In its study of patent litigation in the United States, the GAO primarily utilized data obtained from Lex Machina, a firm that specializes in collecting and analyzing IP litigation data.  To describe what is known about the volume and characteristics of recent patent litigation activity, the GAO utilized data provided by Lex Machina for all patent infringement lawsuits filed between 2000 and 2011.  Lex Machina also selected a sample of 500 lawsuits – 100 per year from 2007 to 2011 – to allow estimated percentages with a margin of error of no more than plus or minus 5 percentage points over all these years and no more than plus or minus 10 percentage points for any particular year.  From this data set, the GAO extrapolated its conclusion that current concerns expressed about patent licensing companies were misplaced.

Interestingly, the methodology employed by the GAO stands in stark contrast to the prior studies based on secret, proprietary data from RPX and Patent Freedom. The GAO Report explicitly recognized that these prior studies were fundamentally flawed given that they relied on “nonrandom, nongeneralizable” data sets from private companies (GAO Report, p. 26).  In other words, even setting aside the previously reported concerns of self-selection bias and nonobjective rhetoric, it is inappropriate to draw statistical inferences from such sample data sets to the state of patent litigation in the United States as a whole.  In contrast, the sample of 500 lawsuits selected by Lex Machina for the GAO study is truly random and generalizable (and its data is publicly available and testable by independent scholars).

Indeed, the most interesting results in the GAO Report concern its conclusions from the publicly accessible Lex Machina data about the volume and characteristics of patent litigation today.  The GAO Report finds that between 1991 and 2011, applications for all types of patents increased, with the total number of applications doubling across the same period (GAO Report, p.12, Fig. 1).  Yet, the GAO Report finds that over the same period of time, the rate of patent infringement lawsuits did not similarly increase.  Instead, the GAO reports that “[f]rom 2000 to 2011, about 29,000 patent infringement lawsuits were filed in the U.S. district courts” and that the number of these lawsuits filed per year fluctuated only slightly until 2011 (GAO Report, p. 14).  The GAO Report also finds that in 2011 about 900 more lawsuits were filed than the average number of lawsuits in each of the four previous years, which is an increase of about 31%, but it attributes this to the AIA’s prohibition on joinder of multiple defendants in a single patent infringement lawsuit that went into effect in 2011 (GAO Report, p. 14).  We also discussed the causal effect of the AIA joinder rules on the recent increase in patent litigation here and here.

The GAO Report next explores the correlation between the volume of patent infringement lawsuits filed and the litigants who brought those suits.  Utilizing the data obtained from Lex Machina, the GAO observed that from 2007 to 2011 manufacturing companies and related entities brought approximately 68% of all patent infringement lawsuits, while patent aggregating and licensing companies brought only 19% of such lawsuits. (The remaining 13% of lawsuits were brought by individual inventors, universities, and a number of entities the GAO was unable to verify.) The GAO Report acknowledged that lawsuits brought by patent licensing companies increased in 2011 (24%), but it found that this increase is not statistically significant. (GAO Report, pp. 17-18)

The GAO also found that the lawsuits filed by manufacturers and patent licensing companies settled or likely settled at similar rates (GAO Report, p. 25).  Again, this contradicts widely asserted claims that patent licensing companies bring patent infringement lawsuits solely to extract nuisance settlements (implying that manufacturers litigate patents to trial at a higher rate than patent licensing companies).

In sum, the GAO Report reveals that the conventional wisdom today about a so-called “patent troll litigation explosion” is unsupported by the facts (see also here and here).  Manufacturers – i.e., producers of products based upon patented innovation – bring the vast majority of patent infringement lawsuits, and these lawsuits have characteristics similar to those brought by patent licensing companies.

The GAO Report shines an important spotlight on a fundamental flaw in the current policy debates about patent licensing companies (the so-called “NPEs” or “patent trolls”).  Commentators, scholars and congresspersons pushing for legislative revisions to patent litigation to address a so-called “patent troll problem” have relied on overheated rhetoric and purported “studies” that simply do not hold up to empirical scrutiny.  While mere repetition of unsupported and untenable claims makes such claims conventional wisdom (and thus “truth” in the minds of policymakers and the public), it is still no substitute for a sensible policy discussion based on empirically sound data. 

This is particularly important given that the outcry against patent licensing companies continues to sweep the popular media and is spurring Congress and the President to propose substantial legislative and regulatory revisions to the patent system.  With the future of innovation at stake, it is not crazy to ask that, before we make radical, systemic changes to the patent system, we have validly established empirical evidence that such revisions are in fact necessary or at least would do more good than harm.  The GAO Report reminds us all that we have not yet reached this minimum requirement for sound, sensible policymaking.