Statistics: Don’t Believe Everything You Read

Fake news is a problem, and the Pew Research Center reports that 64% of Americans think it is creating significant confusion about current events and issues. And it’s not just the politically driven websites that are making things up: In April, the Securities and Exchange Commission filed fraud charges against three public companies, seven stock promotion firms and 27 individuals (including two CEOs) for “alleged stock promotion schemes” in which supposedly independent writers published positive analyses while “being secretly compensated for touting company stocks.”

Statistics are often at the heart of fake news and other misleading reports. Sometimes the numbers are intentionally twisted or fabricated to make a point. Other times they have simply been misinterpreted. Either way, as executives look to research to help guide decisions, they should keep an eye out for data-driven distortions of reality.

In judging statistics, executives can bring two powerful weapons to bear: common sense and experience. Do the numbers in a study seem unrealistically high—for example, “There are 1 million taxis in New York City”? If the data doesn’t line up with what you know about the world, be skeptical. “If you had no idea things were that bad, they probably aren’t,” writes Joel Best, author of Stat Spotting: A Field Guide to Identifying Dubious Data.

At the same time, make sure your own biases aren’t getting in the way of that skepticism. As a National Geographic visual-data specialist recently said, “Let’s be honest: Not everybody is willing to look further into a chart if the result confirms what they want to believe.”

Who did the research and the reporting? Who funded it? Who stands to gain from the claims being made by a study? Is it an organization with a financial or political agenda? Is a publication cherry-picking findings to make a study more newsworthy?

Watch for conflicts of interest behind the research, sensational headlines and claims that don’t seem to be backed up by the data. Fortunately, while the Internet makes it easy to spread false information, it also makes it possible to quickly look into the sources behind reports.

As Best has noted, “Every statistic is the product of a series of choices made by the people who produce, process and report the data.” Consider that chain as you consume the numbers.

Studies will vary in rigor: A quick online survey will be less structured than a long-term academic study employing control groups. Here, key questions revolve around samples: How large was the sample? Was this an opt-in study where anyone could participate—which tends to attract those with strong negative or positive views—or was it one that used a scientifically determined sample population? It can also be useful to look at what the study asked.
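To see why sample size matters, here is a minimal sketch of the standard 95% margin-of-error approximation for a survey percentage. It assumes a simple random sample, so it says nothing about opt-in surveys, where the mix of respondents is the bigger problem. The function name and the sample sizes are illustrative choices, not taken from any study mentioned here.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a simple
    random sample of size n. z=1.96 corresponds to 95% confidence."""
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative sample sizes only; p=0.5 is the worst case (widest margin).
for n in (100, 1_000, 10_000):
    moe = margin_of_error(0.5, n)
    print(f"n={n:>6}: plus or minus {moe * 100:.1f} percentage points")
```

The margin shrinks with the square root of the sample size, so cutting it in half takes roughly four times as many respondents.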

In a recent TED Talk, data journalist Mona Chalabi cited a widely reported study in which 41% of U.S. Muslims said they support jihad. But another question in the study found that the vast majority defined jihad as a personal, peaceful religious struggle rather than violent holy war, a data point that was largely ignored in press reports.

Putting two sets of statistics side by side can imply comparisons that are simply not accurate. Comparing the sheer number of murders in, say, New York vs. Albany, rather than the murder rate per capita, would not really be useful, because the two cities differ so much in population. It’s also important to remember that correlation does not mean causation: Two sets of statistics may have similar trend lines but no meaningful relationship.
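The difference between raw counts and rates can be sketched in a few lines. The two cities and all of their figures below are made up for illustration; they are not real crime statistics for any place.

```python
# Hypothetical figures for illustration only; not real crime data.
cities = {
    "City A": {"population": 8_000_000, "murders": 400},
    "City B": {"population": 100_000, "murders": 10},
}

def rate_per_100k(count, population):
    """Return the number of incidents per 100,000 residents."""
    return count * 100_000 / population

for name, data in cities.items():
    rate = rate_per_100k(data["murders"], data["population"])
    print(f"{name}: {data['murders']} murders, {rate:.1f} per 100,000 residents")
```

In this invented example, City A has 40 times as many murders as City B, yet its rate (5.0 per 100,000) is half of City B's (10.0 per 100,000). The raw count and the rate point in opposite directions.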

To illustrate, a Spurious Correlations website has calculated close correlations between thousands of disparate data sets, including the divorce rate in Maine and U.S. margarine consumption (99.26% correlation); the number of lawyers in Puerto Rico and the number of people who die from falling out of bed (95.70%); and online Black Friday revenues and the number of people killed by dogs (99.56%).
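A rough sketch shows how easily such numbers arise. The two series below are invented and have nothing to do with each other except that both happen to rise over the same eight years. The Pearson correlation coefficient, computed here by hand, still comes out close to 1.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up yearly values for two unrelated quantities that both trend upward.
series_a = [10, 12, 13, 15, 18, 20, 21, 24]
series_b = [200, 205, 230, 240, 270, 290, 300, 330]

r = pearson_r(series_a, series_b)
print(f"Correlation: {r:.4f}")
```

A high coefficient here reflects only the shared upward trend. It is not evidence that one series drives the other.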

Even if the statistics are correct, the way they are presented in charts can be misleading. Some common mistakes (or techniques, if one is trying to mislead) are shown above.

A. Data points are omitted and the time scale is uneven, so the results look like a constant increase.
B. Two sets of unrelated data on one chart create the impression they are linked.
C. 3-D rendering makes identical amounts seem different.
D. Cumulative amounts rather than annual amounts create a false impression of growth.
E. A truncated Y axis starting above zero makes small fluctuations look dramatic.

Today’s tools make it easy for virtually anyone to churn out charts that lead to erroneous conclusions, whether intentionally or inadvertently.
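Two of these effects, cumulative totals and a truncated Y axis, can be checked with simple arithmetic. The sketch below uses invented numbers, and the helper `bar_height_ratio` is a hypothetical name introduced only for this illustration.

```python
from itertools import accumulate

# Invented annual figures that fall every year.
annual = [100, 90, 80, 70, 60]

# The running total still rises every year, so a cumulative chart
# of the same data looks like steady growth.
cumulative = list(accumulate(annual))
print("Annual:    ", annual)
print("Cumulative:", cumulative)

def bar_height_ratio(a, b, axis_start=0):
    """How many times taller bar b is drawn than bar a
    when the Y axis begins at axis_start."""
    return (b - axis_start) / (a - axis_start)

# Two nearly equal values, 50 and 52 (a 4% difference).
print("Axis from 0: ", bar_height_ratio(50, 52))
print("Axis from 49:", bar_height_ratio(50, 52, axis_start=49))
```

With the axis starting at zero, the second bar is drawn only slightly taller than the first. With the axis starting at 49, the same 4% difference is drawn as a bar three times as tall.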