How science is learning to admit mistakes

Why should we trust science? If you ask scientists, it’s because science is based on a careful study of the outside world—not guesses, hunches, philosophizing, or rumors—and because science is a self-correcting system, continually revising theories and updating facts to reflect new evidence.

When the British Royal Society was founded in 1660, its mission was to “improve natural knowledge” and its motto was nullius in verba — Latin for “take nobody’s word for it.” In the Society’s view, it was wrong to accept any claim based simply on the authority of who said it—the church, the king, the ancient philosophers. Their principle was “to withstand the domination of authority and to verify all statements by an appeal to facts determined by experiment.”

This idealized vision of science has been with us ever since: claims are true or false because of what experiments say, not what a particular person says.

Reality, however, tells a different story. Experiments are conducted by human beings who then tell everyone else about what they found. Scientific knowledge is the accumulation of people’s claims about the results of their experiments.

More people claiming to find the same thing can increase confidence in the conclusion, but some amount of trust is ineradicable in science, because nobody has the time to personally check every fact they might see or rely on. If scientists have no reason to doubt a study on its face—it comes from a credible author, it’s published in a peer-reviewed journal, and it’s not obviously crazy—they may simply accept it and build on its conclusions.

Scientific careers are built by finding new discoveries to publish, not by redoing other people’s experiments. Meanwhile, well-documented publication bias means that journals are more likely to publish studies that report positive results than boring “null” results.

This means that, even to the extent that scientists are testing each other’s work, they are more likely to hear about experiments that confirmed a result than about experiments that failed to do so.

Even more troubling, the pressure to find new discoveries seems to increase the bias toward finding positive results. Researchers know that journals want positive results, so they search their data to find something “statistically significant” to publish about.

But statistical significance (roughly, how unlikely a result would be to occur by pure chance) depends on researchers not gaming the system. If a relationship has only a 5% (or 1 in 20) probability of appearing by chance, but researchers test 20 or 200 possible connections, the odds that they will find something go way up. With modern computer programs, researchers can easily test connections between thousands of variables.

This is called “p-hacking” or “data fishing.” It is officially frowned upon, but scientists often do it without intending to mislead anyone, simply because they are constantly making choices about where to look for data and how to analyze it, choices guided by unconscious biases about what “works” to find results.
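To make that arithmetic concrete, here is a minimal sketch in Python (assuming NumPy and SciPy are available; the data is simulated and the study size and variable count are arbitrary, not drawn from any real study) of how testing many unrelated variables against pure noise reliably turns up something that looks significant:

```python
# Minimal sketch (simulated data only) of the multiple-comparisons problem
# behind p-hacking: test enough unrelated variables against pure noise and
# something will look "significant."
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_variables = 100, 20   # hypothetical study with 20 candidate predictors

outcome = rng.normal(size=n_subjects)                      # random outcome, no real effect
predictors = rng.normal(size=(n_subjects, n_variables))    # random predictors

p_values = []
for i in range(n_variables):
    r, p = stats.pearsonr(predictors[:, i], outcome)       # one correlation test per variable
    p_values.append(p)

print("smallest p-value:", round(min(p_values), 3))
print("tests with p < 0.05:", sum(p < 0.05 for p in p_values))

# With 20 independent tests at the 5% threshold, the chance of at least one
# false positive is about 1 - 0.95**20, or roughly 64%, even though nothing
# real is being measured.
```

Nothing in this toy example involves fabrication: the “finding” emerges simply from looking in enough places and reporting only the test that worked.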

The consequences of biases and practices like these were not well appreciated for most of the 20th century, even as the number of PhDs, scientific journals, and research papers skyrocketed. Unfortunately, this rapidly accumulating body of knowledge was about to be cast into doubt once again.

The Reproducibility Crisis

In 2005, John P.A. Ioannidis published a shocking paper in the journal PLOS Medicine, titled “Why Most Published Research Findings Are False.” Ioannidis argued that biases from scientists and journals in publication, research methods, and study design strongly implied that the majority of published biomedical research studies are false—and other, less rigorous fields were likely to be worse.

This bombshell accusation spurred many scientists to try to replicate important studies in their field. Over half of published psychology studies in top journals couldn’t be replicated in one 2015 re-testing effort. In another, only 6 out of 53 “landmark” cancer studies were found to be replicable. In another social science replication study, even when experiments confirmed the original finding, the effect sizes were about half as large as those reported in the original paper.

This “reproducibility crisis” has sprawled out over dozens of academic disciplines, and now fills a large and growing entry on Wikipedia.

Recently, image-analysis programs scanning top scientific journals have found that a significant share (over 6%) of biomedical studies contain manipulated, duplicated, or mislabeled images. According to one study, as many as 35,000 medical papers may need corrections or retractions over “inappropriately duplicated images.”

And images are just one easily detectable type of problem. Coding mistakes (misplacing a variable in a spreadsheet or making a typo in a formula) are easy to make and hard to detect without the original data. They have undermined landmark research by top scholars on hot-button topics such as the effects of discrimination on health, abortion on crime rates, and government debt on the economy.
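To see how easily such an error slips through, consider a hypothetical Python sketch (the numbers are invented and the scenario is not drawn from any of the studies mentioned) in which a one-character slicing typo silently drops an observation:

```python
# Hypothetical illustration (invented numbers, not from any study named above)
# of how a one-character coding mistake can silently change a result.
quarterly_growth = [2.1, 1.8, -0.4, 3.2, 2.7, -5.9]   # made-up data

correct_mean = sum(quarterly_growth) / len(quarterly_growth)

# Typo: the slice [:-1] quietly drops the final observation, but the code
# still runs and produces a plausible-looking number.
buggy_mean = sum(quarterly_growth[:-1]) / len(quarterly_growth[:-1])

print(f"correct mean: {correct_mean:.2f}")   # 0.58
print(f"buggy mean:   {buggy_mean:.2f}")     # 1.88
```

Without access to the original data and code, a reader of the published result has no way to tell the two numbers apart.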

In retrospect, this shouldn’t have been a surprise to anyone who had bothered to ask scientists themselves. In a survey of 1,500 scientists conducted by the journal Nature, over 50% admitted that they had failed to replicate some of their own work, and 70% reported failing to replicate a colleague’s experiment.

And the reasons why aren’t hard to understand, either. In a meta-analysis of 18 surveys asking about scientific misconduct, 2% of scientists admitted that they had personally falsified or fabricated data, and about 14% reported seeing a colleague do so.

A 2012 survey asking about less serious research mistakes (so-called “questionable research practices,” or QRPs) found that admitted rates of QRPs were high across all disciplines. “Even raw self-admission rates were surprisingly high,” the authors concluded, “and for certain practices, the inferred actual estimates approached 100%, which suggests that these practices may constitute the de facto scientific norm.”

In other words, everyone’s doing it.

Loyalty and Cognitive Dissonance

What’s going on here? How can so much published research be wrong but un-retracted? How can so many researchers know that some of their own findings don’t hold up to scrutiny, but not come clean? How can questionable research practices be so widespread that they “constitute the scientific norm”?

Carol Tavris, a social psychologist and co-author of the classic book Mistakes Were Made (But Not by Me), wouldn’t be surprised by any of this. She points out how cognitive dissonance, or the discomfort people feel from having their beliefs, identity, or ideas contradicted, leads directly to self-justification.

“We reduce dissonance by doing one of two things: we have to change our ideas or behavior to make it consonant with our beliefs—or we justify our beliefs and actions,” says Tavris.

“If you’re in a state of cognitive dissonance, you could say, ‘Gee, thank you for this really important evidence that shows that I’m wrong. I’m really grateful for this very important information’—that, after all, should be the goal of being a free thinker. Unfortunately, all too many people reduce dissonance by maintaining their false beliefs and telling free thinkers where they can stick their data!”

If someone very smart discovers that they made a mistake, they are forced to choose between admitting they’re not as smart as they thought, or finding a way to show that they really were basically right all along.

If a scientist who is generally very ethical bent the scientific rules a bit, they could admit it, retract the paper, and maybe lose a job, connections, or some reputation—or they could find a way to justify it. “It didn’t change my results much, other studies have found the same thing, everyone does it a little, I can’t punish my co-authors for my mistake…”

With such high self-reported rates of failed replication, it’s likely that it’s simply easier to forget about the failure and leave the second set of results in a file drawer than to drill down and resolve the dissonance. Few people spend much time trying to prove they were wrong, so why should they worry about it?

The huge gap between the rates of fraud or QRPs that scientists report observing in colleagues and the rates they admit to themselves suggests that cognitive dissonance is at work there, too: literally, “mistakes were made, but not by me.”

Collaboration Diffuses Responsibility

Another factor at work is the increasing tendency toward greater co-authorship. Scientific collaboration is valuable, and more eyes checking the work can be helpful, but it also means that each individual’s responsibility for the final product is diffused.

Checking any given part was someone else’s responsibility, and it’s easy for co-authors not to know everything that their collaborators did.

If an error is discovered after publication, making a big deal out of it could seriously damage colleagues’ relationships and careers. And if it wasn’t intentional, and no real fraud or harm was committed, the desire to just drop it is understandable.

Science is self-correcting, after all, so if a result doesn’t hold up, someone else will come along and show it. The results may or may not be valid anymore, but if all the correct steps were taken, the data is honest, and the methods are transparent, that’s for other people to judge. Nullius in verba can morph into “buyer beware.”

Making a Virtue out of Changing Your Mind

Many of the issues around the reproducibility crisis and questionable research methods are being tackled by encouraging more transparency and closer scrutiny.

Scientists are encouraged to pre-register the hypotheses they are testing before they begin their work, to discourage p-hacking and fishing around for correlations after the fact. More journals encourage or require sharing the underlying data behind calculations, making it easier for other scientists to double-check the conclusions. PLOS One, a major open-access journal, has recently become a “retraction engine” in part because it has established a dedicated team to review questions about research integrity.

Greater awareness of the problem alone has encouraged many scholars to go back through journals and systematically check the most foundational studies in their field, something that was only rarely done before.

But all of these efforts depend on journals, universities, and other scientists policing each other’s work more aggressively, and that costs time and money, and it can reduce trust between researchers. That kind of suspicion and friction between colleagues has its own costs, which are harder to quantify but just as significant.

The people who are most likely to know about potential issues with their research are the authors themselves, and surveys show that many scholars are well aware of issues with their methodology, data quality, or reproducibility that wouldn’t be obvious to other researchers, peer reviewers, or journalists.

To try to promote a new culture of openness and self-correction in science, in 2016, a team of psychologists created a pilot program called the Loss of Confidence Project to encourage researchers to come forward with concerns or disclosures that have undermined their faith in their own work.

The project collects statements from scientists who no longer believe their research holds up. Unlike a retraction, which would be an extreme and costly punishment for messing up, a loss of confidence (LOC) statement doesn’t imply that the authors did anything dishonest or made a major technical mistake.

Rather, the work was done in good faith and is technically correct, but the author no longer thinks the conclusion is defensible because of flaws in their methodology, design, or interpretation.

For instance, the psychologist Stefan Schmukle issued an LOC statement about a study he co-authored on a correlation between finger length and gender bias. Schmukle now suggests the result was a false positive: he and his co-authors ran multiple tests but published only the one that gave a positive outcome.

Nonetheless, he doesn’t think the paper should be outright retracted, because he and his co-authors did the work in good faith. In a world where the number of publications is the key to getting a job or a promotion, other scientists agree that there ought to be a way to publicly doubt a publication without losing it altogether.

“In my view, a retraction would be appropriate if the data were faked, if the statistical analysis was wrong, or something like that,” Schmukle says. The problem with his paper, however, is that some of the results, which in hindsight are important for understanding it, were not reported.

Marcus Munafò, a biological psychologist at the University of Bristol in the U.K. who has pulled a paper after spotting an error, made a similar case. “Whether or not to retract a paper is a tricky issue,” he said. “But I wouldn’t retract papers that report results that are almost certainly wrong but that were conducted in good faith.”

The project wants to give researchers an avenue for questioning research that won’t punish them for being honest about mistakes or changing their minds. In a paper analyzing the first round of LOC statements they collected, titled “Putting the Self in Self-Correction,” the team noted that

A research culture in which individual self-corrections are the default reaction when errors or misinterpretations have occurred might not only reduce conflict between authors and critics, but also change the way in which criticism from third parties is dealt with, as it would create more distance between researchers’ identities and their findings.

Tavris says that simply acknowledging cognitive dissonance can help a person overcome it. Acknowledging that there is a difference between what you may have written and your self-image is an important way to reduce cognitive dissonance, without digging in your heels or losing faith in yourself as a careful, honest researcher.

In their paper, the LOC team was careful to emphasize that admitting to mistakes doesn’t prove that someone is a bad researcher, and that correcting the record may actually be a sign of scientific virtue:

Confronted with researchers who publicly admit to errors, one should keep in mind that this is not a reliable indicator that the researchers were less diligent in their work than others — after all, errors are frequent throughout the scientific record.

On the contrary, given the potential (or perceived) costs of individual self-corrections, public admission of error could be taken as a costly and thus credible signal that the issuer cares about the correctness of the scientific record.

They hope this new paradigm will encourage the scientific community as a whole to practice more self-correction. They suggest journals allow authors to attach LOC statements, explaining their reasons for doubting a result, instead of requiring total retractions. They also mention a more radical idea: a publishing system that allows for continuously updating articles based on new evidence or understanding while still preserving previous versions, in a kind of “wiki for science” model.

Ultimately, regardless of the publishing model, the goal is to change the culture around scientific correction from a mostly antagonistic ethos, in which scientists must tear down each other’s work, to one that also allows a self-improvement track. With luck, seeing scientists personally practice the model of revision and self-correction that the system as a whole aspires to could be the key to restoring public confidence in the scientific community and in science itself.
