We live in interesting times for the social sciences. For the past several decades, disciplines such as social psychology and behavioral economics seemed to unlock many of the mysteries of human life, and academics and laypeople alike couldn’t get enough of what they revealed. Journalists, too, lapped up insights into how, say, otherwise decent people given arbitrary power over others can become brutal and even sadistic—as the famous Stanford prison experiment purported to find when it asked volunteers to simulate the roles of prisoners and prison guards. And everyone delighted in the cleverly eye-catching ways researchers designed such studies.
Lately, however, the social-science world has become mired in controversy. Researchers themselves have started to note that many famous experiments have been debunked—such as, indeed, the Stanford prison experiment—or simply can’t be replicated. Scholars writing in the journal Science in 2015 attempted to reproduce 100 experiments published in three highly influential psychology journals and found that just 36 percent yielded results consistent with the original findings.
Worse still, multiple allegations of unethical data practices have emerged, some proved. The misconduct has included so-called p-hacking—processing data in search of spurious correlations and publishing them as valid results—and even outright fraud, in which researchers have altered their data to fit a preconceived result.
A natural conclusion for many outside the profession is grave skepticism about the field: People are bound to wonder if research based on behavioral experiments simply can’t be trusted. I get that, but I still reject the notion that the whole enterprise has been discredited. As an academic social scientist—and, more to the point, an arbiter of such research for readers of The Atlantic—I’d like to offer a look behind the curtain, to show how research works, why it goes wrong, how to read it, what to trust, and what to disregard.
Misbegotten studies are not limited to the social sciences. The natural sciences suffer from very similar difficulties. John Ioannidis, a medical researcher at Stanford University, attributes this problem to a series of study flaws: experiments that are too small to trust; cases in which a result seems valid because of mathematical sleight of hand but is effectively meaningless; poor experimental designs; and—the most entrenched—academic bias caused by career incentives to find particular results.
These incentives are brutal. In some fields, a professor needs to have published dozens of research articles to have a prayer of getting tenure (without which they are usually shown the door). Imagine studying for a decade through college and graduate school, then beavering away for perhaps another 10 years on research while managing a hefty teaching load, only to be in the position of betting your career on the up-or-out decision of a university committee. As one researcher wrote on Political Science Rumors, an anonymous message board where academics in this field can openly converse, “The tenure track is a dehumanizing process, because it treats us as apprentices until we’re at least well into our 30s and often significantly past 40.” Although my own quest for tenure was years ago, I well recall the 70-hour weeks in a windowless office and intense stress that it involved.
At top universities, it is not enough for published research to be properly executed. The topic also has to be clever, and ideally the findings should be surprising. Empirical evidence that dog bites man doesn’t get much credit; an ingenious experiment that uncovers evidence suggesting that, actually, men bite dogs 73.4 percent of the time—now we’re talking.
Not surprisingly, perhaps, this impulse to center on the catchily counterintuitive leads to problems. A 2019 article titled “Cheat or Perish?” in the journal Research Policy looked at the incentives in academia to cut corners or straight-up cheat. Its authors noted that, even in outwardly dispassionate research, carrots can encourage productive, high-quality work, while sticks can deter mistakes or misbehavior. But the scholars also wrote that errors and misconduct in research can be hard to detect, which limits the ability to investigate them. As a result, the carrots heavily outweigh the sticks, predictably affecting the integrity of the work.
Tenure alone does not necessarily eliminate the problematic incentives. Senior researchers easily fall prey to hubris and fail to seek advice and constructive criticism. Research has also shown that people with more expertise in a field tend to react more poorly to “disconfirming feedback” than those with less experience, becoming even more pedantic—in other words, they don’t want to hear that they’re wrong. And the stakes rise in an era when behavioral scientists can become superstars outside academia, earning speaking engagements and book deals for their thrillingly heterodox findings.
I do not believe that most experimental research in the social sciences is false, and I’ve been blessed to work with colleagues and collaborators throughout my career who are uncompromisingly scrupulous. Yet precisely because of the problems described above, I am very cautious about the research that I cite in this column, over and above our careful routine fact-checking process. In preparing each week, I usually get the lay of a research landscape and then closely read 10 or 12 relevant academic articles. When I’m deciding what to use to make my arguments, I first look at the quality of the research design, using my academic training. Then I follow three basic rules.
Over the past few years, three social scientists—Uri Simonsohn, Leif Nelson, and Joseph Simmons—have become famous for their sleuthing to uncover false or faked research results. To make the point that many apparently “legitimate” findings are untrustworthy, they tortured one particular data set until it showed the obviously impossible result that listening to the Beatles song “When I’m Sixty-Four” could literally make you younger.
So if a behavioral result is extremely unusual, I’m suspicious. If it is implausible or runs contrary to common sense, I steer clear of the finding entirely because the risk that it is false is too great. I like to subject behavioral science to what I call the “grandparent test”: Imagine describing the result to your worldly-wise older relative, and getting their response. (“Hey, Grandma, I found a cool new study showing that infidelity leads to happier marriages. What do you think?”)
I tend to trust a sweet spot for how recent a particular research finding is. A study published more than 20 years ago is usually too old to reflect current social circumstances. But if a finding is too new, it may have so far escaped sufficient scrutiny—and been neither replicated nor shredded by other scholars. Occasionally, a brand-new paper strikes me as so well executed and sensible that it is worth citing to make a point, and I use it, but I am generally more comfortable with new-ish studies that are part of a broader pattern of results in an area I am studying. I keep a file (my “wine cellar”) of very recent studies that I trust but that I want to age a bit before using for a column.
The perverse incentive is not limited to the academy. A lot of science journalism values novelty over utility, reporting on studies that turn out to be more likely to fail when someone tries to replicate them. As well as leading to confusion, this misunderstands the point of behavioral science, which is to provide not edutainment but insights that can improve well-being.
I rarely write a column because I find an interesting study. Instead, I come across an interesting topic or idea and write about that. Then I go looking for answers based on a variety of research and evidence. That gives me a bias—for useful studies over clever ones.
Beyond checking the methods, data, and design of studies, I feel that these three rules work pretty well in a world of imperfect research. In fact, they go beyond how I do my work; they actually help guide how I live.
In life, we’re constantly beset by fads and hacks—new ways to act and think and be, shortcuts to the things we want. Whether in politics, love, faith, or fitness, the equivalent of some hot new study with counterintuitive findings is always demanding that we throw out the old ways and accept the latest wisdom.
I believe in personal progress. That is why I write this column, and I like to think I’ve made some in my life. But I also know that our novelty- and technology-obsessed culture is brimming with bad ideas and misinformation—the equivalent of p-hacked results, false findings, and outright fraud for personal gain.
So, in life as in work, when I see a bandwagon going by, I always ask, Does this seem too good to be true? I let the cultural moment or social trend age for a while. And then I consider whether it is useful, as opposed to simply novel. Maybe this pause makes me a bit of a square—but it rarely fails me.