Last week I declared the social psychology of death officially dead. I may have also gotten a little carried away accusing my friend Steven Heine of motivated reasoning in the process. You know, just your typical Wednesday of calling out a respected colleague for being blinded by his own theory.
Steve could have responded by telling me exactly where I could shove my hot takes. Instead, like the true scientist and mensch he is, he rolled up his sleeves and wrote a thoughtful defense of his position. This is exactly what good science looks like: when smart people disagree, they hash it out with evidence, not Twitter feuds.
So here's Steve's rebuttal. Read what he wrote, then jump into the comments and tell us both who's right, who's wrong, and who's just being a nihilist.
I thank Mickey for inviting me to respond to his post and to discuss our disagreements. I launched a systematic review of the literature on the mortality salience hypothesis with some grad students (I’m particularly indebted to Lihan Chen and Rachele Benjamin, who led this project, although the views I’m expressing here are my own). We wanted to consider, with several of the most state-of-the-art tools available, how trustworthy the Terror Management Theory (TMT) literature really is. This question is all the more important in an era when we’ve become aware of how publication bias and questionable research practices (QRPs) can make us suspicious of past findings.
Our approach was to lay out all of the most relevant evidence for evaluating the literature. To do this we summarized the results of a previous multisite Registered Replication Report (RRR) and the set of other published preregistered efforts to test the hypothesis, and we ran a p-curve, a z-curve, and a random-effects meta-analysis supplemented by WAAP-WLS, selection models, and PET-PEESE as bias-reduction methods. We also looked at how moderators, such as date of publication and adherence to key methods, affected the results.
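To give a flavor of what one of these bias-reduction tools does, here is a minimal sketch of a PET-PEESE-style analysis in Python. This is only an illustration with invented toy effect sizes and standard errors, not the actual code or data from our review, and the decision rule in it is just one common convention rather than the only way to run the procedure.

```python
# Minimal PET-PEESE sketch (illustrative only; toy data, not the review's code).
# PET regresses effect sizes on their standard errors (weighted by inverse variance);
# PEESE uses the squared standard error instead. The intercept is read as the
# bias-corrected estimate of the effect as the standard error approaches zero.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
k = 40                                   # number of hypothetical studies
se = rng.uniform(0.08, 0.35, size=k)     # toy standard errors
effects = 0.18 + rng.normal(0, se)       # toy effects centered on r = .18

def pet_peese(effects, se):
    w = 1 / se**2                        # inverse-variance weights
    # PET: effect ~ intercept + SE
    pet = sm.WLS(effects, sm.add_constant(se), weights=w).fit()
    # PEESE: effect ~ intercept + SE^2
    peese = sm.WLS(effects, sm.add_constant(se**2), weights=w).fit()
    # One common convention: report PEESE's intercept if PET's intercept
    # differs significantly from zero, otherwise report PET's.
    use_peese = pet.pvalues[0] < 0.05
    return (peese if use_peese else pet).params[0]

print(f"Bias-corrected estimate: {pet_peese(effects, se):.3f}")
```

The point of the sketch is simply that these methods try to estimate what the effect would look like in a hypothetical study with no sampling error, which is why they can shrink an inflated meta-analytic average.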
What did we find? The parable of the blind men and the elephant seems apt: it was striking how different the conclusions were depending on which analytic tool we adopted. Indeed, someone relying on a single tool could reach a conclusion anywhere from a moderate-sized effect in support of TMT to a significant effect in the OPPOSITE direction, with the other tools pointing to something in between. We drew a synthetic conclusion by considering all of the evidence and weighing the strengths and weaknesses of the different approaches: most studies were underpowered, there was likely evidence of QRPs in the literature, the overall meta-analytic effect was exaggerated, and yet the underlying effect was non-zero, with an estimate of r = .18. We encourage others to read what we did so they can draw their own conclusions, and we created a ShinyApp to help people analyze different sets of studies.
Our take is obviously different from Mickey’s, and I think it’s worth highlighting where we agree and disagree. First, Mickey seems skeptical because the theory “doesn’t make much sense.” This is perhaps the point I agree with most: while I find the theory intriguing, I’ve published a couple of papers critiquing the underlying logic of TMT and pointing to our Meaning Maintenance Model as an alternative. But the evidence we considered in our review doesn’t speak to whether every aspect of TMT’s reasoning is supported; rather, it tests whether people engage in some kind of worldview defense following a mortality salience manipulation.
I think a key point of disagreement between us is how best to evaluate the evidence. I feel that Mickey treats the question of whether TMT is replicable in an overly rigid, binary way. For example, he highlights the 3 preregistered studies by Schindler and colleagues that failed to support it, concluding “Mark it zero!” But the internal meta-analysis from that paper pointed to a very small effect at p = .058. It’s true that this is not significant, and we also counted it as a failed replication, but I thought the replication crisis taught us to avoid treating the p < .05 threshold as though it cuts cleanly between truth and fiction. Should our verdict on the overall merit of the theory be rendered in such a binary way when the evidence really falls along a continuum? I’m also puzzled that Mickey dismisses the 5 preregistered studies (we count one of these differently) that provided support for the hypothesis. Isn’t this exactly the kind of evidence he’s saying the field should focus on? The evidence from the published preregistered studies on TMT is decidedly mixed, and it warrants a far more nuanced conclusion.
It seems that the bulk of Mickey’s rejection of TMT is based on the results of the 2019 RRR. I agree that this RRR was quite damning for TMT and that we should take the results of RRRs very seriously. But I think it’s worth quoting David Funder’s Fourth Law here: “There are only two kinds of data: terrible data and no data.” Funder’s tongue is firmly in his cheek, but he reminds us that none of the data in our studies are perfect – they’re all fallible in some way, with some kinds of course being worse than others. And I think that RRRs, like all other kinds of data, are fallible too. An RRR represents a data point – one with a narrower confidence interval around the effect size estimate – but wouldn’t we expect a dozen different RRRs on the identical topic to yield a distribution of findings? If a second RRR were conducted and yielded an overall positive effect, should we suddenly reverse our conclusion? Drawing firm conclusions about replicability from single data points is going to give the field whiplash. We can see evidence of this in the competing RRRs on the facial feedback hypothesis, which highlight just how inconclusive single RRRs can be. I also think that the individual studies that make up an RRR suffer from many of the same limitations that standalone studies do – researchers often have a preferred conclusion (either a successful or a failed replication), and their biases may influence how they use the experimental degrees of freedom that always remain. To be sure, the preregistered protocols of RRRs substantially reduce these problems (which is one of their great strengths), but they can never eliminate them completely. I think the field hasn’t spent as much effort as it should evaluating the quality of RRRs and has been too quick to draw rigid binary conclusions from them. They provide a valuable but fallible perspective on the replicability of a phenomenon.
Like the blind men and the elephant, we gain a more complete picture of a phenomenon’s replicability if we consider it from more than one perspective, and that’s what motivated us to launch our systematic review. I think we can get some valuable guidance from Funder’s Second Law: “Something beats nothing, two times out of three.” That is, it’s almost always better to have additional data on hand, as all kinds of data provide useful information (unless the data are so misleading that they actually have negative informational value). It seems like a stretch to say that all of the various bias-reduction tools that we used have nothing to offer. I don’t think it’s reasonable to say that we should only evaluate the evidence for TMT with what may be our single best meta-science tool (the RRR) and cover our ears when people refer to other kinds of evidence.
Mickey questions the use of these meta-analytic tools because, as he puts it, they are compromised by “garbage in and garbage out.” But the question we raised is precisely how much garbage there is in the literature. Mickey seems rather convinced that the TMT literature is garbage all the way down. But how can any of us know the state of a large literature? The weaknesses of meta-analyses are well known: they aggregate all the file-drawer problems and QRPs of the individual studies. But the bias-reduction tools that we used were specifically designed to address these limitations – they were created precisely to address the kinds of shortcomings of meta-analyses that Mickey raises. I don’t know how you can p-hack a p-curve. Sure, these individual tools are messy and each has its own limitations, but this doesn’t mean we should throw up our hands and refuse to try to summarize past literatures. And this isn’t just an issue for controversial literatures where the data quality is questioned – many fields of science rely heavily on meta-analyses to summarize the state of their literatures. I think this flawed set of data from various bias-corrected meta-analytic tools provides some useful informational value that we should consider, and we should continue to make efforts to correct their shortcomings. Typically, these individual bias-reduction tools are evaluated through data simulations, but I think their effectiveness should also be assessed in actual literatures, such as TMT, which may differ from the simulations in important ways.
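For readers who haven’t encountered a p-curve, here is a minimal sketch of its basic right-skew test. Again, this is a toy illustration with made-up p-values, not our analysis code; the full p-curve procedure includes additional tests, but this piece shows why a pile of barely-significant, p-hacked results doesn’t masquerade as evidential value.

```python
# Minimal p-curve sketch (illustrative only; toy p-values, not the review's data).
# Idea (Simonsohn, Nelson, & Simmons, 2014): look only at significant p-values.
# If the studies track a true effect, small p-values should be overrepresented
# (a right-skewed p-curve). If the literature is pure noise plus selective
# reporting, the significant p-values should be roughly uniform between 0 and .05.
import numpy as np
from scipy.stats import norm

def pcurve_right_skew_z(p_values, alpha=0.05):
    """Stouffer test for right skew on the significant p-values only.

    A strongly negative Z indicates a right-skewed curve, i.e. evidential value.
    """
    sig = np.array([p for p in p_values if p < alpha])
    pp = sig / alpha                     # pp-values: uniform on (0, 1) under the null
    z = norm.ppf(pp)                     # probit-transform each pp-value
    return z.sum() / np.sqrt(len(sig))   # Stouffer's combined Z

# Toy example: p-values bunched near zero vs. bunched just under .05
evidential = [0.001, 0.004, 0.010, 0.002, 0.020, 0.008]
hacked_looking = [0.041, 0.048, 0.033, 0.044, 0.049, 0.037]

print(pcurve_right_skew_z(evidential))      # strongly negative -> right skew
print(pcurve_right_skew_z(hacked_looking))  # near zero or positive -> no evidential value
```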
I have great respect for Mickey’s commitment to improving our science, and he’s always a sharp and provocative thinker. But we gain a much broader perspective on the question of replicability by considering the evidence from as many of the best tools at our disposal as we can. Our analysis suggests that drawing a confident conclusion about a literature on the basis of any single tool, including the RRR, is likely to mislead. To combat the various biases that individual researchers have, and the shortcomings inherent in individual meta-analytic tools, we need to weigh all of the relevant information. And when we did this with TMT, we ended up detecting a pulse.
Would that be reliable, given your own cautionary tales about researcher biases? How do you know that looking through the garbage bin for the good-looking tidbits isn’t just cherry-picking the few out of thousands that seemed to work without serious problems? And does that mean the theory is still viable, or just that you did a good job of mining noise for false positives?
This is good news! Because of Inzlicht's judgment of TMT, I put a cautionary note at the beginning of my article on the subject, but now I think I can take it off. Should I? And I can go on believing that a man will sit further away from a woman if he has seen one of her tampons, a weird TMT finding that I discuss. https://open.substack.com/pub/eclecticinquiries/p/terror-and-human-weirdness?r=4952v2&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false