A Note to Readers
This essay was co-authored with my longtime collaborator and friend, C. Daryl Cameron, an Associate Professor of Psychology and Senior Research Associate at the Rock Ethics Institute at Penn State University. Daryl and I have been thinking about empathy, its limitations, and its possibilities for about a decade now—including recent explorations into AI's capacity for empathic expression. While some scholars are raising alarm bells about empathic AI, we've found ourselves in the unusual position of being cautious optimists. We're not AI evangelists, but we do think the evidence deserves a fair hearing before we sound the alarm.
When the two of us were young, the telephone was our lifeline. Even though long-distance rates were expensive, the phone allowed us each to connect with our grandparents, parents, and later girlfriends across states, countries, and even continents. It wasn't being there, but it was real connection: meaningful, rich, and nourishing. These phone calls weren't merely sufficient substitutes for human contact; they were necessary elements of human connection, allowing each of our relationships to grow and thrive.
Yet when telephones first appeared, moral panic ensued. Critics warned this device would erode true human intimacy and destroy our social fabric. These devices were lifeless and disembodied, after all; they could never capture the richness of face-to-face interaction.
Sound familiar?
The same breathless warnings now dominate discussions about AI. But are we just repeating history, mistaking a new tool for an existential threat to human connection?
This pattern of alarm is unfolding in our headlines right now. On February 27, The Guardian published an opinion piece by Princeton psychologist MJ Crockett ("AI is 'beating' humans at empathy and creativity. But these games are rigged") that opens not with data or evidence, but with unsettling imagery of polarizing tech CEOs poised to replace humans with algorithms. This alarmist framing colours everything that follows, subtly positioning readers to reject research findings before examining them on their merits.
The real question isn't whether we should fear villainous tech bros, but whether studies comparing AI and human capabilities actually reveal something meaningful.
Crockett argues that studies showing AI outperforming humans in various capacities like empathy, creativity, and mediation are methodologically flawed because they force humans to operate in contexts that are so alien, so lifeless, that the results fail to reflect genuine human capabilities. Worse, these contexts are “rigged” to favour AI because they miss the embodied nature of genuine human connection. In brief, these studies are unfair and can be more or less ignored.
As psychologists studying both human empathy and emerging technologies, we find Crockett's dismissal of this research as “rigged” bizarre. It overlooks how people genuinely connect in today's digital world and undervalues these interactions. The critique mischaracterizes what artificial lab studies can reveal about the workings of the mind. It also romanticizes human connection, presenting a utopian version that doesn't acknowledge the flaws, biases, and limitations that often characterize human interactions.
When studying moral psychology, one must set aside one's personal moral convictions in order to understand how people think, feel, and behave around moral issues. This essay offers a revealing example of what happens when the boundary between moral conviction and scientific analysis blurs. The author's intuitions about AI appear to be driving their interpretation of the evidence, rather than the other way around.
Digital Communication Is Human Communication
The critique claims that studies comparing humans to AI are unfair because they force people to engage with “lifeless simulacra of human tasks”. What are these alien tasks exactly?
Just the thing most of us do all day, every day: communicating online. This framing dismisses digital communication as somehow inauthentic, even though it constitutes the dominant form of human connection in today's world.
For example, Crockett dismisses an influential study suggesting that AI responds with greater compassion than real physicians on an actual Reddit forum, r/AskDocs. They argue people wouldn't seek "generic written responses from a stranger" when in distress. But this is precisely what thousands of real people voluntarily do every day on this very same subreddit and countless other support forums.
Millions of people seek support through online communities today, and these communities aren't poor substitutes for "real" connections—they are real connections. Human connection exists on a spectrum, from deep in-person bonds with loved ones to meaningful digital interactions with friends, colleagues, and even strangers online. Dismissing digital connections as unreal imposes a narrow definition of social reality that overlooks how most young people around the world now connect. We suggest it is more inclusive to be open to these different realities.
We text, email, post on forums, and comment on social media. Many people now receive therapy via text or Zoom, contexts in which AI could serve a meaningful role. Limiting "real" empathy to those deeply embedded in our social world would disqualify many therapeutic relationships that exist today, including those with doctors, therapists, and crisis counselors.
To be sure, human connection extends far beyond digital text. Some aspects—particularly physical presence and touch—remain domains where AI cannot (yet) compete effectively. And perhaps there is something intrinsically nourishing about embodied presence. Either way, this does not mean digital interactions are hollow; rather, it suggests that different modes of interaction serve different emotional needs.
AI-mediated empathy might be especially comforting in contexts where in-person support is unavailable or inaccessible, such as for people who are disabled, housebound, or living in rural areas. Not everyone has the privilege of engaging in the rich, conversational empathy that is so often put on a pedestal; sometimes, the least threatening option is the one that occurs over a digital platform.
Experiments Are Not Always Meant to Simulate Real Life
Crockett's critique of creativity research raises important questions about scientific methodology. They argue that because real-world scientific creativity often happens collaboratively, testing individual PhDs against AI is not a valid way to assess creativity.
Setting aside the fact that individual creativity has produced numerous scientific breakthroughs—from Newton's laws to Einstein's theories—this critique fails to appreciate the purpose of controlled psychological experiments. Scientists often put people, other animals, and now machines through completely unnatural scenarios to help them make sense of how “minds” work. These artificial conditions aren't bugs in the methodology; they're deliberate features that isolate specific cognitive processes to enhance our understanding.
Consider the Stroop task—a nearly 100-year-old psychological test in which you name the font colour of written words (e.g., the word GREEN presented in red font) while ignoring the word itself. One is unlikely to encounter colour-word conflicts like this in the wild, yet this artificial exercise has revealed fundamental insights about attention, executive function, and cognitive control that form the backbone of modern cognitive psychology. These insights helped shape concepts in Israeli Nobel Laureate Daniel Kahneman's Thinking, Fast and Slow and helped birth an entire new field of study, behavioural economics.
It accomplished all this despite being completely artificial. The knowledge gained from this "lifeless" task has enhanced our understanding of the mind and improved human life. Scientific tasks don't need to perfectly mirror reality to tell us something meaningful about it. This is a lesson we sometimes forget in our eagerness to make our science relevant and impactful[i].
Scientists deliberately trade some ecological validity for precision and causal clarity. With these laboratory findings in hand, researchers can and should stress-test their validity in real-world contexts, something we've previously advocated for.
While AI-human comparisons may not capture all real-world complexity, they provide valuable insights into specific cognitive capabilities. And once enough basic findings accrue, scholars need to leave the lab and examine to what extent those findings generalize to the real world. This is exactly what's happening with empathic AI research, where qualitative studies and interventions are examining impacts on tangible outcomes like loneliness and suicide prevention. This applied work demonstrates that AI's advantages are far from "rigged"—they translate to meaningful benefits in real human lives.
The Idealization of Human Capacities
Perhaps most problematic is how Crockett romanticizes human capabilities by describing only their idealized versions while focusing on AI's limitations. Crockett writes that humans "build relationships thick with meaning" and "find common ground reflected in our faces, voices and postures." Although AI may well be able to emulate voices, postures, and the like, we grant that there are important differences between human and artificial connection.
Yet this poetic description ignores the reality that human connection is far from perfect. Crockett doesn’t mention the many biases that sometimes plague empathy: it excludes those not like us, it is insensitive to mass suffering, and it occasionally leads to immoral acts. Though we suggest these biases reflect motives and choices, not hard limits on empathy itself, the fact remains that human empathy is flawed.
Our own research on empathy avoidance has demonstrated that people regularly choose to avoid empathizing with—or generating compassion for—others, even loved ones, because empathy is emotionally taxing. Empathy is challenging, especially when understanding experiences and social realities that diverge from our own, including the very human-AI interactions being dismissed here.
Our friends, family—and, yes, even our therapists—are sometimes tired, distracted, and just going through the motions of caring. Those physicians in that Reddit study? Maybe they lacked kindness because they were overworked. Empathy is hard, after all.
People often find themselves unable or unwilling to reveal vulnerable truths to loved ones or therapists due to embarrassment or fear of judgment—barriers that notably diminish when disclosing to AI. This doesn't mean genuine human connection isn't valuable—it absolutely is—but it does leave room for AI to supplement our care. For those who benefit from such support, dismissing AI's possibilities dismisses their experiences.
Moving Beyond False Dichotomies
Fears about AI are not new; they echo past moral panics over technologies that were initially seen as dehumanizing. The printing press was feared for spreading misinformation, the telephone for undermining real conversation, and calculators for destroying mental arithmetic.
We now appreciate that these fears were misguided. What makes AI so different?
We’re not suggesting that we should uncritically embrace every AI claim. Even now, we'd probably get more from talking with a colleague over a beer than with ChatGPT. Nevertheless, AI systems are becoming increasingly capable in domains once considered uniquely human. This doesn't mean they'll replace the richness of human connection or collaborative innovation. But it does mean we should take these capabilities seriously and consider both their potential benefits and limitations. It’s essential to understand these emerging tools without the ideological baggage that has turned this scientific question into a false binary between tech enthusiasts and humanist defenders.
This is not to say there aren't legitimate concerns about AI, including empathic AI. In our own writing, we've highlighted several, including the risk of habituating to AI empathy, potentially making human empathy seem poor by comparison. Anat Perry suggests another concern: AI empathy might never feel as nourishing because AI doesn't bear effort costs, although surprising new research led by Joshua Wenger challenges this. Finally, work by Morteza Dehghani suggests that perhaps AI's empathic edge will evaporate over time as we get bored with its predictable, homogeneous responses; maybe the oscillation between clumsy and profound care is what makes human connection so nourishing.
As we navigate this complex terrain, we need more nuanced conversations that avoid both uncritical enthusiasm and reflexive dismissal. We need to recognize that AI may indeed surpass humans at certain tasks while falling short at others. And we need to approach these questions with intellectual curiosity and openness to evidence.
AI alarmism thrives on idealized visions of human connection and flawed comparisons. It’s a fantasy of a world where human connection is sacred and creativity is collective, if only we could break free from the sterile logic of the machine. But reality has never worked that way. If we truly want to understand AI’s place in society, we must move beyond these romanticized illusions of human community and engage with the evidence on its own terms.
Right to Reply
My last few posts have been more critical than usual. While I believe robust debate advances our understanding, I want to ensure those I critique feel heard in return. So, taking a page from one of my favorite podcasts, Decoding the Gurus, I'd like to extend an open invitation: if you feel I've treated your ideas unfairly, I'm offering you a right to reply. I promise to publish your response, editing only for length and clarity if needed.
Science thrives when we can disagree vigorously about ideas while maintaining respect for each other as thinkers. This policy is my attempt to balance my tendency toward the impulsive and provocative with a commitment to fairness.
My inbox is open.
[i] Full Disclosure time. One of us (Michael) made this same error in a recent Substack. Paul Bloom was kind enough to set me straight.
I think you correctly point out that empathy expressed via the written word can be nourishing, significant, and genuine. There is nothing inherently inferior about it (which is not to deny that there are characteristic and significant advantages to the face-to-face embodied encounter). Are the loving epistles of Heloise and Abelard any less human or empathic because they are written down rather than spoken live? Of course not. Singling out empathy expressed in written form is a red herring.
That said, I think there are important shortcomings of the extant AI empathy studies that focus on written expressions. But Molly Crockett misdiagnoses the problem in the Guardian piece.
Here's my best attempt at articulating what I feel is missing. I'd love to hear your thoughts. The shortcoming of the studies is not that empathy expressed in written form is second-rate. The shortcoming of the studies is that it is very hard to determine whether empathy expressed in written form is real empathy. That is, it is very hard to tell whether such expressions of empathy issue from real feeling or not. The task "respond empathically (in writing)" is a task that asks us to perform empathy rather than actually empathize. We can get very good at the performance without actually being good at empathizing at all. Consequently, when I compare two passages and rate them for how "empathic" they are, I'm really just assessing the virtuosity of a certain kind of performance. LLMs are truly virtuosic. I bet they would beat humans at the task of expressing empathy through a facial expression, too. [Here's a study idea: you get people to comparatively rate pictures of real human faces responding empathically to a prompt and AI-generated images of faces responding empathically to a prompt.]
I imagine LLMs would outperform similarly for tasks like "respond passive-aggressively to this letter" or "respond in a way that expresses bitter envy" or "respond in a way that expresses incredulity". Would we conclude that performance on such tasks shows that AI surpasses humans in the ability to respond with passive aggressiveness or outrage or incredulity? Would we even be troubled or fascinated by the capacity of LLMs to do well at such tasks? It certainly wouldn't be fascination of the kind that induces AI vertigo. So why should we be troubled and fascinated by the capacity of LLMs to produce empathic-sounding written passages in response to prompts? The positive assessment of AI-generated expressions of empathy may have little or nothing to do with assessing whether whatever generated that performance is empathizing well, poorly, or not at all.
I think that the fact that people sincerely report feeling seen and heard by their Replika boyfriends and girlfriends is pretty interesting. I would go so far as to say it's troubling and fascinating. These are extended, reciprocal, and richly contextualized encounters. These are the kind of contexts where it becomes hard to fake empathy, and where we care about more than a particular kind of virtuosic performance. But my sense is that the subset of humans who experience being seen and heard in such contexts is a small minority. Most people won't conflate the simulacrum of empathy with the real thing, and they won't feel seen and heard when they know the entity in question has no experience of seeing or hearing. That's my hunch, at least. But maybe as things progress we'll start to lose our grip. My intuition is that this would be a lamentable eventuality. I wonder what you think.
Really enjoyed this! Wanted to share a few of my thoughts:
First, is the telephone really the right analogy for AI? Phones, video chat, the printing press -- these are pieces of tech that mediate human communication without replacing it. Yet almost all the use cases right now for AI in the loneliness/empathy space seem to be focused on replacing human interactions. This seems categorically different from most of the technology you highlighted.
I also totally agree with your point about the importance of lab experiments, and the AI studies you mentioned have been important proofs of concept for our ability to benefit from or connect with AI in ways we might not have expected. But the vast majority of these studies are focused on asynchronous or extremely brief interactions, and only look at the immediate emotional impact of the interactions with AI - meaning that we actually know very little about the long-term consequences of AI empathy/companionship. I know you mentioned this, but it seems like an absolutely critical asterisk around the conclusions we are making about AI right now. If we were studying the impact of potato chips and only focused on the immediate-term impact, we might conclude that potato chips are a perfectly fine meal replacement since people seem to be pretty happy every time they eat one, but this would mask serious long-term consequences we'd be totally blind to.
I also think it's impossible to divorce research on the impact of empathic AI from concerns about these companies' motives. Unfortunately, these companies are not driven by concerns about our well-being, and the vast majority of people will be interacting with AI created by these conglomerates. There are a number of decisions that Replika has made, for example, that seem optimized for engagement rather than well-being. Unfortunately, this means that any given study on AI companions will be divorced from the real world, given that it's unlikely a company will design these things in the exact same way a well-intentioned researcher would.
Thanks for the thoughtful post!