I think you correctly point out that empathy expressed via the written word can be nourishing, significant, and genuine. There is nothing inherently inferior about it (which is not to deny that there are characteristic and significant advantages to the face-to-face embodied encounter). Are the loving epistles of Héloïse and Abelard any less human or empathic because they are written down rather than spoken live? Of course not. Singling out empathy expressed in written form is a red herring.
That said, I think there are important shortcomings of the extant AI empathy studies that focus on written expressions. But Molly Crockett misdiagnoses the problem in the Guardian piece.
Here's my best attempt at articulating what I feel is missing. I'd love to hear your thoughts. The shortcoming of the studies is not that empathy expressed in written form is second-rate. The shortcoming is that it is very hard to determine whether empathy expressed in written form is real empathy. That is, it is very hard to tell whether such expressions of empathy issue from real feeling or not. The task "respond empathically (in writing)" asks us to perform empathy rather than actually empathize. We can get very good at the performance without being good at empathizing at all. Consequently, when I compare two passages and rate them for how "empathic" they are, I'm really just assessing the virtuosity of a certain kind of performance. LLMs are truly virtuosic. I bet they would beat humans at the task of expressing empathy through a facial expression, too. [Here's a study idea: have people comparatively rate pictures of real human faces responding empathically to a prompt and AI-generated images of faces responding empathically to the same prompt.]
I imagine LLMs would outperform similarly on tasks like "respond passive-aggressively to this letter," "respond in a way that expresses bitter envy," or "respond in a way that expresses incredulity." Would we conclude that performance on such tasks shows that AI surpasses humans in the ability to respond with passive aggression, bitter envy, or incredulity? Would we even be troubled or fascinated by the capacity of LLMs to do well at such tasks? It certainly wouldn't be fascination of the kind that induces AI vertigo. So why should we be troubled and fascinated by the capacity of LLMs to produce empathic-sounding written passages in response to prompts? The positive assessment of AI-generated expressions of empathy may have little or nothing to do with assessing whether whatever generated that performance is empathizing well, poorly, or not at all.
I think that the fact that people sincerely report feeling seen and heard by their Replika boyfriends and girlfriends is pretty interesting. I would go so far as to say it's troubling and fascinating. These are extended, reciprocal, and richly contextualized encounters. These are the kinds of contexts where it becomes hard to fake empathy, and where we care about more than a particular kind of virtuosic performance. But my sense is that the subset of humans who experience being seen and heard in such contexts is a small minority. Most people won't conflate the simulacrum of empathy with the real thing, and they won't feel seen and heard when they know the entity in question has no experience of seeing or hearing. That's my hunch, at least. But maybe as things progress we'll start to lose our grip. My intuition is that this would be a lamentable eventuality. I wonder what you think.
Great reflections, Jason. So, to sum up: you believe AI can play-act/parrot empathy (and many other emotions and, soon, expressions), but that this does not mean it is actually empathic. I agree with this 100%. And in our paper, we sidestep this entirely, saying AI is not empathic (for now). The question then becomes not whether it can perform empathy (it can), but whether people legitimately feel cared for/heard by the performance of empathy. And maybe what you are saying is that whether an agent can *be* empathic will determine whether people feel heard/cared for. So, perhaps the question of whether AI can generate/feel real empathy should not be sidestepped, because this is the essence of feeling loved... but maybe also of loving. I'm not sure yet, but I might agree with you. It's funny: just as I received your message, I finished a beautiful novel by Kazuo Ishiguro called Klara and the Sun, about an artificial friend (AF). Spoiler alert: at the very end the AF realizes that others could not truly love it, not because the AF is not lovable, but because others do not believe it is real. So, perhaps the heart of the matter is believing an agent is real, autonomous, and free. I guess we'll see how the next generation feels about interacting with AIs... Thanks for engaging!
Thanks for the comment, Jason! I'll echo Mickey that Klara and the Sun is a great novel on this topic; I've actually drawn on a quote from it in a few recent AI-empathy talks on the limits of empathy.
I'd like to interrogate the point about performance vs. interior feeling. I think we'd all agree that human emotions are in many respects dyadic, interpersonal in their function and form. Empathy is an expression conveyed from one person to another: the generative act and experience of the expresser is matched by what is received, the feeling of being heard. It's an open question whether expression + internal experience is felt as more real or empathetic than expression alone. We could ask this of humans too -- we can think of examples where humans don't internally empathize as much as they express in conversation with another person.
Put another way, when we judge a person's empathy (a human's, that is), I wonder to what extent we are really judging their felt experience vs. their performance of it, given the dyadic nature of empathetic expression. We might make inferences about interiority, but I wonder if those inferences are largely guided by the performative behavioral expression.
And then, taking a step back, I wonder to what extent this makes us reflect on what we consider the boundaries of "empathy" as a construct. In everyday perception, to what extent do we infer its presence vs. absence (here, really engaging with the "does AI truly empathize?" question) from limited behavioral observations, drawing the best inferences we can about internal states? In this way, I think the human-AI interaction work can really guide us in refining how we think about empathy more generally.
Really enjoyed this! Wanted to share a few of my thoughts:
First, is the telephone really the right analogy for AI? Phones, video chat, the printing press -- these are pieces of tech that mediate human communication without replacing it. Yet, almost all the use cases right now for AI in the loneliness/empathy space seem to be focused on replacing human interactions. This seems categorically different than most of the technology you highlighted.
I also totally agree with your point about the importance of lab experiments, and the AI studies you mentioned have been important proofs of concept for our ability to benefit from or connect with AI in ways we might not have expected. But the vast majority of these studies focus on asynchronous or extremely brief interactions and only look at the immediate emotional impact of the interactions with AI - meaning that we actually know very little about the long-term consequences of AI empathy/companionship. I know you mentioned this, but it seems like an absolutely critical asterisk on the conclusions we are drawing about AI right now. If we were studying the impact of potato chips and only focused on the immediate impact, we might conclude that potato chips are a perfectly fine meal replacement, since people seem to be pretty happy every time they eat one, but this would mask serious long-term consequences we'd be totally blind to.
I also think it's impossible to divorce research on the impact of empathic AI from concerns about these companies' motives. Unfortunately, these companies are not driven by concerns about our well-being, and the vast majority of people will be interacting with AI created by these conglomerates. There are a number of decisions that Replika has made, for example, that seem optimized for engagement rather than well-being. Unfortunately, this means that any given study on AI companions will be divorced from the real world, given that it's unlikely any given company will design these things in the exact same way a well-intentioned researcher would.
Thanks for the thoughtful post!
Love these thoughts, Dunigan! I think I agree with nearly all of them, especially the part about the need for longitudinal studies. As far as I know, there are only a few of these out there now, and this is definitely the next step. (In fact, I was asked to review just such a longitudinal study a few hours ago!) I also think more first-person studies are needed, not just third-party evaluations. I also agree the telephone isn't a perfect analogy. We were trying to draw a comparison to another moral panic. Maybe AI is fundamentally different, but something tells me we carve out every new tech as an exception to the moral-panic pattern. Only time will tell... Re: corporate ownership. To me, this is not a problem in principle. We can enact regulation, build open-source models, etc. So, yes, it's currently a policy issue, but to me it's extra-scientific: basic scientists should be able to ask research questions, and their *science* should not be questioned on *scientific grounds* because we don't like the folks who own these companies. But I also realize that my view on separating activism from science is losing support. Thanks for the thoughtful engagement! I look forward to chatting with you more about this.
I enjoyed this piece, Mickey. While my overall position on this issue is closer to Molly's, I (and she, I would imagine) agree that exploring what AI is capable of is a worthy goal. To the extent that it can help remediate certain problems (e.g., loneliness, self-exploration in a therapy sense), that's great. And I also agree with Paul and both of you that experiments run in artificial realms (things that, at least for now, might not replicate in the real world) have their place. But all that said, I do think that Molly's point is an important one, and not just moral panic. The reason I say this comes from my experience in the affective science literature. As you know, there has been debate for years about the ability to decode emotion from facial/body expressions. The data supporting this are really shaky, but industry has grabbed onto it, spent millions, and duped millions into buying systems that supposedly can read people's emotions and intent. To back up their "science," the research typically evaluates emotion coding on metrics that machines excel at: pattern recognition, etc. But there's little external validity to the stimuli, beyond a smile or frown or change in blood flow that, in reality, is a very poor indicator of affective experience. So, on two levels, I see a problem here. One, the findings are suspect because of the impoverished nature of the stimuli/paradigm. Two, there's a huge profit motive to justify them anyway. This criticism in no way applies to researchers like you and Daryl, who are asking and answering good scientific questions. It's just to say that I don't think Molly's tech-exec example is far afield. Industry will spin findings, and if we don't make the limits and caveats clear, that's on us.
I do also believe that Molly's argument for the importance of embodiment has merit. I just saw a recent study where AI has now been shown both to become anxious in response to stimuli and to adopt mindfulness to regulate that anxiety. In reality, this is simply AI learning to give appropriate responses to what it's been trained on. It mimics emotion but is simply making predictions based on its training. But that misses all the contextual mechanisms that come with a body -- mechanisms we treat as error variance because we still don't know what they are, even though they have an effect, but which AI will studiously avoid and thus remove from the picture, thereby distorting reality.
All this is to say that while AI is in its infancy, we give in to irrational exuberance (or the profit motive) at our peril.
Thanks for the thoughtful comment, Dave! You make some excellent points, and I think we're probably closer in our views than it might seem. Daryl and I are mostly pushing against a reflexive dismissal that seems more grounded in alarm than evidence. Dismissing studies because they rely on written forms of communication, for example, strikes me as a bizarre objection. Let's criticize AI; let's say where it is a poor substitute for humans. But, please, let's not say studies are rigged because we are morally opposed to tech owners or because they use the same sorts of artificial lab paradigms we have used in our own non-AI studies. I appreciate the engagement!
Nice thoughts, Dave! I agree that embodiment can matter for connection, and that face-to-face connection can matter so much. I suppose that irrespective of the ontological status of empathy without embodiment, we can still interrogate *perceived* empathy and how it makes people feel when they receive it from AI. We suggest that it's important to take participant experiences at greater face value if we want to respect the range of social experiences people can construct.