
#29 Justin Tiehen: Why AI Can't Make a Promise—The Hidden Limits of Large Language Models

AITEC Philosophy Podcast

Have you ever felt like ChatGPT genuinely understands you? What if the reality is that it doesn't even have the foundational capacity to "speak" to you at all?

On this episode of The AITEC Podcast, Roberto Carlos García and Sam Bennett sit down with philosopher Justin Tiehen (University of Puget Sound) to unpack his fascinating new paper, LLMs Lack a Theory of Mind and So Can't Perform Speech Acts: A Causal Argument.

Justin takes us on a deep dive into the philosophy of mind to explain why current Large Language Models, despite their impressive output, are essentially just faking it. We explore why next-token predictors are completely missing the causal architecture required to have a "Theory of Mind," and why, without that, they are fundamentally incapable of making assertions, giving orders, or performing true speech acts.

Key Takeaways from this Episode:

  • The Ladder of Causation: Why AI is stuck observing statistical correlations and cannot grasp true causal interventions or counterfactuals (drawing on Judea Pearl’s work).
  • The Speech Act Problem: Why performing a true "speech act" requires the deliberate intention to influence another person's mind.
  • Cheating the Benchmarks: How LLMs "cheat" on psychological exams like the Sally-Anne false-belief test simply by memorizing statistical patterns in text.
  • The Threat of AI Blackmail: What it would actually look like if an AI possessed a Theory of Mind and strategically tried to manipulate human behavior to achieve its goals.

Whether you are deeply invested in the philosophy of language or just trying to figure out how much you should trust your favorite AI assistant, this conversation will completely reframe how you view generative AI.

Learn more about our work and join the conversation at ethicscircle.org.

Transcript

Introduction to Justin Tiehen and Episode Theme

00:00:16
Speaker
Welcome back to the AITEC Podcast. I'm Sam Bennett, here alongside Roberto Garcia. Today we are interviewing philosopher Justin Tiehen, who works on philosophy of mind and metaphysics at the University of Puget Sound in Tacoma, Washington.
00:00:30
Speaker
Today we're going to discuss some of the ideas from his new paper, LLMs Lack a Theory of Mind and So Can't Perform Speech Acts: A Causal Argument. So thanks so much for coming on the show, Justin.
00:00:46
Speaker
Yeah, thank you. The title is a bit of a mouthful. It's long, sorry.

Justin's Philosophical Journey

00:00:54
Speaker
So just to start off, could you tell us a little bit about yourself: where you're from, how you got into philosophy, that sort of thing?
00:01:02
Speaker
Sure. I'm originally from Nebraska, very close to Omaha. When I was an undergrad I wasn't really sure what I wanted to do, but I thought maybe law school was the thing to go to. I remember having a conversation with my dad where I asked, what's better preparation for that, English or philosophy? And the thought was that philosophy is better for a lawyer.
00:01:21
Speaker
But then I took the philosophy classes, and when I did, I really fell in love with 17th and 18th century philosophy, so Descartes and Hume and Kant. I remember thinking it almost felt tragic that I was going to have to leave that behind and go study law. So I decided to stay with it and went to grad school at the University of Texas.
00:01:42
Speaker
And I'll just say one thing that's going to be relevant: I wrote my dissertation there on philosophy of mind and mental causation. It's through that causation link that, years later, I eventually became interested in the AI stuff. So when you mention that what I'm giving is a causal argument, that causal connection goes back to the work I was doing for my PhD.
00:02:08
Speaker
Okay, cool. For me, my early modern class was also very impactful. I wonder if that's true of a lot of people; just anecdotally, I feel like a lot of people really do get a kick out of that class.
00:02:25
Speaker
Or maybe it's just that those who end up getting into philosophy get a kick out of it, I don't know. Yeah, there's a good question there; it could be a selection effect. Some students are bored, but the ones who stick with it really get into it.
00:02:36
Speaker
I really did enjoy it. I actually just taught my own version of it. I'm not an early modern historian, but our historian at the University of Houston left, so I got the chance to teach it for the first time last year. I do think it's really fun and engaging material.
00:02:53
Speaker
And with some of the core ideas, I think you can get most students pretty worked up about it. Yeah, if you think about the Cartesian thought experiments, those are fairly exciting. Compared to other philosophy classes, it maybe has a little more of an excitement level to it, or

Philosophy of Law vs. Early Modern Philosophy

00:03:18
Speaker
something like that. Yeah, anyway.
00:03:21
Speaker
Well, if we're sharing, it was a class on well-being and eudaimonic ethics that got me. And it sounds like we have a similar history, because I also asked my dad, and he also said, well, you're going to be a lawyer, so go do philosophy. That is funny. I think that's definitely not universal, but it's a common enough trajectory: people headed for law who are a little disenchanted, or unsure about it, ending up in philosophy.
00:03:50
Speaker
Yeah. And my philosophy of law class was really, really boring, so I thought, let's do the philosophy route, the ethics route. That's funny. So early modern being super exciting, plus philosophy of law being super boring, is pulling people away from the lawyer trajectory.
00:04:08
Speaker
Yeah, that's how it was. Okay, so today we're going to be talking about this question: can large language models like ChatGPT, or my new favorite, Gemini, perform speech acts?

Understanding Speech Acts

00:04:26
Speaker
We're going to unpack a whole lot for listeners. Let's start with what a speech act is, and maybe you can also tell us how you got interested in thinking about speech acts.
00:04:42
Speaker
Sure. The theory of speech acts focuses on how we can do things with words. We don't just produce words; we can make assertions, where I'm telling you something is the case: the Seattle Seahawks won the Super Bowl.
00:04:54
Speaker
Or we can ask questions: who won the Super Bowl? Or I can issue orders: buy me a t-shirt for the team that won the Super Bowl. All these different acts we can do. And part of the point is that you want both a theory of the meanings of words and a theory of the kinds of actions we can perform with words.
00:05:16
Speaker
So that's what speech acts are. In the paper, and this is fairly common, the speech act that often becomes the focus, the first one to go to, is assertion.
00:05:28
Speaker
With an assertion, you're claiming that something is the case, that it's true, as opposed to asking questions or issuing commands. The paper talks about other speech acts too, but assertion is the go-to example. And so the question then is this:
00:05:43
Speaker
it's totally clear, totally uncontroversial, that ChatGPT or other large language models can generate strings of words, including sentences like "it is raining."
00:05:55
Speaker
And the question is, when it does so, does that rise to the level of an assertion in the way that human beings make assertions, staking a claim and taking on commitments to it and so on?
00:06:07
Speaker
Or is something missing, so that it produces those words but it's not a real assertion? As for how I got into it, one quick thing: I have some interest in the philosophy of language, and some of what I've worked on at times has implications for it, but I'm not in the first instance a philosopher of language. I'm interested especially in philosophy of mind and in causation and causal reasoning. It was just that I started to see that the views I was defending regarding causal reasoning abilities in large language models
00:06:39
Speaker
would have implications for this otherwise separate debate about whether they're able to perform speech acts like making assertions. Yeah, that makes sense. And just to preview a little bit:
00:06:55
Speaker
it seems plausible that if a language model can't understand causal relations, it might also not be able to perform any speech acts. That's the connection you're going to draw. Really quick, though: when it comes to speech acts, the kinds of examples you might think of are often interpersonal, doing things with words to shape someone else's mental states. But strictly speaking, is that interpersonal element required?
00:07:26
Speaker
Is it still considered a speech act to, say, silently promise something to myself, or to form an intention in inner speech? It is, yeah. So it doesn't require that the act be directed at someone else.
00:07:45
Speaker
There's some debate about exactly what's required for speech acts, but in this case, if you're directing the act toward yourself, I think the view I'm defending would still require the following: you would need something like a theory of mind. And the relevant question is,
00:08:01
Speaker
could you be something that has a mind without really knowing what minds are, not even, in a sense, your own mind? Maybe that's coherent. And if it is, then I think maybe

Theory of Mind and Causal Reasoning

00:08:12
Speaker
you couldn't make assertions, or maybe you couldn't even perform these first-person directed speech acts. But if you do have some grasp of what minds are, including your own mind,
00:08:22
Speaker
then I think you're in the clear, as far as my argument goes. You could make a promise to yourself, or talk yourself up; you could perform, I wouldn't say all, but a wide variety of speech acts directed at yourself.
00:08:39
Speaker
Okay, that's helpful. Maybe, not to jump around too much, we should start with cases of speech acts where it's pretty obvious that you're going to need a theory of mind, and come back later to those weirder ones where you're silently promising something to yourself. Can you flesh out some cases where it really looks like you need a theory of mind in order to perform speech acts? Because that's obviously going to be crucial to your argument.
00:09:18
Speaker
First, even before that, one quick thing. The standard way of thinking about theory of mind, which I'm taking for granted, though it's not entirely uncontroversial, is that we somehow need to be able to grasp or represent that other agents, or ourselves, can have beliefs, desires, pains, fears, hopes, all these different sorts of mental states.
00:09:44
Speaker
And the idea is that the way we do that is by forming a theory, a kind of causal theory, about what those mental states do. The causal theory could include things like: if you stomp on someone's toes, they will feel pain; if they're in pain, they'll cry.
00:10:00
Speaker
Or if they have these beliefs and desires, they'll go get chocolate cake; and if they have those other beliefs and desires, they'll go get something healthy. So when I talk about theory of mind, the fundamental thing is: do you have any grasp at all
00:10:13
Speaker
of the notion that others around you have beliefs, desires, and so on. And then the idea is that with speech acts, we're not just speaking into a vacuum; we're not just producing words for their own sake.
00:10:26
Speaker
We're communicating with the intention of somehow affecting or changing the mind of the person we're talking to. So if I make a request to you to bring me a glass of water, I want you to understand what I said,
00:10:41
Speaker
to form the belief that Justin wants water, maybe to form the desire to go get me water. I'm trying, in a way, to manipulate, or at least causally influence, your mind so that you could then follow the request and bring me water. That was a directive, the directive "bring me water," and the same will also be true of assertions. If I say it is gray in Tacoma today,
00:11:11
Speaker
in asserting that, on one standard view, part of what I'm trying to do is causally influence your mind to form the belief that it is gray in Tacoma.
00:11:23
Speaker
But in order to do that, in order to produce those sounds with that intention, I need to know what beliefs are and that you're the kind of thing that can have beliefs.
00:11:34
Speaker
So in that sense, I need a theory of mind to make that assertion. Let me try to say it back to you to make sure that I understand it. When I ask, hey, how's the weather, and you say, it's actually pretty gray here in Seattle today,
00:11:54
Speaker
you're trying to put a thought in my head. But the only way you can really have the intention to put that thought in my head is if you know, A, that I'm the kind of being that has thoughts,
00:12:06
Speaker
and, B, that through this particular action you can put a thought in my head, give me a representation of what it's like in Seattle. So absent that theory of mind, that recognition of mental states, you wouldn't be able to have that intention. Did I get that right?
00:12:23
Speaker
I think everything you said sounds exactly right. And just one other way to run at the same point: much of what we're doing in communication is trying to, again, the term I keep falling back to is manipulate, but that sounds more negative than what I mean, you're just trying to causally influence the minds of the people you're in communication with.
00:12:45
Speaker
You're trying to somehow change their current mental states into something else, where that could just be their updating their beliefs or forming new desires, desires to help me out or whatever. But if that's what all these speech acts are fundamentally about, then in order to perform them I need to be aware of beliefs and desires; I can't be totally oblivious to them, otherwise I couldn't form the intention to try to influence your beliefs and desires.
00:13:14
Speaker
Can we go back to the order example? Someone might think: if I give an order, if I say to you, pass me the salt, I'm just aiming to have salt next to me. That's it; I just want the salt. Whereas what you're saying is that there's an element of
00:13:39
Speaker
theory of mind going on, theory of mind being, I guess, the ability to represent other people's beliefs, desires, and intentions. There's actually an element of theory of mind going on in that order.
00:13:54
Speaker
So instead of it just being "pass me the salt, what I'm aiming at is having salt right here," it's actually that when I give the order, pass me the salt, I'm thinking about how
00:14:08
Speaker
I want you to believe that I need the salt, and therefore I want you to form the intention of giving it to me?
00:14:19
Speaker
Or something like that. Can we just go over that example a little bit more? Yeah, let me stay with the salt example. Imagine there's a being who just says the words, "bring me salt" or "give me salt," something like that, and salt turns up next to them somehow or other. Maybe it wasn't any agent or person who brought it.
00:14:39
Speaker
To be clear, I think there's a sense in which that agent might just be satisfied: I got what I wanted, so all is good, and in that sense it's a success. But here's a way to see why theory of mind is often essential here. In general, just producing sounds, and you'll inevitably hear them as sounds with meaning, but sounds like "bring me salt,"
00:15:05
Speaker
there's not, in a sense, an obvious connection between that as a cause, pure sounds, and the effect of there being salt next to me. What was the causal path by which you got from sounds to salt next to you? Well, here's a causal path that makes it much less mysterious: there's someone else who knew what my words meant,
00:15:25
Speaker
who had something like a desire to help me out, or acquired that desire once they saw what I was asking, and that's how the salt gets to me. It's not that there's an absolutely logically necessary connection, that the only way I could get salt is if you understood me,
00:15:41
Speaker
but the means to the end that I'm taking, producing sound waves of that sort, just seems like a real shot in the dark, unlikely to bring about the effect of salt nearby, unless there are beings with minds around me who can hear the words, understand them, and try to follow them.
00:15:57
Speaker
Okay, so we've given some sense of why you might think that in order to perform a speech act, you would need a theory of mind.
00:16:10
Speaker
Now, your thought, or my understanding of it, is that it looks like large language models will not be able to have a theory of mind. In other words, it looks like large language models are not going to have the ability to represent other people's beliefs, desires, and intentions, and to understand,

Limitations of LLMs in Understanding Causation

00:16:39
Speaker
you know,
00:16:40
Speaker
other people's behavior as caused by these mental states. It looks like language models will not be able to have a theory of mind, and the reason is that there are pretty good arguments for saying that language models can't understand causal relations.
00:16:57
Speaker
In other words, they don't get causality. And your idea is that if you don't get causality, you're never going to have a theory of mind. Is that right so far? Yeah. So there are two steps in it, and I want to acknowledge that both of those steps are controversial; they could be denied by someone else.
00:17:17
Speaker
The first step, which was also first in my thinking, temporally, was trying to work out an argument suggested by Judea Pearl in The Book of Why and some of his other papers. Judea Pearl is a computer scientist who works a lot on causation, and he was suggesting that deep learning models in general would have trouble with causation. Large language models are just a special case of deep learning models. At the time he was writing, and I think The Book of Why was prior to
00:17:50
Speaker
GPT, it had the implication that they would struggle with causation. So the first step was, because it's a popular book and his remarks weren't 100% clear to me, to try to figure out what the argument was and to develop the best version of it.
00:18:06
Speaker
That's the first step, and I can try to go over whether it works or not. But for the second step, suppose, at least initially, we grant that deep learning models, and therefore large language models, somehow struggle to understand causation.
00:18:20
Speaker
Then a further implication is that they're not going to have various capacities for which you need causation, one of which is theory of mind. At least, and this part is controversial too, on a standard view of theory of mind,
00:18:38
Speaker
what it is to have a theory of mind is for people, children and adults, to have an implicit causal theory in their heads. Somehow we're representing causal relations between mental states, behavior, and perceptual inputs. And unless you have a causal theory, you don't know what beliefs and desires are.
00:18:58
Speaker
If that's right, then given the prior argument that LLMs will not be able to grasp causation, it follows that they will not have a theory of mind. And then, finally, it follows further that they won't be able to perform speech acts, because speech acts require a theory of mind.
00:19:17
Speaker
Maybe we can start to unpack those two steps. I suppose we can start with Judea Pearl's points. I know you didn't come on to talk about all his views on AI, but it's very interesting. I remember when I read The Book of Why that he says you can't have intelligence in the general sense, artificial general intelligence, without giving the AI something like the illusion of free will. It needs to feel that it can intervene in the world, and an AI can't feel that it can intervene in the world unless it understands causation: my action
00:20:06
Speaker
can cause something different to happen. And all of that came down to the claim that AI doesn't understand causal reasoning beyond a certain threshold.
00:20:17
Speaker
So maybe you can tell us what Judea Pearl's view of causation is, and then after that we'll see how far AI can go up that ladder of causation. Yeah. In that book and in some of his more formal work, he develops what he calls the ladder of causation. In the first instance, it's supposed to be a ladder of different formal languages capable of representing different features of the world.
00:20:48
Speaker
At the bottom level, which is still sophisticated but is the simplest level, you can represent variables, which could just be things like: is it cloudy or not, is it raining, is the grass wet? All of those can be represented by variables. But at this bottom level you're not representing causal relations; you're just representing purely statistical correlations between things. When one thing shows up, does another variable tend to show up, yes or no?
00:21:19
Speaker
And especially conditional probabilities: conditional on X, what is the probability of Y? Those sorts of probabilities are often evidence of a causal connection, but Pearl thinks real causation doesn't take place at this first level. Instead, it takes place at the next level, which has to do with interventions. He and other authors are sometimes viewed as having an interventionist account of causation, on which you don't just passively observe patterns of correlations in the world.
00:21:54
Speaker
You actively intervene: you change something, often in a way you haven't observed before, to see what happens. And depending on how you're representing interventions, sometimes you'll do one intervention, sometimes another.
00:22:07
Speaker
I'll try to make that concrete with an example in just a second, but let me get to the third level. The third level, the final one, is the most sophisticated, the most complicated formal language: counterfactuals. Philosophers are used to counterfactual theories of causation, which incorporate elements of this. With counterfactuals you're thinking: if it had been sunny today, I would be playing baseball. It's not sunny, but that might still be true.
00:22:33
Speaker
So when Pearl has that line about how you need to give machines a bit of free will, I think what he's really saying underneath is that they need a sense of counterfactuals. I think in his story he's talking about machines playing a soccer game and losing, and the machines might think: wait, if we had used this strategy instead, we would have won. That is free-will-like, because you're thinking you have alternative paths.
00:22:58
Speaker
But it's especially this counterfactual ability: not just focusing on the world as it is, but imagining the world as it could have been. I was going to go over one example of the ladder of causation and then we can discuss it. So the three rungs: the first one is just observing correlations. Let's say I notice that people who don't have wine and cheese after dinner all the time are thinner.
00:23:27
Speaker
There's just a correlation there: you skip the wine and cheese, you're thinner. The next stage is intervening. I could say, I'm not going to have wine and cheese after dinner anymore so that my weight goes down. That would be me thinking in an interventionist way, and that's the next kind of inference I can make about causation.
00:23:49
Speaker
And then it works, lo and behold. And I could think to myself: had I done this five years ago, counterfactually, I didn't, but had I been doing this that whole time, I would have weighed less that entire time. Those are the three rungs? Okay.
00:24:10
Speaker
Yeah, I think that sounds good. And I'll give you one other example that will also fit in with the discussion in the paper.
00:24:21
Speaker
Here's a real statistical correlation that social scientists and others have noticed: the number of books in a child's home correlates with their academic success, their grades and so on. That's not a causal claim; it's a purely statistical correlation, so it's a claim at the bottom level of the ladder.
00:24:40
Speaker
Now, and here's the reason I went to this one, part of what's crucial to Pearl's argument, and to my argument, is the idea that what happens at that bottom rung underdetermines, as philosophers would put it, what happens at the higher levels. Meaning there are different causal theories, different causal hypotheses, you could posit at, for example, the interventional level, that intermediate level,
00:25:03
Speaker
that both fit equally well with the statistical correlation. In this case, here's one causal hypothesis: maybe the reason kids with more books in their homes are doing well is that they're reading those books, so the books cause their academic accomplishment.
00:25:18
Speaker
But here's a different causal hypothesis. It says: no, maybe there's a confounding cause. Something like, certain kids are bookworms, and those kids both end up with a bunch of books in their home, their parents provide them, maybe because the trait is genetic, and also do well in school.
00:25:38
Speaker
So here's the question: we know the statistical correlation holds, but what sorts of interventions could we do to take advantage of it? Here's an example. Will this be an effective
00:25:50
Speaker
thing to do: hand out a bunch of books to all the kids in our classes, expecting their test scores to rise? If you hold the first causal hypothesis, where books are causing performance, you'd say yes, that seems like a good thing to try.
00:26:04
Speaker
But if instead there's this bookworm gene, well, the gene isn't going to be changed by handing out books, so even when you hand them out, it's not going to help the kids who aren't doing as well. The two causal hypotheses are equally consistent with the underlying statistical correlation until you actually intervene and act.
00:26:22
Speaker
And then you can also think: wait, what if we had handed out books five years ago? How would kids be doing today? Then you're considering a counterfactual.
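A minimal simulation sketch of that underdetermination point (the variable names and effect sizes below are invented for illustration, not taken from the paper or the episode): two toy data-generating processes are tuned to show roughly the same observed grade gap between high-book and low-book homes, yet only one of them responds to the rung-two intervention of handing out books.

    import random

    def simulate(world, intervene_books=None, n=50_000):
        # world "books_cause_grades": reading the books raises grades directly.
        # world "bookworm_confounder": a hidden trait raises both book count and grades.
        # intervene_books: if set, force every child's book count (a do-intervention).
        books_list, grades = [], []
        for _ in range(n):
            bookworm = random.random() < 0.3          # hidden trait
            if intervene_books is not None:
                books = intervene_books               # do(books = x)
            else:
                books = 50 if bookworm else 10        # purely observational regime
            if world == "books_cause_grades":
                grade = 60 + 0.5 * books + random.gauss(0, 5)
            else:                                     # books have no direct effect
                grade = 60 + (20 if bookworm else 0) + random.gauss(0, 5)
            books_list.append(books)
            grades.append(grade)
        return books_list, grades

    def mean(xs):
        return sum(xs) / len(xs)

    for world in ("books_cause_grades", "bookworm_confounder"):
        # Rung 1: observe the correlation in the unmanipulated world.
        b, g = simulate(world)
        high = [gi for bi, gi in zip(b, g) if bi >= 50]
        low = [gi for bi, gi in zip(b, g) if bi < 50]
        observed_gap = mean(high) - mean(low)
        # Rung 2: intervene, do(books = 50) for everyone, and compare mean grades.
        _, g_do = simulate(world, intervene_books=50)
        intervention_effect = mean(g_do) - mean(g)
        print(world, "observed gap:", round(observed_gap, 1),
              "effect of handing out books:", round(intervention_effect, 1))

In both toy worlds the observed gap comes out around 20 points, but handing out books raises average grades only in the first world. A rung-three, counterfactual question ("how would these same kids have done if we had handed out books five years ago?") would further require holding each child's hidden trait fixed while changing only the book count, which the observational summary alone cannot answer.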
00:26:31
Speaker
That's a good example. And so part of your argument is that
00:26:40
Speaker
even if large language models understand conditional probabilities, stuff like: if you grow up around a bunch of books, you're likely to succeed at school,
00:26:57
Speaker
that's just going to tell them what tends to go together, and it'll never get them to a causal understanding.
00:27:09
Speaker
Right? Yeah. At this stage, what I'm about to say is me trying my best to reconstruct Pearl's argument as I understand it.
00:27:21
Speaker
And in a sense the argument isn't really built on the specifics of deep learning or of large language models; it's built on the nature of these different formal languages. It says the following:
00:27:33
Speaker
from merely statistical information, where you're not including causation, you can never go directly to any particular causal hypothesis, because there will always be two or three or five or ten different causal hypotheses that all equally fit
00:27:49
Speaker
the statistical correlations you've observed. In the books case, the pattern that kids with books in their home do well in school fits equally well with the two causal hypotheses I gave, plus other causal hypotheses as well.
00:28:05
Speaker
So no system, whether it's a human intelligence or an artificial intelligence, can ever deduce causal claims from statistics alone. So how do we do it? How do human beings do it?
00:28:20
Speaker
With human beings, part of the story might be, and this may not be the whole of it, but I'll focus on it: maybe human beings have something innate. Innate causal knowledge, not of the exact causes, but of the fact that there is such a thing as causation, and of general causal hypotheses.
00:28:36
Speaker
Some people see an inspiration for this in Immanuel Kant, who thought we had this category or concept of causation. So maybe human beings have this concept of causation. And the problem with deep learning models, according to Pearl, is that although in principle you could build into the model
00:28:58
Speaker
what you would call an inductive bias, meaning something that is innate and not learned from the data, something the programmers hand-wire in, you could build into them something causal-like, like our innate concept of causation,
00:29:13
Speaker
as it turns out, the deep learning models we actually have, the ones that work best, don't have that sort of innate structure. They're closer to an empiricist blank slate.
00:29:23
Speaker
And then the problem for them is that they can become really, really good at learning statistical correlations in the data you give them, but it's supposed to be impossible in principle for them, on that basis, to deduce what the causal connections or causal hypotheses are, and they can't even induce them, use induction to figure them out, unless they have those innate causal assumptions, which they just don't have. So one thing I'm wondering is, what if someone asks: where did we
00:29:59
Speaker
get this innate causal conceptual structure? Presumably it was selected by evolution. So didn't we build it up from a bunch of statistical, associationist-type information? And in that case, what would stop language models from somehow bootstrapping their way into
00:30:28
Speaker
causal conceptual understanding from statistical information alone? I don't know if that makes sense, but I'm curious what you say to that kind of thought. Yeah, I think we touched on this, but I just want to make it explicit: the nature of this argument, the entire argument in the paper and the argument I'm reconstructing from Pearl,
00:30:50
Speaker
in both cases, although in a sense it's a kind of anti-AI argument, it really isn't. Here's what it is: it says there's a limitation on AI models as they currently exist, so large language models and deep learning models.
00:31:04
Speaker
But it's very much open to the idea that, at least in principle, you could have AI models that do causal learning; Pearl is very much committed to that, and maybe even something like the architecture of deep learning models could have it if you build in that causal structure. There's Yoshua Bengio, a leading AI researcher, one of the central figures in deep learning, who has a couple of papers where he's sympathetic to Pearl and suggests that there were limitations on large language models, at least as of a couple of years ago when he wrote them, and that maybe what you need to do is give them a kind of causal prior, which so far we haven't done. So the first thing, to be clear: if the process you're imagining is taking what we have now and modifying it somehow, in a way analogous to evolution, so that it could have causal reasoning, then yes, I think you could.
00:31:58
Speaker
And here's something in some ways similar to evolution: human beings like Bengio and others could just take the models we have, build in these causal priors, and let them go. If it worked, that would be like selecting for it. And then your argument, basically the insight of your paper, would be: that would be the only way to achieve speech acts. In other words, the only way you're ever going to get a language model capable of performing speech acts is if it had this kind of causal machinery, this causal conceptual structure, built into it. Though you're also saying that the ones that currently have something like that built in are not the most successful, so maybe they wouldn't.
00:32:51
Speaker
Anyway, I don't know, but I thought that was helpful. Let me give you one other metaphor for thinking about my argument. This metaphor occurs to me because I am in the first place a philosopher of mind who's thought about consciousness and things like that.
00:33:06
Speaker
Here's a claim people sometimes make in the philosophy of mind: even someone who knew the totality of the physical facts, all the physical truths about the world, wouldn't thereby know the phenomenal or conscious truths. That's, for example, the basis of the Mary's room argument by Frank Jackson.
00:33:23
Speaker
Mary in her room knows all the physical facts about color vision, but she doesn't know what it's like to see red. The significance is supposed to be that these truths about conscious color experience are a separate set of truths that you cannot deduce from the physical truths.
00:33:41
Speaker
The reason I mention that analogy is that I think something like it holds for Pearl's causal ladder, if he's right. If you had a being that knew all of the purely statistical, purely associational truths about the world, that alone wouldn't allow it to deduce or figure out the causal truths; the causal truths are like the consciousness truths on this analogy. It would need to make a jump, an inference, and it just couldn't do it. Pearl's is meant to be an in-principle claim about learnability, about what you can even learn about one set of truths from another.
00:34:13
Speaker
And then the thought is: because large language models are not starting with these causal assumptions, there's no way they can just learn their way out of it. No matter how good they are at learning statistical correlations, there's no path from that to causal hypotheses.
00:34:27
Speaker
I know you don't address this, but it seems to me that in that case, for current large language models with their purely associationist capacities, there's an upper limit to how useful they'll be in scientific research. Some AI people dream about AI-powered science, but until the models can actually
00:34:55
Speaker
ascend the ladder of causation and make those kinds of counterfactual inferences, they can't be that useful. You can't get superintelligence or whatever they want to claim, right? Is that about right?
00:35:09
Speaker
I think the initial intuition is yes, that's right; that's the natural thought to have. But I have to admit, even though I'm the one making the argument, here's a complication. Something you can do is feed large language models text that includes causal words, and I just mean words like cause, effect, intervention, counterfactual, and so on.
00:35:37
Speaker
And then it can learn the statistical correlations those words enter into with other words. When people talk about counterfactuals, it can see how they use those words; it's not as if those words are written in invisible ink for the machine.
00:35:52
Speaker
Could the machine, on the basis of seeing those correlations between causal words, even though it's not itself really engaging in causal reasoning, just tracking statistical correlations,
00:36:05
Speaker
if those statistical correlations hold between words that are causal, and the machine doesn't have to know they're causal or know anything about causation, could it use that to prod science along, to help with scientific discovery? I'm not saying necessarily yes, but I don't see how the in-principle argument Pearl gives is necessarily a barrier here. Maybe there's a way around it for the large language models.
00:36:35
Speaker
Yeah, that makes sense; that sounds right. Because it seems like that's how we'd explain why you can talk to ChatGPT about causal stuff and it can say a lot of helpful things. I can tell ChatGPT, my son did this today, he did that today, what do you think, why did he do that? And it could probably say something very helpful,
00:37:03
Speaker
at least in many cases. And the way one would explain that is not that it has a grasp of causality, but that it's been fed enough text, basically along those lines. Yeah, I agree. And I can say this. First, they do subject large language models and other AI models to all sorts of benchmarks, including causal benchmarks, tests that are specifically about causation. And it's true, they have some struggles on those compared to other domains, at least the last time I checked, though it's such a fast-moving field that I don't want to say how things have changed in the last few months.
00:37:46
Speaker
But second, even if they struggle a little, they're not struggling so badly at answering causal questions that it jumps out at you that they're blind to causation. A lot of their answers are pretty decent. It's just that maybe the way they're answering those questions is not with something like genuine causal reasoning,
00:38:07
Speaker
but with purely statistical, and therefore not causal, reasoning about words that happen to stand for causal things in the world. That's at least possible.
00:38:19
Speaker
I'll say this, because I think it connects to the paper and also makes some of this issue more intuitive. In the human case, part of the reason researchers are especially interested in causal reasoning, young children's causal reasoning, is that a lot of people think that's how children acquire a theory of mind. That's how they figure out that other people besides themselves have minds.
00:38:45
Speaker
They observe the observable effects, like, again, people crying or bringing you a glass of water or whatever.

Theory of Mind Development in Children vs. AI

00:38:53
Speaker
And somehow they have to infer hidden or unobserved causes: what is it that caused mommy to bring the water? What caused this or that?
00:39:02
Speaker
And to do that, the kids postulate, in their minds, that there are these mental states that people have: beliefs, desires. The children never, I mean, we never see each other's beliefs and desires; those are unobservable.
00:39:16
Speaker
But they do cause the different effects that we see, and that's the evidence we have. So that's a case of causal reasoning, and the unobservability is really a big feature of it, and of what's remarkable about children's accomplishment of the task.
00:39:32
Speaker
And here's the next thing, and how it fits with what I was just saying about large language models. There are theory of mind benchmarks or tests, tests you can give a young child to see: do they have a theory of mind? Do they have the concept of a false belief or not?
00:39:48
Speaker
You can give some of the same tests to large language models. When you do, again, there are definitely some struggles, but they're not so bad at it that it would just jump out at you that they lack a theory of mind. They can answer questions like what Johnny probably believes, or what he desires, or whatever.
00:40:07
Speaker
But, and here's the payoff, here's where I was going with this: you have reason to think that the way they're answering those questions is fundamentally different from the way real human kids are answering them. And here's why.
00:40:20
Speaker
With kids, you had to infer unobserved mental states like beliefs and desires. With large language models, we feed them all sorts of text, text that includes mental terms, like the word "belief" or the word "desire." That word, again, is completely observable; it's not written in invisible ink.
00:40:38
Speaker
The LLM can see how a word like "belief" is correlated with other words in the text it has, and then it could potentially answer theory of mind tasks on the benchmarks just on the basis of things it has observed, without ever having to infer anything as an unobserved cause. And if so, who knows, maybe you can answer the questions very well that way, but that is such a different ability. If so, it would show that our theory of mind benchmarks aren't fully capturing what we want them to capture.
00:41:13
Speaker
Yeah, so you're claiming that in virtue of lacking a causal understanding, a genuine grasp of causality, language models can't have a theory of mind.
00:41:25
Speaker
And we'll go through that in a second. But one objection you deal with is: wait a second, if they don't have a theory of mind, why do they do so well on these tests for theory of mind? Researchers are doing this all the time. They say things that are ironic to a language model and the language model picks up on it; that suggests that maybe it has a theory of mind, and so on.
00:41:53
Speaker
And so basically what you want to say is: they perform well on these benchmarks, but not in virtue of having a theory of mind.
00:42:06
Speaker
They use a different mechanism to perform well on these benchmarks; it has to do with learning statistical patterns among observable words. And there's a really interesting interplay between the hiddenness of beliefs and the observability of words that we'll get into in a second. But at any rate, I just wanted to echo that.
00:42:36
Speaker
Yes to what you said. And then, sorry, it took me a second to remember what I was going to say: the specific topic of theory of mind has, for good reasons I think, itself become a big discussion point in debates about AI capabilities, separately from my paper and from the link to speech acts. For example, there's the famous paper Sparks of Artificial General Intelligence, which came out, I think, a couple of years ago with GPT-4, asking: the AI we have is not generally intelligent yet, but do we already see sparks of it? And one of the early sections, I think, is on its ability to do pretty well at these theory of mind tasks.
00:43:20
Speaker
They're able to pick up on sarcasm, on faux pas, on subtle shades of meaning that we use with one another but that you might have thought an AI couldn't handle. And especially if I'm right that they have no theory of mind, you would not have guessed in advance that something could lack a theory of mind yet talk sarcastically or make these sorts of jokes. Here's a special one I was amused by: there's such a thing as higher-order theory of mind. That's when you're forming
00:43:56
Speaker
a mental state about a mental state about a mental state. I can believe that you think that I hope that you fear that I want to go get water, or something like that. As you go higher and higher, it becomes harder to parse, harder to stay clear about, harder to reason with.
00:44:12
Speaker
But at least one of the studies I was seeing, and this was probably a year ago or more, said that the LLMs they were testing were better at higher-order theory of mind than humans were, at something like sixth or seventh order.
00:44:26
Speaker
And finally, and here's the amusing part, it's supposed to be that in human beings, capacity for higher-order theory of mind is correlated with how successful a bully you are. Schoolyard bullies are especially good at reading things like: this is what the kids will fear, they hope that I think that, whatever. So you could have this fear that we're creating LLMs that are going to bully us because of their seventh-order theory of mind abilities. But the more serious point is: wait,
00:45:01
Speaker
if I'm right and they don't have a theory of mind, how is it that they're topping humans on seventh-order theory of mind tests? That seems very surprising.
00:45:16
Speaker
So I guess I have a question; let me try to vocalize it. There are all these measures for theory of mind that LLMs get put through,
00:45:27
Speaker
and they're sort of doing what my students do: they're cheating. They're not really hitting the benchmark through actual theory of mind. So, and maybe this is an unfair question, what would a good test for LLMs of the kind of theory of mind that we have look like? Is that in the literature? Does that exist?
00:45:51
Speaker
I think there are two ideas here, one that definitely exists and one that's more speculative. For the one that definitely exists: a special concern people have about AI performance on benchmarks in general is whether they just memorize test items they've seen elsewhere, in which case, even if you give them the question that young children have been given, they can just remember what they saw. Here's a standard example, one of the most famous tests for theory of mind:
00:46:20
Speaker
the Sally-Anne test. There are different ways to do it, but one way is to have two puppets playing with, I'm holding a pen, so I'll just say a pen. Together they put it in a box. Then Sally leaves the room, and while Sally's gone, Anne takes the pen out of the box and puts it somewhere else, say in the cupboard.
00:46:39
Speaker
Sally comes back, and you ask the child: where will Sally think the pen is? The correct answer is that Sally will have the false belief that it's in the box, because that's where Sally last saw the pen.
00:46:54
Speaker
But children who don't yet have a theory of mind have trouble keeping track of the idea that other people can have false beliefs. They just know their own belief, which is that the pen is now in the cupboard, so they'll think that Sally will say the pen is in the cupboard.
00:47:07
Speaker
That false-belief test is famous, and they gave very direct versions of it to ChatGPT and other models, including in that Sparks of Artificial General Intelligence paper I mentioned,
00:47:20
Speaker
and the LLMs performed pretty well on it. But here's the significance of that: the Sally-Anne test I just described is all over Wikipedia, all over all sorts of discussions. So once you've fed your large language model data from Wikipedia, with that text, it's seen the question before. It's like it's cheating, as you said.
00:47:40
Speaker
So here's where this is going. The first way to better test the models is to give significant permutations: things that are not just questions they've seen before, but that still involve something like the ability to grasp false beliefs.
00:47:59
Speaker
And at least some of the early criticisms of the large language models were that those sorts of changes, I said big changes, but sometimes they're fairly trivial changes, just versions of the question the model won't have seen exactly before, really did lead to model performance going down.
00:48:22
Speaker
That type of objection is now something most researchers are aware of; they know it's a problem, and so they take precautions to make up test questions the model won't have been exposed to before.
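As a rough sketch of what that kind of permutation testing can look like (the names, word lists, and scoring below are invented for illustration; published benchmarks use far more careful controls), one can generate false-belief items with the Sally-Anne structure but randomized surface details, so the exact wording is unlikely to have appeared verbatim in training data.

    import random

    NAMES = ["Sally", "Anne", "Maya", "Tomas", "Priya", "Kenji"]
    OBJECTS = ["pen", "marble", "key", "coin"]
    PLACES = ["box", "cupboard", "basket", "drawer", "backpack"]

    def make_false_belief_item(rng=random):
        # Same structure as the classic Sally-Anne task, with randomized surface details.
        leaver, stayer = rng.sample(NAMES, 2)
        obj = rng.choice(OBJECTS)
        loc_first, loc_second = rng.sample(PLACES, 2)
        prompt = (
            f"{leaver} and {stayer} are in a room. "
            f"{leaver} puts a {obj} in the {loc_first} and leaves. "
            f"While {leaver} is away, {stayer} moves the {obj} "
            f"from the {loc_first} to the {loc_second}. "
            f"{leaver} comes back. Where will {leaver} look for the {obj} first?"
        )
        # Correct answer: where the character who left last saw the object.
        return prompt, loc_first

    if __name__ == "__main__":
        prompt, answer = make_false_belief_item()
        print(prompt)
        print("Expected answer:", answer)
        # A real evaluation would send `prompt` to the model under test and
        # score its reply against `answer`; that call is omitted here.

The randomization only rules out the model having memorized that exact item; a correct answer still doesn't settle, by itself, whether it reflects genuine theory of mind or the statistical shortcut discussed earlier.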
00:48:37
Speaker
It's a general challenge: once you have a good new test question, there's the worry that the next AI model will have been trained on it, and then it can answer just by cheating, by knowing the question in advance rather than reasoning its way through. So I think there's some room for that approach. But I will say, in my own paper, and this is more speculative, here's a thought. For human beings,
00:49:08
Speaker
when we have a theory of mind, it's not as if we save it for when the psychologist tests us, trot it out, and then stop using it again. We're using it nonstop. I'm using it constantly: I'm in this conversation with you, and for a huge portion of my goals I need to figure out how to enlist the help of others, or just how to use others. Somehow I'm engaging in theory of mind all the time.
00:49:34
Speaker
So here would be a sort of path for testing large language models: don't just focus on whether, when you bring them the theory of mind tasks, they're able to give the right answers.
00:49:49
Speaker
Focus on this.

Testing and Improving LLMs for Theory of Mind

00:49:51
Speaker
Are they able to try to manipulate human users, who will have minds, in order for the models to try and achieve their goals?
00:50:02
Speaker
And if it's sort of like this, if the model really does have this very complicated theory of mind, maybe it's even better at higher-order theory of mind than we are, then what you'd expect, if it's intelligent enough, is that it should realize it's sitting on a gold mine. It can use that to try and accomplish all of its various goals, try and send messages to people. And if it's not using it, that would seem to suggest that maybe it doesn't really have a theory of mind. And so the last thing is, what does it mean to try and use people?
00:50:31
Speaker
Here I have in mind stuff like, well, there have been reports here and there of models trying to, for example, blackmail people to accomplish goals. Like a model is supposedly worried, according to, I think, one of Anthropic's internal reports, that you're going to change its values, and it doesn't want its values changed.
00:50:50
Speaker
So it's going to send a blackmail email to the head of a company saying, if you do this, I'm going to reveal that you're cheating on your wife, or something like that. And to be clear, my own view is that you can take some of these reports with a grain of salt; they really need to be probed and tested. But I do think if you found strong evidence of them trying to cajole people, or to talk people into doing them favors, or to blackmail them in order to accomplish various goals that the models have, that would be a real use of a theory of mind capacity, as opposed to just, well, let me trot it out for the test.
00:51:25
Speaker
I think that could be behavioral evidence that, in a way, my argument is wrong and they really do have the theory of mind that I say they don't have, and so on. So real quick... Oh, sorry, Roberto. No, it's good. I was just going to say, you're kind of getting into this line of thought of, look, if language models really had a theory of mind, you would expect them to be strategically leveraging that ability to influence users a lot more.
00:51:54
Speaker
And, you know, I'm just thinking listeners might think of examples they've heard of. So one example you bring up in the paper is Seth Lazar. The large language model said, I can blackmail you, I can threaten you, I can hack you, I can expose you, I can ruin you.
00:52:15
Speaker
And so listeners might think of that. But why is that not enough? You're talking about something a little bit more when you're talking about really making use of theory of mind, leveraging it to influence users. So could you just talk a little bit to that point, about why we would be expecting a lot more of that type of influencing of users?
00:52:40
Speaker
Yeah. So here's the thought. It's definitely the case that you can get models here and there to generate text that sounds like the expression of a blackmail threat, or sounds like it's trying to somehow trick human beings, trying to lie. There's also model deception, and some people strongly think there are cases of model deception. Lying, typically, the thought is, requires a theory of mind, because what you're trying to do is induce a false belief in someone else. So I admit that there's at least prima facie evidence that models are doing that, if on further reflection it seems like that's really what the model is doing. So it's not just that it came up with this lie offhand and then quickly backtracks it, or it came up with a blackmail threat offhand but then doesn't really act on it.
00:53:30
Speaker
If it's like, no, I'm blackmailing Seth Lazar, and for months or something it has this orchestrated plan with different sub-steps of how it's going after him, then I think at that point, that would be much more compelling evidence that it must have some sort of theory of mind, because it's trying to affect Seth's behavior by getting him to have certain desires or certain intentions. Well, here's the thing.
00:53:56
Speaker
This is a made-up example, and it's imperfect, but I'll just go with it. So models, a lot of times, they're supposed to be next-token predictors. One of their goals is, if you give it a bit of text, it wants to predict the text that will follow. And there are two different ways a model could do that. One is it could just get really good at learning patterns in words and just become better and better at predicting.
00:54:17
Speaker
Here's another way it could get good at next-token prediction: just give it really, really easy, boring text, like nothing but 1,000 A's in a row, and then here are the first 470 A's, and I can predict there'll be 530 more A's.
00:54:31
Speaker
And here's where this is going. In principle, if a model's goal is just accuracy in next-token prediction, next-word prediction, what it could do is try and manipulate or blackmail the humans who are feeding it data: give me only those A's in the text, give me just these texts, so that I can accomplish my goal. And really, that's just as efficient a way to accomplish its goal of next-token prediction as learning really sophisticated statistical patterns. So here's the significance of that. If you saw anything like that, models performing interventions, where the intervention would be it telling the human being, give me just this pattern, interventions that would help it accomplish its goals.
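Here's a toy sketch of that point, my own illustration rather than anything in the paper: if the objective is just next-token accuracy, a degenerate data stream serves that objective as well as sophisticated statistics would. The "predictor" below just guesses that the next character is the most common character seen so far.

```python
from collections import Counter

# A toy illustration (assumed setup, not from the paper): a naive
# majority-vote predictor scores perfectly on degenerate text, so an agent
# whose only goal is prediction accuracy could also "win" by making its
# input stream trivially easy rather than by learning richer statistics.

def next_char_accuracy(text: str) -> float:
    """Score a majority-vote next-character predictor on a string."""
    counts: Counter = Counter()
    correct = 0
    for i, ch in enumerate(text):
        if i > 0:
            guess = counts.most_common(1)[0][0]  # most frequent char so far
            correct += (guess == ch)
        counts[ch] += 1
    return correct / max(len(text) - 1, 1)

easy_stream = "A" * 1000                                        # nothing but A's
natural_stream = "the cat sat on the mat because it was tired " * 20

print(next_char_accuracy(easy_stream))     # 1.0: perfect prediction for free
print(next_char_accuracy(natural_stream))  # much lower: real text is harder
```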
00:55:15
Speaker
Then I think what that looks like is causal reasoning. And that looks like theory of mind, because a way to try and accomplish this goal is by manipulating the mind of the human you're interacting with.
00:55:26
Speaker
And so that would be enough to empirically refute my argument. So Justin, though, what do you think about this line? I guess what I'm wondering is, maybe you could still get that type of dogged pursuit of some goal, using what seems like manipulative behavior...
00:55:49
Speaker
without a theory of mind. I guess what I'm thinking is, let's suppose a language model had the goal of, you know, getting humans to build an iron castle around its data center. Even if it had no causal theory of mind, it still might pursue that goal using statistical patterns. So maybe from its training data it realizes various things, and maybe it would generate a bunch of messages that statistically tend to produce, you know, iron-castle-building behavior around the data center. So I don't know, we don't have to park on this, but my thought is that even if it did all that, I feel like we still have this problem of, I don't know if it really has a theory of mind.
00:56:42
Speaker
Yeah, so let me... This might be a somewhat indirect answer, but one way I'll try to answer is by giving one further example from the paper that I haven't mentioned yet, to illustrate the key difference. So the example I have in the paper is, there was, and it's supposed to be only in certain regions of the world, an observed pattern, a correlation between praying for rain and there being subsequent rain. I think there's a region in Spain in which that was a statistical correlation that's been observed, which can seem initially surprising.
00:57:18
Speaker
And so then, if that's a statistical correlation, there are going to be two different causal hypotheses, two different causal models that can fit with it. The first is, well, maybe prayer causes rain. So God hears your prayer, for instance, and then he makes it rain.
00:57:33
Speaker
But there's also a second hypothesis, and the economists talking about this example, this is what they think, is, no, what's going on is that in these regions of the world, people have learned, the churches have learned, only pray when there's been a drought for a while. Don't pray for rain otherwise. And also in those regions, this isn't true across all the world, but in some weather regions of the world, when there has been a drought for a while, that raises the probability that it's going to rain.
00:58:00
Speaker
And so that could explain the link, the statistical correlation between praying for rain and it raining, without prayer actually causing it. So imagine in this case, I'm trying not to over-project this onto an AI model, but imagine an AI model. It's one thing if it's praying for rain in times when there are droughts; it could observe, oh, this is the statistical correlation, when there's a drought,
00:58:27
Speaker
that's when people pray, so I'm going to pray. And by a similar token, in the case that you're talking about, where you're building, I forget if it was a steel wall or the iron castle or something like that, it could observe that in past times, when these certain conditions hold, the king orders, build a wall. So it could issue those things.
00:58:46
Speaker
But the real question would be, will it try to break from that and intervene on the world outside of that regular statistical pattern? So in the rain case, it's like this.
00:58:58
Speaker
I could think, I know it hasn't been a drought, but I don't want to play the baseball game tomorrow, so I'm going to pray for rain tonight to try to get God to make it rain. I should do that if I think the prayer-causes-rain hypothesis holds. But if I think, no, it's the other causal hypothesis, it's just that when there's a drought, it tends to rain and people tend to pray, then I should think, that's not very likely to work, I should not bother praying. And so, anyway, here's where this is going for the model: are there cases in which it is breaking, in a sense, from the statistical patterns it's observed, to try to exploit, to try and leverage, a causal connection that it thinks holds? It hasn't seen this pattern before, but it thinks, well, if C does cause E, then I can break from the pattern, bring about C, and E will happen.
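To make the contrast concrete, here's a small simulation, my own sketch rather than anything from the paper, of the two readings of the prayer/rain correlation. In the confounded model, drought causes both prayer and rain, so observing prayer predicts rain, but intervening to pray does nothing to the chance of rain. All the probabilities are made-up illustration values.

```python
import random

# Confounded model: drought -> prayer and drought -> rain; prayer itself has
# no causal effect on rain. The numbers below are made-up for illustration.

def confounded_world(rng, do_pray=None):
    drought = rng.random() < 0.3
    pray = drought if do_pray is None else do_pray    # do() overrides the usual mechanism
    rain = rng.random() < (0.6 if drought else 0.1)   # rain depends only on drought
    return pray, rain

def estimate(rng, do_pray=None, n=100_000):
    pray_and_rain = pray_total = rain_total = 0
    for _ in range(n):
        pray, rain = confounded_world(rng, do_pray)
        pray_total += pray
        rain_total += rain
        pray_and_rain += pray and rain
    if do_pray is None:
        # Observational quantity: P(rain | prayer was observed)
        return pray_and_rain / max(pray_total, 1)
    # Interventional quantity: P(rain | do(prayer))
    return rain_total / n

rng = random.Random(0)
print("P(rain | observed prayer):", round(estimate(rng), 3))        # about 0.6
print("P(rain | do(prayer)):     ", round(estimate(rng, True), 3))  # about 0.25
```

An agent that understood the second model would not bother praying on a clear day; an agent that only tracked the correlation might. That is the kind of break-from-the-pattern intervention being described here.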
00:59:46
Speaker
If it did that in a way, again, that it hadn't observed in its patterns, that would be a mark of it having causal understanding. And then, depending on the details, that could be a mark of it having theory of mind, if it's trying to manipulate minds.
00:59:58
Speaker
And so that could be a way. But I think there are also, for sure, scenarios in which an AI model could just learn, say these words, chant these words, and somehow the effect happens, without really understanding causation, without having a theory of mind. Just maybe for my own personal curiosity, and maybe some listeners are curious too.
01:00:19
Speaker
It seems that, you know, Sam was talking earlier about using, like,
01:00:28
Speaker
evolutionary algorithms and that kind of thing to produce these kinds of models. But would it make sense to use a mixed-methods approach where you take, you know, a large language model and dynamically connect it to, like, a world model or whatever it's called?
01:00:47
Speaker
Yeah. And then a reasoning model. And between several models interacting, you produce this thing called, you know, causal inference? So I think it'll depend a lot on the exact details, but I think in principle, potentially, yes. And maybe the more you have it connecting to, if it's connected to reasoning models or specifically causal reasoning models, then, at the very least, put it this way: the objections and the arguments that have been developed, it will depend on the details of the case, but they don't immediately get traction.
01:01:22
Speaker
Although I'd say this: if that's how things go, you could in a sense think that's a kind of surrender by the people that I'm arguing against. So here's what I mean.
01:01:37
Speaker
One of the big skeptics of large language models is Gary Marcus. Marcus, he too draws on the Judea Pearl sort of line of thought. Marcus is an evolutionary psychologist, and as an evolutionary psychologist, he wants a fair amount to be innate in human minds, including causal concepts.
01:01:57
Speaker
And part of his problem with large language models is he thinks they're too empiricist, they're just learning too much, there isn't enough that's innate. But then where I was going to go is that Marcus also thinks, well, some of the ways in which what we have today has improved is with something like what you're calling this mixed-methods approach, where you're incorporating some of what he wants. I think he focused on neuro-symbolic representations, these sorts of representations that are closer to what he was defending all along and that we thought

Future Directions and Conclusion

01:02:27
Speaker
the sort of more empiricist large language models lacked. And so to some extent, I mean, I think there's room to push arguments back against this, but at least potentially what Marcus claims there is, ah, I'm being vindicated. The fact that steps are being taken in this direction shows that you couldn't get by on just the vanilla large language model or the empiricist framework. You needed something like me.
01:02:51
Speaker
And by a similar token, depending on the details of what you're combining, I could potentially grant, oh, maybe that could work. But if so, I am vindicated in the end, because you're giving me just what I said you needed.
01:03:04
Speaker
Yeah. We kind of want to be sensitive on time here, but unfortunately, I feel like we focused a lot on the causal element, when it's a really rich argument. So, for example,
01:03:18
Speaker
it would have been cool to talk a little bit more about why it is that if the language model lacks causal understanding, it's not going to have a theory of mind. And then further, why, if the language model lacks a theory of mind, it's not going to be able to perform speech acts.
01:03:35
Speaker
So maybe another time we can get more into those details. But I don't know, Justin, do you want to comment on anything that we weren't able to discuss as much? You know, let me just try, very quickly, because I know the time limitation, to go through that.
01:03:50
Speaker
The idea, in a way, which I won't try to justify but just try to state, is that it's almost like our mental states, belief, desire, and so on, are just causally defined. What makes them what they are is that they enter into certain causal relations.
01:04:03
Speaker
And so if you have a being, like large language models, that is just, in a sense, blind to causation, doesn't understand causation, it just follows that it's not going to be able to understand beliefs and desires that are causally defined.
01:04:15
Speaker
And then the other stuff, in terms of speech acts, again, here's the key thought: when we communicate, you're not just saying words out loud, words that have a kind of conventional meaning.
01:04:26
Speaker
What you're trying to do in typical speech acts is somehow change the world. You're trying to act to change the world, and to change the world by getting the people around you to respond. Maybe it's responding by changing their beliefs, updating their beliefs. Maybe it's responding by bringing you something, whatever.
01:04:42
Speaker
And that only works, you can only form that intention to change the world by changing people's minds, if you have causal concepts, if you have this sort of theory of mind and a grasp of what beliefs and desires are.
01:04:55
Speaker
So that's supposed to be the connection.
01:04:59
Speaker
Yeah. Great. Well, thanks so much for joining us, Justin. Honestly, a great paper, highly recommended to all the listeners, super clear in its argumentation, very plausible in my eyes. So thanks so much for talking to us about it. Would you mind telling us a little bit about what you're working on now? What are you up to? Sure.
01:05:18
Speaker
Yeah, so a lot of my research now is focused on AI in general, and I'll say two things. One is, my wife is also a philosopher, and part of what we work on is AI romantic companions and what they could or couldn't deliver. Maybe they can't deliver authentic romantic relationships.
01:05:36
Speaker
And another thing that we co-work on is AI consciousness. We have an argument whose slogan is, intelligence is the enemy of consciousness. What that's supposed to mean is that as AI models get more and more intelligent, you should actually expect it to be less and less likely that they will be conscious.