
30| Thinking Beyond Language — Anna Ivanova on what LLMs can learn from the brain

S1 E30 · MULTIVERSES

It can be tempting to consider language and thought as inextricably linked. As such, we might conclude that LLMs' human-like capabilities for manipulating language indicate a corresponding level of thinking.

However, neuroscience research suggests that thought and language can be teased apart, perhaps the latter is more akin to an input-output interface, or an area of triage for problem-solving. Language is a medium into which we can translate and transport concepts.

 Our guest this week is Anna Ivanova, Assistant Professor of Psychology at Georgia Institute of Technology. She's conducted experiments that demonstrate how subjects with severe aphasia (large-scale damage to the language area of their brains) remain able to reason socially. She's also studied how the brains of developers work when reading code. Again the language network is largely bypassed.  

Anna's work and other research in cognitive science suggest that the modularity of brains is central to their ability to handle diverse tasks. 

Brains are not monolithic neural nets like LLMs but contain networked specialized regions.  

  • Anna's website: https://anna-ivanova.net/
  • Multiverses home: multiverses.xyz
Transcript

Introduction to Multiverses and Power of Language

00:00:01
Speaker
I'm James Robinson, you're listening to Multiverses. Language can do and express many things. And in fact, this was the subject of my last conversation on this podcast with Nikhil Krishnan, talking about ordinary language philosophy. Just because language is so powerful, we might be tempted to think that that's all we need for understanding and predicting the world. It's just manipulation of symbols, next word prediction.
00:00:28
Speaker
However, if we look at how the human brain at least actually works, it's rather different.

Guest Anna Ivanova and Language vs. Thought

00:00:34
Speaker
My guest this week is Anna Ivanova. She's an assistant professor at Georgia Institute of Technology, and she tries to understand the relationship between language and thought. And she does this by looking at brain scans, essentially MRIs, of what's going on when humans are presented with particular scenarios. For example, I'm looking at this marvelous view right now from Calton Hill in Edinburgh.
00:00:58
Speaker
I'm not thinking about it linguistically. It's going straight into my visual cortex and processing is happening there. And if I want to reason about it, I'm not going to reason about it linguistically either. A lot of her work looks at how our conceptual knowledge of the world is not tied to the language area of our brain. So, for example, subjects with aphasia, people who have large-scale damage to the language network,
00:01:23
Speaker
are still able to reason not only logically about chess problems and things like that, but they can reason socially as

Limits of Language for Thinking Machines

00:01:31
Speaker
well. They can understand what situations are unusual. So this is a really insightful and very timely conversation because it plays into a lot of the enthusiasm about LLMs, which I certainly buy into myself, but it calls into question some of this, forcing us to think, well,
00:01:51
Speaker
What is necessary on top of simple linguistic abilities to really be a fully fledged thinking machine? I think one question that still exists in my mind is to what extent just language manipulation could get us to a fully thinking machine, somewhat in the same way as I have a GPU and a CPU on this laptop.
00:02:14
Speaker
And I could use my CPU to play vector-based computer games, and I could use my GPU to send emails, but they're not really suited to that task. But the point is, maybe language could be a kind of fully fledged thinking system.

Defining Thought: Broad vs. Narrow Cognition

00:02:33
Speaker
However, I think it is valuable to learn from, in fact, what the brain does, of course. So I really enjoyed this conversation, and I hope you do too.
00:02:57
Speaker
Hi, Anna Ivanova, welcome to Multiverses. Hi, thank you for having me. So we're speaking and we're thinking, I think.
00:03:07
Speaker
and people who are listening are listening to our words and they're thinking. And thought seems like something that should be really familiar to us because it's one of those kind of few things that we have really direct access to. And yet at the same time, it seems so mysterious, so hard to figure out exactly what's going on. Maybe it's because
00:03:29
Speaker
the piece that's doing the figuring out is the thing that we're trying to figure out itself. I don't know. But yeah, how can we get some sort of grip on what thought is? Where do we start? Well, I think we need to start with definitions, what it is that we mean by thought, because different people
00:03:51
Speaker
use the word in different senses. And of course, it also depends on the context. And generally speaking, at least in my area of work, I think there is the broad definition and the narrow definition. And in the broad definition, thought is synonymous with cognition. So the mental processes that we use to make sense of the world around us
00:04:21
Speaker
So that includes reasoning, that includes accessing memories, that includes various social communication capacities. So very broadly speaking, something we would call cognition. And then the narrow definition is the stuff that happens kind of like in between us doing things. So it's not necessarily you get a math problem, you reason about it, you give the result.
00:04:51
Speaker
But it's more about you lying down in bed at night and thinking about your day tomorrow, or you're walking somewhere and you're playing out your conversation that you're going to have with a friend. And so this kind of inner thinking that happens spontaneously, not in response to an external task, is also something that people commonly refer to as thought.
00:05:21
Speaker
And of course, those are very different,

Language's Role in Cognition and Spontaneous Thought

00:05:22
Speaker
right? Like if we're talking about the broad thing or the narrow thing, the specific kinds of mechanisms that might support them might differ quite a bit. Yeah, I think both of those kinds of definitions capture something of what one intuitively characterizes as thinking. So we might say,
00:05:43
Speaker
you know, broadly speaking, oh, yeah, of course, when this person solved that math problem, they had to be thinking. But then you might also say, oh, they did it so quickly, they just did it without thinking, which would be sort of more of the narrow definition. They did it sort of without any kind of reflection, which is kind of what the narrower definition requires, I guess. I'm curious, do you have a kind of preference for either of those definitions, or do you think they both serve a kind of useful purpose? I've used both.
00:06:12
Speaker
So I'm interested in the relationship between language and thought, the role that language plays in thinking. And so the role that language plays in thinking, broadly defined thinking as cognition, that actually turns out to be a more tractable question because we can ask people to do a math problem and look whether language processing regions in the brain are engaged. But for inner thinking,
00:06:40
Speaker
So that happens spontaneously without an external task. That's much harder to capture. But that's also where people have very strong intuition. They're like, of course, I think in words, or of course, I think without words. And so that area is harder to study, but also very interesting.

Challenging Historical Views: Language Equals Thought

00:06:57
Speaker
And so that's where I see some of my future work going. Interesting. Yeah, it's true that kind of more
00:07:03
Speaker
reflexive thinking, almost by definition people are going to have opinions about it, because they are kind of cogitating, turning things over, and part of that process is inward looking. So yeah, people are gonna be, oh well, I always do that with words, or with images, or a mix. And yet sometimes I can also find it a little bit hard to remember to do that and think about, because whenever I sort of think about
00:07:28
Speaker
If I try to think about what thinking is, I will do it linguistically, but maybe that's just because it's the sort of, that's the sort of way that I need to think about that thing. Whereas if I think about something entirely different, like a kind of spatial reasoning problem, I'm sure I would do it in a different way, but it maybe wouldn't engage the linguistic part of my brain. But anyway, I guess, yeah, this is a very, we've got very quickly to a very,
00:07:58
Speaker
key area, which is this relationship between language and thought. My last guest was Nikhil Krishnan, an ordinary language philosopher, and we spent some time just talking about how, for a while, people just thought there was no difference between the two.
00:08:19
Speaker
in some way, language captured the entirety of everything, including thought, maybe including some other things. So, you know, Wittgenstein's famous dictum: the limits of my language mean the limits of my world. But I guess your research is maybe questioning that. Would that be fair to say? Yeah, I use that Wittgenstein
00:08:45
Speaker
quote as an example of a worldview paradigm that I'm pushing against. And what's wrong with it? I mean, as we said, like, when we try to describe what thought is, we'll probably reach for language. And yeah, it seems like
00:09:10
Speaker
so many of the ideas, I don't want to say everything, but so much of the ideas that we have and we pass on, we do with language. And maybe there's an argument that the places where we're not doing it with language are sort of somehow dependent on language behind the scenes, but that's not a very scientific argument, and I think you have some kind of evidence to the contrary. So yeah, maybe

Brain Studies: Language and Thinking Networks

00:09:34
Speaker
take us through some of the things that you've done to probe this. Yeah, let me...
00:09:40
Speaker
There is a lot to say, so let me start first by I guess acknowledging the last bit that you said where clearly there is a relationship between language and thought and the most trivial but also important one is the fact that we use language to transmit information to communicate thoughts to one another and that's a very powerful mechanism. We can translate knowledge from generation to generation,
00:10:09
Speaker
So that is a very important role of language in thinking, helping us share information without having to figure out every single thing individually. But here what we're talking about is using language as a medium of thinking. So internally, do we think in words? Do we recruit the mechanisms for language processing when we're thinking? And so that's an important distinction. So that's the scope.
00:10:37
Speaker
Now, as we said, people have strong intuitions about whether or not they use language to think. And probably these intuitions are grounded in people's personal experiences of thinking. And so one important fact to keep in mind is that there is huge individual variability in how people perceive their own thinking to be.
00:11:06
Speaker
And so my pet theory is that a lot of philosophers are strong verbal thinkers. They spend a lot of time writing. They think about abstract topics. And so to them, the link between language and thought and their experience is very strong. And it just seems that
00:11:28
Speaker
people, not just philosophers, have this tendency to assume that everybody else thinks in kind of the same way. And so if you are a strong verbal thinker, you automatically assume that everybody else is as well until you start actually talking to other people and asking about their experiences.
00:11:51
Speaker
I've had these conversations with people at parties or just informally, you ask them, how often do you think in words? Most of the time, some of the time, never. People are always surprised to hear that other people's experiences might be completely different.
00:12:11
Speaker
I think a very important thing to keep in mind is that our intuitions can lead us astray, because they don't necessarily

Programming Languages vs. Natural Languages

00:12:20
Speaker
reflect a universal human experience. It's just, you know, that's how we think. And so on to the actual evidence that we can use to dissociate language and thought.
00:12:35
Speaker
Um, there are a few different strands. So, one example, very briefly, is the fact that animals who don't have language might often have pretty sophisticated thought and planning capabilities, right? We know examples of crows being very smart, or even
00:13:01
Speaker
a squirrel that is trying to figure out whether to jump from tree to tree or if it's too far and it needs to go on the ground instead. These are pretty sophisticated capabilities. And so that's just a very basic example of at least some kind of thought. You can then argue, oh, but the kind of thinking they're doing is not the kind of thinking that we care about. That's where the meat of the debate is. But pretty sophisticated cognition is possible in non-human animals from what we know.
00:13:30
Speaker
For me, I work with humans, adult humans. And so what we can do is we can identify

Large Language Models Mimicking Human Thought

00:13:39
Speaker
parts of the brain that are engaged in language processing. So it turns out that there is a set of brain regions in the brain known as the language network that are responsible for language comprehension. So whether you're listening to somebody talk or reading,
00:14:01
Speaker
They're also engaged during language production, so when you're speaking and when you're writing, they are engaged in response to any language that you might know. That also includes sign languages, so it doesn't even have to be a spoken language.
00:14:17
Speaker
And these regions, turns out, are pretty selective for language. So they respond to all kinds of language, but not to music, not to math, not to general problem solving. And so this is pretty strong evidence that language and many different kinds of thinking are actually separable in the brain. That language has its own neural machinery.
00:14:47
Speaker
And that's important because it turns out that if language areas of the brain are damaged, it will affect your ability to use language, but not necessarily your ability to think. And so the most common example of that is a condition known as aphasia, so difficulties with language production or comprehension. Often most commonly it arises as a result of
00:15:16
Speaker
person having a stroke. And so if a stroke affects left hemisphere, which is where the language network is in most people, they might have really serious difficulties with language production or comprehension. But if that damage is limited to the language network, it turns out that their ability to use other kinds of thinking remains intact.
00:15:40
Speaker
So these people with really severe aphasia, who really can't understand language or speak, they can solve math problems. They can arrange pictures so that they form a story, so they can reason about cause and effect. They can look at a picture showing some kind of event, like a fox chasing a rabbit or the rabbit chasing the fox, and say which one is more likely to happen in the real world.
00:16:07
Speaker
If it's something really weird, like a scuba diver biting a shark, they will laugh, because it's kind of ridiculous. And so, you know, you can tell that they understand what's going on. And there are really fascinating cases, you know, some of them, like, playing chess on the weekend. So clearly very sophisticated forms of reasoning are preserved, even in the face of severe damage to language processing centers in the brain.
00:16:32
Speaker
Yeah, that's completely fascinating. And well, firstly, by the way, I love your theory about philosophers and how maybe the sort of minds that they have

Modular AI Approaches and Human Brain Similarities

00:16:45
Speaker
that make them good philosophers
00:16:48
Speaker
sort of self-select or selecting a very biased or unusual community of people who think in a particular way. And yeah, so that's really interesting. It'd be great to have a survey of philosophers, I don't know if this has happened, and how they describe their own thoughts and compare it to other groups.
00:17:14
Speaker
Yeah, I think sometimes that maybe I should not talk about this theory and actually test it experimentally first so that I don't bias people in advance. Well, I guess I'll wait for that.
00:17:24
Speaker
No, well, as long as they don't listen to this, or maybe you can do it anonymously or something. I'm actually talking, I think, soon to someone from the philosophy of science who surveyed physicists to see if they are realists in terms of, you know, how they think about the entities of science or not. So I think it's like, it is really interesting to actually just, yeah, try to figure out how it is that
00:17:50
Speaker
Yeah, how it is that people's personal beliefs and their kind of academic disciplines, or the peculiarities of their minds, intertwine. But coming back to the kind of experiments that you described. So yeah, I think these are just really wonderful illustrations of how thought maybe extends beyond the language network. And I suppose what you're doing is you're asking people, so for example, the way that we know that
00:18:20
Speaker
music is not within the language network is, I don't know, the language network is defined as the kind of bit of the brain that lights up in MRI scans, you see a lot of activity there when you give people sentences and linguistic tasks, I don't know, maybe reading or producing language. And then it's a different area of the brain that lights up when they're listening to music or when they're solving a math or chess problem. Yeah, and even when
00:18:50
Speaker
people have quite severe damage, so aphasia, and the language part of the brain is unable to comprehend or produce language or both, they can still do many of those other things, which is, I mean, that's really interesting for one thing, because I often think of, well, language as maybe being so key to the kind of input

Modular AI Applications and Philosophical Implications

00:19:10
Speaker
output of the brain, that, you know, for example, reading a math problem,
00:19:17
Speaker
would kind of go via the language network? Or is it that our brain is sort of able to just kind of directly take those symbols into, I don't know, a different area of the brain? Or perhaps do we have to kind of pose those problems in a kind of more visual way? I don't know, I'm curious about, yeah, whether the language network is kind of a gateway for much of the information going in.
00:19:44
Speaker
So it looks like if you give people math problems in the form of mathematical symbols, like five plus three, question mark, it seems like it doesn't need to go through the language network. So even though it's symbolic, not all symbols get processed by the language network. And perhaps even more strikingly, one study that I did in graduate school was looking at computer code.
00:20:15
Speaker
Specifically, we looked at Python, which is very English-like by design, so it uses English words. On the other end of the spectrum, we took a graphical programming language for kids called Scratch Junior.
00:20:33
Speaker
It has different characters, and then you have different arrows showing the characters going left or jumping, but it has a lot of the same control flow elements that you would have in text-based code, like if statements and for loops and stuff like that. It turns out that for both of these languages, the main network in the brain that's responsible for extracting meaning from that code is the
00:21:00
Speaker
so-called multiple demand network. And so that's the network that's responsible for problem solving and reasoning and planning and not the language network. The language network responded a little bit to Python code, but even there we actually weren't able to exactly establish its relation, its role and why it would. It might be some kind of false positive where the language network is like, Oh, that's language. Oh, wait, no, nevermind. And it kind of goes down. So there are other researchers that are promoting that theory currently.
00:21:30
Speaker
Um, but even for code, where, you know, we call programming languages languages because of how similar they are structurally to natural languages,

Future Research Directions by Anna Ivanova

00:21:43
Speaker
Even they are, it looks like it's not the language network that's doing the majority of the heavy lifting.
00:21:49
Speaker
Yeah, I found that completely, well, surprising, actually. And I think you noted in the paper that people kind of fell on two sides of the fence. Some people were surprised and some people were, oh, no, that makes complete sense. But I was personally really surprised because, as you say, there's so much similarity between the way that natural language works in terms of being compositional and having these kind of hierarchical features and the way that programming languages work that I would think, OK, well,
00:22:21
Speaker
It's right that we call them languages because they're so closely related. They're just sort of, I guess, a bit stricter, less ambiguous, perhaps. But the nature of the rules is not so different. And yet,
00:22:39
Speaker
Yeah, it's almost as if one could imagine there being some sort of animal like a crow, like you said, a very intelligent creature, doesn't have language. But maybe it's got a really good multiple demand network. And perhaps it could be a really good programmer, because it's not that part of the brain which is being recruited. But rather this kind of almost clearinghouse, from what I understand. The multiple demand network just picks up so many different jobs.
00:23:11
Speaker
The other thing that really stood out for me in this paper, which I really enjoyed, was that as a kind of, I guess, control, you presented people the same problems. So the, if I remember rightly, one of the
00:23:26
Speaker
One of the pieces of code that people had to interpret in Python was a calculation of BMI. And so it's like, here is a variable which is your weight, here is one which is your height, BMI equals weight divided by height squared. And so the person kind of reads through that and you see it being passed off to the multiple demand network. But then there's the same problem defined entirely verbally.
00:23:53
Speaker
So instead of using symbols with equals and it clearly being Python code, it's just, this is what BMI is, here's how much you weigh, here's how tall you are, what's your BMI. And that went to a different region of the brain.
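For concreteness, here is a minimal sketch, not the actual study stimuli, of the same BMI problem in the two formats being contrasted: a Python snippet to be read as code, and the same question posed entirely in prose.

```python
# Hypothetical stimuli in the spirit of the study (not the actual materials).

# Code version: the reader mentally executes the snippet.
weight = 80                   # kilograms
height = 2.0                  # meters
bmi = weight / height ** 2    # standard BMI formula: weight divided by height squared
print(bmi)                    # -> 20.0

# Verbal version: the identical problem stated entirely in words.
problem = ("A person weighs 80 kilograms and is 2 meters tall. "
           "BMI is weight divided by the square of height. What is their BMI?")
```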
00:24:09
Speaker
Um, which for me was just like, okay, well, this is the same problem, but the way that it's presented really changes the way that we think about it. Which, yeah, that was another huge surprise for me, to think just how influential the kind of presentation, or the medium, I guess, for a set of concepts, how much that determines how those concepts are handled internally, mentally.
00:24:39
Speaker
Yeah, and in fact, that's not that uncommon of a situation if you think about it. So let's say somebody is listening to this podcast versus reading the transcript. The way information gets into the brain is different. So in the auditory modality, it goes to the auditory cortex first.
00:24:59
Speaker
And in the visual modality, it goes through the visual cortex first, and then it gets into, we have a specialized part of the brain that's responsible for reading, so recognizing written letters. But then they will converge in the language network, because the language network is responsible for processing language in either modality. And that means that these initially distinct representations have to converge in some way.
00:25:28
Speaker
And so for some of the other cases, like a problem that's written in language versus in code, it looks like that convergence is also happening, but it's happening later on in the processing, right? So it goes through the language network and then it gets to the multiple demand network, and then you have some shared problem solving. So in this case, calculating the BMI, doing some basic math,
00:25:57
Speaker
And so that, we think, also happens in the multiple demand network. And in fact, we show in the paper that you can kind of break down that activity that we capture into the code reading part and the actual problem solving part. But it's a fascinating
00:26:12
Speaker
endeavor, in general, in cognitive neuroscience: how do we design an experiment where we have those kinds of different conditions, where they're very similar except for something that we've changed? And so at what point does that difference, right, auditory versus visual, language versus code, where in the brain does that make a difference and where doesn't it?
00:26:37
Speaker
Yeah, yeah. So you're saying that even though the language area lights up when we have that kind of BMI problem, it's just kind of passing the thing along, and then it gets passed off to actually run the calculation, and that doesn't happen in the language area. Yes. That makes sense. Yeah, OK. That clarifies my... I was very excited. I thought that maybe we had some sort of
00:27:04
Speaker
way of doing the computation just linguistically. So I guess that doesn't work. I think it's possible and we don't, well, I don't know, well, maybe not linguistically, but like, you know, when we memorize the multiplication table, or for some problems that we do very often, we don't need to actually go through the steps of the calculation, we kind of just retrieve the correct answer.
00:27:32
Speaker
I don't know if it happens linguistically or not, potentially not, probably not, but it's still a different mechanism than actually going step by step and doing, you know, long division in your head or like summing multi digit numbers or something like that. Yeah.
00:27:47
Speaker
Yeah, I think that that's a really interesting question, which we can maybe come back to. But I guess, yeah, so you kind of see both the multiple demand and the language network lighting up when this problem is presented linguistically. So it's sort of a fair assumption, then, that what is in fact happening is that there's probably some linguistic processing, but then it gets passed to
00:28:10
Speaker
the same sort of area of the brain which handles the pure Python problem. But of course, yeah, I mean, that is really interesting and kind of useful in some ways in that, you know, it seems more efficient to be presented just with a Python code, right? You kind of bypass that, oh, I turn this into, you know, it goes straight into the
00:28:34
Speaker
the system which can perform the ultimate calculation, I guess. I don't know if you're able to capture any information on whether it was quicker for people to kind of solve the problem when presented with the Python code or not. I don't remember whether we saw a difference in how long it took people.
00:29:02
Speaker
I think it's possible, but some of it, of course, depends on how proficient they are in Python. So there might be individual differences there, also individual differences in how fast they would read text. So I'm sure there's some variability there. But it's actually an interesting thought that you're bringing up
00:29:28
Speaker
So having this abstract skeleton with other information stripped away might make the problem solving, the calculation, easier. Because in fact, there are cases where researchers have observed the reverse. So there is this famous Wason selection task, where you have, let's see,
00:29:57
Speaker
a card with like a two and one with a seven, and then a card that's red and one that's blue. And you need to test the rule that says if the number is even, then the other side of the card has to be blue.
00:30:25
Speaker
And so then the question is which cards do you need to turn over to make sure that that rule is correct. And so then people want to test
00:30:36
Speaker
the card with the two on it, because it's even, and so they want to make sure that the reverse is blue. But then they often also want to turn over the blue card to make sure that the other side is odd, sorry, is even. But that actually is not what you should do, because it doesn't matter: if you have blue and odd, that's actually not a violation of the rule. What you need to do is turn over the red card, because if there's an even number there, then the rule really is violated. Okay, so that problem is hard for people.
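As a small illustration of why the abstract version trips people up, here is a sketch, with hypothetical card faces, that enumerates which cards can actually falsify the rule "if the number is even, the other side is blue."

```python
# Hypothetical card faces for the abstract Wason selection task.
# Each card has a number on one side and a colour on the other; we see one side.
visible = ["2", "7", "red", "blue"]

def must_turn(face: str) -> bool:
    """A card needs turning only if its hidden side could expose an even number
    paired with a non-blue colour, i.e. could falsify the rule."""
    if face.isdigit():
        return int(face) % 2 == 0   # even number showing: hidden colour might not be blue
    return face != "blue"           # non-blue colour showing: hidden number might be even

print([card for card in visible if must_turn(card)])  # -> ['2', 'red']
```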
00:31:05
Speaker
But if you cast the same problem saying that there are people at the bar and somebody is 16 and somebody is 25 and then somebody is drinking beer and somebody is drinking a Coke, then how will you verify that only people over the age of 18 are drinking alcohol?
00:31:30
Speaker
And then of course, you know, you need to check the 16 year old and check the person drinking the beer, and not the others. And so it's mathematically the same exact process, but it's much, much easier for people to ground the rule in their existing knowledge. Not necessarily the bar example specifically, that's just, you know, the easiest one and the most common one. And so this phenomenon is known as content effects on reasoning. And,
00:31:59
Speaker
Yeah, I think a lot of people, especially, like, you know, mathematicians and people trained in hardcore STEM disciplines, think that, okay, strip away all of the extra information, only focus on the abstract symbols, that's the easiest thing. But actually, for a lot of people, grounding the problem in some specific content domain tends to help. And so I know that some people in math education are very interested in this phenomenon, and in how to make it
00:32:27
Speaker
easier for kids to learn math. Is it by focusing on the abstract? Or is it by grounding math problems in real life situations? And I suppose part of the reason why that grounding might work, well, there could be kind of two hypotheses. One is just like, it locates it in a different area of the brain, which is somehow better at processing this thing. So maybe that just
00:32:54
Speaker
I don't know, the social reasoning part is just better at doing that kind of problem. But it doesn't seem so likely in this case. And another is just that it clicks it in to a place where you're able to recognise a pattern that you've seen before. And so you're already on the right track.
00:33:16
Speaker
And this maybe comes back to your point about, well, maybe when we calculate the BMI for certain combinations of numbers, you just know the answer. So it's being recalled from memory. That pattern is already so established that you don't need to reason through it in the same way. It's more of a recall operation.
00:33:41
Speaker
I mean, this is getting us towards one of the kind of central questions, which is around, well, what are LLMs doing? Because they're kind of glorified recall machines in a certain way, or just really good pattern matchers. Maybe before we get to that, though, I want to talk about another of your experiments, which I really enjoyed, which is about
00:34:04
Speaker
where people are looking at images of improbable things, like the shark and the swimmer that you mentioned. And what I found, well, maybe you should describe the experiment. You'll do a much better job of it than me. Because I think, yeah, there was just a really interesting piece here that kind of speaks to this. Yeah, so here we use that same idea, that the same information might arrive in the brain through different routes.
00:34:34
Speaker
And so in this case, we were looking at sentences describing basic interactions between two entities, like I guess we can roll with the shark bites the swimmer, the swimmer bites the shark.
00:34:53
Speaker
and pictures depicting the same kinds of events. And so here, by switching around who's doing what to whom, we're manipulating whether the event is plausible, so likely to occur in the real world, or implausible, so unlikely to occur. And the question was
00:35:15
Speaker
does the language network respond to language specifically, or does it respond to meaning and concepts more generally? And so if it's language, it should only really respond to sentences and not to pictures. And if it's responsive to meaning, it should respond equally strongly to sentences and pictures, as long as the person is thinking about the meaning. And so we had people
00:35:44
Speaker
tell us whether they think the event is plausible or implausible so you have to be thinking about the meaning. And so what we found was actually something in between where the language network in accordance with all of the prior work responds more strongly to sentences than to pictures but
00:36:06
Speaker
it still responded to pictures to some extent. And I will say that in another study, we recorded responses in the language regions to pictures of objects. So is this animal dangerous? Can this object be found in the kitchen? That kind of stuff. And it did not respond to pictures of objects. So it was something about events, maybe just something more complex, maybe something just more fast-paced,
00:36:35
Speaker
that was specifically triggering responses in the language regions. And so this intermediate result, so preference for sentences over pictures, but also responses to meaning even in non-sentences was kind of puzzling. And so one piece of evidence that helped us make sense of this
00:37:02
Speaker
information was, um, evidence from individuals with global aphasia, from people with brain damage. And I should say that this is, yeah, so lots of the brain imaging work I'm describing I did with my PhD advisor, Ev Fedorenko, and the global aphasia bit is done in collaboration with Rosemary Varley at UCL, who works with individuals with global aphasia very, very closely.
00:37:31
Speaker
And so here we had two individuals with global aphasia, really serious language issues, looking at pictures of
00:37:43
Speaker
Swimmer bites shark, shark bites swimmer. And so, as I mentioned earlier, they were laughing at the weird ones. And so, in general, they were very good at distinguishing plausible and implausible pictures, suggesting that their ability to extract meaning from pictures was there. It did not require a functioning language network.
00:38:06
Speaker
And so then the title of the paper, the language network is recruited but not required for pictorial event semantics. So we see this activation, but it's not necessary to do the task. Yeah, I found that very insightful. What struck me was
00:38:33
Speaker
One of the hypotheses as to why the language network was recruited is that it's sort of another way of getting evidence or information on whether this event is likely or not. And of course, we can't be sure what's going on in there, but perhaps it is somewhat like a large language model where you
00:38:52
Speaker
You look at the thing and you're like, part of trying to figure out whether this picture is likely or not is you kind of read it out to yourself and you're like, well, does this, is this a familiar pattern, right?
00:39:04
Speaker
is 'shark bites swimmer', that sequence of words, does it feel close to sequences of words that I've produced before or I've read before, whereas 'swimmer bites shark' is kind of jarring? And maybe behind that is just the improbability of that sentence being produced according to the language model in our own minds.
00:39:35
Speaker
we don't really know that our minds work like a large language model at all, but it's an attractive hypothesis insofar as it works. Yeah, so I guess generally speaking, right, so we got this result, language network recruited but not required, and the question was, what's going on, right? And so generally speaking, we considered two broad hypotheses. One is that that activation
00:40:01
Speaker
is not necessary to do the task. So you see the picture: shark, swimmer, and biting. And so you activate those words kind of automatically by association, but you're not actually using them to reason about whether the event makes sense or not. So that's one hypothesis. And the other hypothesis is that actually the information in the language network is helpful when you are trying to
00:40:31
Speaker
recast information that you're seeing in linguistic form, you can then compare it with all of the linguistic information that you received in your lifetime, and maybe that information ended up distilled in your brain in some general way. We know that people are very sensitive to statistical regularities in language. We know that we're very good at predicting what word would come next.
00:40:58
Speaker
the information about what patterns are likely in text is very much what people use during real life language comprehension. And of course that information also can help us, in many cases, figure out which events make sense and which don't. And so actually we tried to test that hypothesis.
00:41:27
Speaker
Not directly, it's hard to test that in actual human brains, although we now have ideas for how we might be able to do that. But we started by using language models as a proof of concept, right? So the hypothesis is: statistical patterns in language input can help us distinguish plausible and implausible events.
00:41:56
Speaker
and language models are very good at capturing these statistical patterns. So if language models can systematically distinguish plausible and implausible events, that means that there is enough information there, where maybe humans might be able to use that information also to distinguish plausible and implausible events, right? So it's not evidence that humans do, but it's evidence that humans can.
00:42:21
Speaker
And so we did that. We used language models and tried to see whether they systematically evaluate plausible event descriptions as more likely than implausible. And so in that study, we specifically distinguished between two kinds of events. So one is
00:42:46
Speaker
The teacher bought the laptop versus the laptop bought the teacher. So animate-inanimate interactions. And so when you swap them around, if you interpret the sentence verbatim, the inanimate object, the laptop, cannot buy anything, buying requires that the subject is animate. And so that's a very kind of in-your-face, screaming violation.
00:43:10
Speaker
And then the other example is kind of like the fox chases the rabbit, the rabbit chases the fox, or the swimmer bites the shark, the shark bites the swimmer. The swimmer biting the shark is not impossible. It can happen. It's just way less likely.
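One common way to run this kind of comparison, sketched here rather than reproducing the paper's exact method, is to score each sentence of a minimal pair under an off-the-shelf causal language model and check whether the plausible version gets the higher log-probability. GPT-2 is used purely as an illustration.

```python
# Sketch: compare sentence plausibility via log-probability under a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability the model assigns to the sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean negative log-likelihood per predicted token
    return -loss.item() * (ids.shape[1] - 1)

pairs = [
    ("The teacher bought the laptop.", "The laptop bought the teacher."),  # impossible swap
    ("The fox chased the rabbit.", "The rabbit chased the fox."),          # unlikely swap
]
for plausible, implausible in pairs:
    higher = sentence_logprob(plausible) > sentence_logprob(implausible)
    print(f"{plausible!r} scored higher than {implausible!r}: {higher}")
```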
00:43:25
Speaker
And so what we found is that when it comes to distinguishing possible and impossible events, language models were very good, almost at ceiling. So that was actually very easy for them. But when it came to likely versus unlikely events, there was a gap in performance. So they weren't quite as good. They were OK. They were above chance. But they definitely weren't perfect. And so we kind of- So not as good as people, I guess. Right. Yeah.
00:43:52
Speaker
Not as good as people, and not as good as when they have to deal with animate-inanimate, with impossible events. Yeah. So I think it's like, to me, it's actually more interesting to compare those two sentence types, like how models do on them. But also, humans do well on both because these are easy sentences. So they're not meant to be challenging. And so, yeah.
00:44:21
Speaker
No, I was going to say, and do we take that as evidence that then, when humans reason about these things, they're doing it not just linguistically? Or is it that our internal language models are kind of better than the large language models that are out there on computers? I think it's evidence that humans are doing it not linguistically.
00:44:45
Speaker
And so the reason why we think there is this performance gap is because actually the language input that we receive doesn't faithfully describe the world around us. So when we talk to each other, we don't just passively describe everything that we're seeing. So I'm not telling you, you know, I am sitting down, the lights are on, the room is empty, like it's very boring stuff.
00:45:13
Speaker
I'm telling you about things that are unusual, novel, interesting, newsworthy in some way. And so this phenomenon is known as reporting bias. So language tends to under-report information that is kind of trivial, right? That everybody already knows or can reasonably infer. And so maybe actually events that are unlikely are not as unlikely for LLMs because
00:45:43
Speaker
We talk about unlikely things all the time. That's the stuff that's worth talking about. So if that's true, if that's the reason why we see this performance gap, then even if the human language model is very good, which by the way, we don't think it is. Actually, I think large language models now are much better at predicting the next word than humans are. So actually, they're better. But even if humans were really good, just the language input is insufficient for us to be able to
00:46:11
Speaker
distinguish plausible and implausible events. That means that we have to use something else in addition. We have to maybe have some more sophisticated model of the world where we can actually correct for this reporting bias. We have to also bring in information that's about what things are typical, what we can expect, what we cannot expect.
00:46:35
Speaker
So we're probably drawing on sort of multiple mental resources or systems. In the case of the clearly kind of impossible, so the laptop buys the teacher, is that...
00:46:53
Speaker
is it just the language? Can you see if it's just the language? Or have you seen whether it was just the language network that's recruited there? As one might think, well, if it can just be done within the one region, maybe it's more efficient, you know, metabolically; there might be some kind of preference for doing that if it were possible. But yeah, maybe by default we kind of light up various regions just to make sure, I don't know.
00:47:19
Speaker
So that's actually a study that I would love to do next. So this difference between impossible and unlikely events is something that emerged out of this language model study.
00:47:31
Speaker
And so now, I think it would be great to bring it back to the MRI machine, measure people's brain activity in response to impossible versus unlikely sentences, and see if the language network alone is sufficient for distinguishing possible and impossible events. That is the prediction that follows from this language model work. And so I would love to test that.
00:47:58
Speaker
Yeah, that would be so. Yeah, I'd love to see the results of that. So I hope that happens. But I suppose coming back to LLMs, what we're starting to see is that maybe just a large language model in itself, for various reasons, might not
00:48:23
Speaker
be so effective at thinking or reasoning as the human brain. And one is, as you kind of mentioned, that the data that comes in is kind of biased toward the salient and newsworthy, as you put it. But then another, from the kind of Python example, is that, well, as a matter of fact, we don't use the language part of the brain for code comprehension, or for
00:48:52
Speaker
logical mathematical reasoning either for that matter. I suppose my question there is though, you know, could it be possible for LLMs to kind of just be
00:49:12
Speaker
be able to take on the functions of the multiple demand network, for instance, which is doing all this, which is the place which does the mathematical logical Python code comprehension. Could it take on all those responsibilities just by having
00:49:33
Speaker
getting really good at, say, next word prediction for mathematical problems and next word prediction for code generation and so on. Or is that implausible? I don't really know how we characterize whether LLMs could just kind of emergently develop all those capabilities within a single language model, or if that's just very, very unlikely.
00:50:02
Speaker
Yeah, so LLMs do a bunch of different things. In general, as you mentioned, they are very, very good at pattern recognition and pattern completion at different levels of abstraction. So they do a lot of just direct memorization, right? The larger the model, the more text it can just memorize straight up, which is why a lot of those copyright issues end up arising.
00:50:32
Speaker
But that's not the only thing that these models do, because they definitely are capable of generating novel texts and mixing and matching previous inputs. So the patterns that they can recognize and reproduce can be fairly abstract. But then of course, the question is: is pattern completion all it takes? Is that the only thing that's necessary?
00:51:01
Speaker
And so that's where it gets tricky because a lot of logical reasoning is algorithmic reasoning. It's symbolic. It's very regimented. And so these are the kinds of problems where these models seem to struggle. So for example, if you ask them to add and multiply two numbers together, if the numbers are small enough, then the model is doing just fine. But if the number is large,
00:51:30
Speaker
That means it wasn't part of the training set. It means it couldn't have just memorized the response, which it probably does for a lot of smaller number combinations. And so then it would actually have to multiply step by step. And it doesn't seem to be doing that very successfully. In fact, it often gives you a number that's close, but just a little bit off.
00:51:54
Speaker
And so the kind of mistake that it's making is different from the kind of mistake a human would be making, because it's still trying to use pattern matching to complete, and it's not quite working, it seems. Yeah, but I mean, it's very hard to figure out exactly how they're doing, what they're doing. I mean, they've got so many parameters, and it's surprising how
00:52:18
Speaker
good they are yet still imperfect at doing those sort of problems. They're kind of like a broken calculator. So they're much faster at getting to an answer, but it's not quite the right answer. It's a pretty good estimate often. And yet it's not completely out. So it's really, yeah.
00:52:45
Speaker
I don't have a strong opinion, but part of me thinks, well, maybe they'll just kind of, with enough data going in, they might just crack that. There might come a point at which that kind of ability emerges. Although you point out in one of your papers that, well, perhaps if that ability emerges, it might be that a particular kind of architecture that models the human brain emerges as well. So it may be that
00:53:15
Speaker
it might happen in such a way that it becomes less fruitful to think of a large language model as simply a model of language, but something that has a kind of linguistic language network part, like the human brain, and then hands off to a logical part
00:53:36
Speaker
And as it happens, obviously, I mean, ChatGPT has had that kind of architecture imposed on it, at least in the version with the Python code interpreter, for instance, because you can say, oh, add these two numbers together, and it will figure out, oh, well, I'm doing a math problem here. So I'm going to convert this into a Python problem, and then it runs the Python code. So actually, some of these problems seem to be
00:54:02
Speaker
are being sort of addressed, I guess, by the developers, but the way they're doing it is, yeah, offloading. Yeah. So I guess let me unpack that a little, there's a lot there. So, first of all, it is very tempting for people to over-ascribe intelligence to a language model.
00:54:31
Speaker
And presumably that's because, in our everyday interactions, we're used to language being generated by a thinking, feeling being, by other humans. And now we have a system which is breaking that relationship, where we have something that generates coherent language that's not human. And so it gets confusing. And that's the reason why
00:54:56
Speaker
That's one of the reasons why there's so much hype around language models and they're expected to be the general intelligence models, because of this tight perceived relationship between language and thought. And so then when they make a math mistake or they make a factually inaccurate statement, you're like, oh no, how, you know, these models are terrible. They're not terrible. It's just a totally different capacity you're evaluating.
00:55:23
Speaker
And so what we argue is that it's very important to distinguish different kinds of capabilities in these models. And so there is something that we call formal linguistic competence. And so that's the ability to generate coherent grammatical language. And that's something that in humans, the language brain network is responsible for.
00:55:51
Speaker
And then there is all of the other stuff that you need in order to actually use language in real life situation, in interactions. You might want to ask somebody to close the door. You might want to tell somebody how you feel. And there are all kinds of situations where you need to use language.
00:56:13
Speaker
But to do that you actually need other capabilities. You need to be able to reason about social situations, you need to be able to know things about the world in order to generate actually accurate statements.
00:56:27
Speaker
You need to be able to reason logically and know some math if you want to solve a math problem. So even if that information is coming in as language, in order to be able to make sense of it and also generate language that achieves a particular purpose, you need all of these other capacities, which broadly speaking we call functional competence. And so different kinds of capabilities might suffer from different problems. And so we already touched upon a few.
00:56:56
Speaker
We touched upon the fact that mathematical reasoning and logical reasoning might require a different kind of algorithm. So instead of pattern matching, it might need to be more symbolic and it's not fully clear whether large language models today are capable of doing that. Maybe they are, but not necessarily in the default way they operate. So that's an open debate there.
00:57:26
Speaker
When it comes to world knowledge and knowing things about the world, distinguishing plausible and implausible events, there a big problem is reporting bias and the fact that the training data that they have is biased. And so you might need to be able to build up a more general situation model, event model that will not just take in the language that you receive, but also fill in some kind of commonly assumed things. Like if it's daytime, it's light out, stuff like that.
00:57:57
Speaker
And yeah, so different kinds of problems might require different kinds of solutions. A more general kind of potential solution that we advocate or talk about is modularity. So the fact that the brain is modular suggests that that might be an efficient architecture.
00:58:21
Speaker
So a language processing module, the goal of the language network in the brain is not to reason, it's to get information that's expressed in fuzzy, imprecise words and extract meaning out of it. And then pass it on to relevant systems that can solve the math problem, that can infer the social goal, all of that stuff.
00:58:43
Speaker
And presumably, for an artificial intelligence system, you might want to do something similar, where language is not a replacement for thought, but is an interface to thought. And so in your example, right, you have a math problem, the language model translates it into code,
00:59:04
Speaker
It's very good at taking this broad, fuzzy natural language and translating it into a more precise symbolic representation. That's something that we didn't have at all even a few years back. So it's a huge achievement. But then instead of trying to have that same language model run the code, you're much better off passing it off to a code interpreter that will run the code and give you the answer. So the same kind of modularity that we see in the brain seems to be an effective way forward in the AI world, and indeed some developers have started to adopt it.
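To make that division of labor concrete, here is a minimal sketch of the pattern; the ask_llm helper is a hypothetical stand-in for whatever model API is used, and the point is only that the model translates the word problem into code while an ordinary Python interpreter does the arithmetic.

```python
# Sketch of "language as an interface": the model writes code, the interpreter runs it.

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; here we hard-code a plausible reply.
    return "123456789 * 987654321"

def solve_word_problem(problem: str) -> int:
    # Step 1: the language model turns fuzzy natural language into a precise expression.
    code = ask_llm(f"Write a Python expression that answers: {problem}")
    # Step 2: the Python interpreter, not the model, does the actual computation.
    return eval(code, {"__builtins__": {}})  # restricted eval, fine for this sketch

print(solve_word_problem("What is 123456789 times 987654321?"))
```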
00:59:33
Speaker
Yeah, and I think there's probably other ways in which the builders of these tools are trying to modularize. Like another one that comes up a lot is RAG, or Retrieval Augmented Generation, where
00:59:55
Speaker
yeah, there's some kind of database, or it could just be a whole bunch of, you know, documents or whatever. And instead of hallucinating an answer, you want to make sure that you pick up something from one of those documents, and there's a whole different kind of machinery for that. But again, like in the code interpreter example, it's, I guess the language part is
01:00:22
Speaker
Key, maybe less key in RAG because it's kind of a vector search, but it's a way, it begins with translating language into something a bit more precise, in this case, a vector instead of some code, I guess. And yeah, one wonders then if, how close the parallels are between what is being built here and what's going on in the brain. You mentioned that, yeah, perhaps this is a good model for thinking about how we think.
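For completeness, a bare-bones sketch of the retrieval-augmented pattern mentioned above: documents become vectors, the question is matched against them, and only then is a language model asked to answer. The generate helper is a hypothetical stand-in for an LLM call, and the toy bag-of-words embedding just keeps the example self-contained.

```python
# Bare-bones retrieval-augmented generation (RAG) sketch.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def generate(prompt: str) -> str:
    # Hypothetical stand-in for a language model call.
    return f"[LLM answer grounded in: {prompt.splitlines()[1]}]"

def answer(question: str, documents: list[str]) -> str:
    q = embed(question)                                        # question -> vector
    best = max(documents, key=lambda d: cosine(q, embed(d)))   # vector search
    return generate(f"Using only this source:\n{best}\n\nAnswer: {question}")

docs = ["The language network handles comprehension.",
        "The multiple demand network handles problem solving."]
print(answer("Which network handles problem solving?", docs))
```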
01:00:52
Speaker
Language is this part where this place where things kind of, you know, entry point for concepts, but the places where those concepts often get manipulated in terms of reasoning might be in other areas of the brain. They sort of become something more abstract than language itself.
01:01:16
Speaker
Yes. Yeah. One thing I actually just slight tangent, but I do sometimes think that maybe language is being so associated with thought, because it's kind of like,
01:01:30
Speaker
the easiest thing to do. We know thinking is about concepts and some manipulating these things which are representations of the world. And language is just such an easy way of visualizing all of that and understanding what's going on. But perhaps it's just the surface level of something much deeper that we really don't have an easy way of capturing.
01:02:00
Speaker
And that would map, I think, quite well to this kind of model of concepts being passed around, but the concepts themselves being beyond linguistic somehow. Yeah, so as we mentioned, language is a system designed to communicate thoughts, concepts, from one mind to another.
01:02:22
Speaker
And so for this communication to be efficient, presumably language needs to parallel the structure of thought, the structure of concepts in some way, right? And so it's much more abstract already than the raw perceptual input, than just audio, than just pictures, right? So it kind of captures the relevant abstractions to a large extent.
01:02:45
Speaker
And so that seems to be helping a lot. And so that does bring us much closer to this more abstract conceptual representation. We're getting rid of a lot of extra details. We say cat, we don't care which color, which size is the cat. But of course, that mapping between concepts and language is imprecise. We know that different languages partition the conceptual space in different ways, right? So the words don't necessarily map the concepts one on one.
01:03:12
Speaker
Even within the same language, the same word can be used in many different contexts in different ways with different meanings. And so that link is pretty fuzzy, it can get pretty fuzzy. But it's definitely, I think you're right, when it comes to raw surface form, it's a very decent proxy, imperfect, but it makes sense why people attempt to use it.
01:03:38
Speaker
And in some ways that means it makes what LLMs do so much more impressive, because they're also somehow capturing that surface form of concepts. Someone, a previous guest, pointed out this wonderful quote from Ilya Sutskever saying,
01:04:00
Speaker
it's not just predicting text, because if your LLM can be fed the first part of a mystery novel that it's not read before, and it can tell you who the murderer was, it's not just predicting a word, it's somehow understood what's going on in that story. Now, one of the difficulties, obviously, with all these things is, well,
01:04:30
Speaker
we don't know how OpenAI's LLMs are trained. So it's very hard to test them, because you really need someone to write a new mystery novel to actually see if Ilya Sutskever's claim cashes out. So that's quite a high-effort test. Unless, yeah, we happen to know of one which is definitely not in the corpus that was used. But yeah, it does seem, you know,
01:04:57
Speaker
The fact that they are so good at mirroring what we produce and what we produce is somehow a good map onto something somewhat deeper, the world or an inner world. Yeah, it's so impressive. And you point out as well that it seems that the way that
01:05:23
Speaker
LLMs operate is very similar structurally to the way that our minds operate in that it's not working on the raw audio or pixel forms of things. The beauty of language is the compositionality at the level of small units, which are combinations of symbols or
01:05:46
Speaker
small sounds. And yeah, the LLMs perfectly match that. So we've built these things which really do capture something quite essential about how at least a part of our mind operates, it seems. And yet, maybe we've been seduced into thinking that's all there is to thinking, right?
01:06:09
Speaker
Yeah, so in fact, well, that question, I guess I don't want to get too technical, but the question of what LLMs start with is actually an important one when we're trying to compare them with human minds or human brains. So in fact, what a large language model operates over is tokens. So it's chunks of characters that tend to occur pretty frequently in text. And so oftentimes they're words, like the,
01:06:38
Speaker
but sometimes they're not words. If the word is long, it gets split up into multiple tokens. Yeah. And so the problem is that those tokens actually don't match the linguistic units that the word is actually made of, like morphemes. They can be pretty arbitrary. And so that does cause some differences between the way LLMs process language and the way humans do. In fact,
01:07:05
Speaker
people think that one reason why large language models are bad at arithmetic is because they tokenize numbers in weird ways, right? So, like, I don't know, 1618 gets chunked into, like, 161 and then 8, and so then it gets weird when they have to add up those numbers, and that's where you get these weird pattern matching errors. And so the thing about this whole tokenization scheme
01:07:27
Speaker
is that it's very engineering driven. It's actually not like very rigorous or scientifically based. And so it's interesting, like maybe if we change this little thing, it actually will result in much better performance. And so it's funny how a lot of those choices are pretty random engineering driven things. And, you know, they often work very well.
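One way to see this chunking for yourself, assuming OpenAI's tiktoken library is installed, is to run a few strings through one of its encodings; the exact splits vary from tokenizer to tokenizer, so the "161" plus "8" example above should be taken as illustrative rather than literal:

```python
# Inspect how a BPE tokenizer chunks words and numbers. Whether "redness"
# splits into the morphemes "red" + "ness", or a number splits on boundaries
# that make arithmetic awkward, is an accident of the tokenizer's training
# corpus, not a linguistic decision.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["redness", "1618", "16 + 18"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> {pieces}")
```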
01:07:53
Speaker
But it is possible that with a few small tweaks, you can actually make the model much better. Yeah, no, I always thought that there was more sort of reasoning behind the n-grams that were used. But maybe it just chunks kind of randomly? Because I would have thought, well,
01:08:11
Speaker
it makes sense to split words up, because suffixes like -ness, if I think of, like, redness, right? It's not a word in itself, but it does attach to so many different words that it's sort of part of the compositional structure, I guess. But if what's getting chunked up is just two S's, right, and not -ness, then it's kind of hard to, yeah.
01:08:37
Speaker
No, no, that's exactly right, because -ness is a morpheme, it's a suffix with a particular meaning. And so if redness is chunked into red and -ness, that makes a lot of sense and it's linguistically justified. But oftentimes, that's not how the chunking happens. That's where the mismatch arises. Yeah. So you can definitely have the two S's in principle. Okay, interesting. Yeah. Yeah, it seems like, yeah. One would think that with a bit of curation, maybe they could be
01:09:07
Speaker
even more effective. And yet it's hard to imagine them being more effective in terms of producing language, but perhaps that's just because they've been fed such copious amounts of data. They could be more efficient, at least. But the algorithm is kind of...
01:09:31
Speaker
The goal is for it to be universal and data-driven, right, without human curation, which is why the morphemes don't get respected all the time. It causes a lot of issues for languages that aren't based on the Roman alphabet. So let's say Arabic, for example, it ends up getting tokenized at the character level because the tokenizer is just not adapted to deal with it. And so that does mean that performance on these languages that are not
01:10:01
Speaker
Roman-alphabet based is actually worse, often substantially worse. It's generally a problem that the less data a language has, the worse the performance in that language. Some of the more general information does seem to get pulled across different languages, which is cool.
01:10:19
Speaker
But a lot of language-specific stuff, like grammar, right, of course, depends on how much data you have in that language. But a particular distinction that tends to matter beyond just the amount of data is which alphabet. And so because so many of these models are English-centric, a lot of other languages get left behind. Yeah, interesting. And to what extent, I mean, I know there are techniques for doing this. So you spend
01:10:45
Speaker
You've done a lot of experiments looking into the minds or the brains of people. There are tools which allow us to do this to an extent with LLMs. But how effective are they? How does it compare to looking at an MRI, trying to understand what's going on inside an LLM, what concepts it has, or what's lighting up as it is given a prompt?
01:11:13
Speaker
Yeah, so I am fascinated honestly by how many parallels there are between studying biological intelligence in humans and artificial intelligence.
01:11:23
Speaker
And for me, the first similarity is really just starting at the behavioral level. So developing separate experiments to look at formal competence, like grammar, functional competence, like reasoning. These are methods from cognitive science. How do we design good experiments? How do we disentangle different contributors to performance? So even before we start looking inside the model or inside the brain, just looking at how humans behave and how models behave can tell us
01:11:53
Speaker
a lot about potentially how they do it, what kind of mistakes they make, and what that tells us about the potential mechanism that they're using to solve the task. But then, of course, we can get even more insight by looking at the actual mechanisms or their neural correlates. So for humans, that means looking inside the brain. And for models, that means looking inside the model.
01:12:18
Speaker
And so the movement that we're seeing currently, the mechanistic interpretability movement in AI,
01:12:26
Speaker
is doing that, essentially. They're asking which circuits, which units inside the network, are responsible for a particular behavior. And so they first try to identify those units that get particularly engaged in a task; maybe they respond differently to plausible sentences compared to implausible sentences.
01:12:51
Speaker
And then the beauty of having an artificial system is that you can actually go and manipulate it directly. So you can knock out that circuit, or you can replace activations from one sentence with activations from another sentence. In neuroscience, people sometimes do that as well, in animal research, for example, or there are certain kinds of stimulation that you can do that aren't harmful but can maybe produce the desired effect.
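As a toy illustration of those two causal tools, here is a sketch in PyTorch on a made-up two-layer network (not a real language model): one forward hook silences part of the hidden layer, another patches in activations recorded from a different input.

```python
# Toy ablation and activation patching on a hypothetical network.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
hidden = model[1]  # treat the ReLU output as the "circuit" of interest

x_plausible = torch.randn(1, 8)
x_implausible = torch.randn(1, 8)
stored = {}

def capture(module, inputs, output):
    stored["h"] = output.detach().clone()

def ablate(module, inputs, output):
    out = output.clone()
    out[:, :4] = 0.0                  # knock out the first four hidden units
    return out

def patch(module, inputs, output):
    out = output.clone()
    out[:, :4] = stored["h"][:, :4]   # swap in activations from the other input
    return out

# 1. Record activations for the "plausible" input.
handle = hidden.register_forward_hook(capture)
baseline = model(x_plausible)
handle.remove()

# 2. Ablation: does behaviour change when the circuit is silenced?
handle = hidden.register_forward_hook(ablate)
ablated = model(x_implausible)
handle.remove()

# 3. Patching: run the second input, but with part of the first input's activations.
handle = hidden.register_forward_hook(patch)
patched = model(x_implausible)
handle.remove()

print(baseline, ablated, patched)
```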
01:13:18
Speaker
In aphasia studies, these are natural causal experiments, right? We didn't cause the lesion that destroyed the language network, but because we see those cases occurring naturally, we can look at those effects. And so the causal tools are really powerful because they can really help us to see whether this part of the circuit is necessary for the behavior that we observe. And so in AI systems,
01:13:46
Speaker
we can do that quite easily. But conceptually, I would say, in neuroscience and in AI, what we're trying to find out is very similar. Yeah, yeah. And it's wonderful, as you say, at least on the behavioral point, that you can draw on the same kind of experiments; you know, we finally have a kind of artificial intelligence that you can feed the same sort of things that you'd feed a person, i.e. sentences.
01:14:15
Speaker
And so it makes it very natural to run those kind of experiments. But then on the other hand, you can also go into the thing itself and tinker with it in a way which would be very unethical and even just impossible with a person. So you could, I think there was one example where you had, I don't know, the concept of, or Berlin was replaced with Paris, or no, what was it? It was Rome.
01:14:43
Speaker
Was it that the Eiffel Tower was conceptually placed into Rome, or something like this, and you asked, well, how do you get from Berlin to the Eiffel Tower? It wasn't me, but yeah, it's a famous kind of editing study. Yeah. Yeah. I think I must have read it in one of your papers referring to it. Meng and Bau, oh, yes. Yeah.
01:15:06
Speaker
And so the LLM really does, it kind of responds in the way that you would think if what's going on is that it has some kind of model of the world, and all you've done is kind of switch around some pieces inside that model. It's not that it gets completely, you know, it doesn't throw everything completely out of whack, I suppose. And it even kind of infers some things, that, you know, the Eiffel Tower will be in the center of Rome and
01:15:32
Speaker
it'll be near the Colosseum or something like that. Which is, yeah, so fascinating, to have something where we can kind of plausibly peer into the internal workings. And yet, just like the human brain, everything is so complicated that actually it's not a trivial task, I guess.
01:15:58
Speaker
No, but that's the benefit, I guess, that neuroscientists have. We're used to dealing with this complexity, and there are ways to zoom out beyond just each individual neural unit to try and look at general trends and general patterns. I think a lot of people are daunted by the task of trying to understand the neural net because it's so big and complex and because
01:16:23
Speaker
It's trained in this way where we don't necessarily know which features it picks up on. But to me as a researcher, I'm just excited. I want to know. It's like a cool puzzle to solve and a cool problem to understand. But generally, I'm pretty optimistic about this endeavor. Cool. Yeah. I think you mentioned at the very beginning that your research is now starting to look at some of the
01:16:45
Speaker
you know, possibly trickier question of this kind of reflexive thinking, the narrow type of thinking, I think you mentioned. So we've been talking a lot maybe about the broader definition of cognition as just kind of reasoning, manipulation of concepts, which one might even do in a very automatic way, as we were saying, like you might just solve a maths problem without really, you know, in a way where you'd say, oh, yeah, I didn't think about that. I just did it.
01:17:14
Speaker
Yeah, how does one, what kind of things have you, how can you look at this other problem of like when people kind of cogitate about things and turn them over in their minds? Where are you going with that? I'm really curious. I think to me the interesting question here is the question of individual differences. If some people report thinking in words most of the time and others say they don't think in words at all,
01:17:44
Speaker
Presumably, we should be able to see that at the brain level. Presumably, we should be able to see the language network working hard for the first group and not at all for the second group while they're thinking spontaneously in a task-free setting. That's really what I want to look at. But in order to do that, we need to have a good questionnaire that will capture those differences precisely.
01:18:12
Speaker
I think instead of just asking people, oh, do you think in words a lot or a little, it would be helpful to get more information, right? What does it mean to think in words? Are people's meta-assessments of their own thinking style reliable, right? So can we trust those judgments? How can we make them more granular?
01:18:39
Speaker
Another question that I'm very interested in, and that's really understudied currently, is whether there is a difference between thinking in words and hearing the words, right? So if you're using some kind of words, some kind of language, to think, does it mean that there is a voice, or not necessarily? It turns out some people might see the words written in their mind's eye, so they spell it out.
01:19:08
Speaker
It's a minority, like less than half of the population, but it does happen. And capturing those differences, I think, is fascinating, and then trying to look at the neural correlates, to essentially establish the validity of those differences, to show that they're real and not just something that people perceive and report but that doesn't reflect how they actually think.
01:19:33
Speaker
It's an interesting direction because psychology has this complicated history, an interesting relationship with phenomenology, with people reporting their own experiences, right? That used to be very common, and then it turned out to result in a lot of pseudoscience and discredited a lot of psychology, and so then there was this
01:20:02
Speaker
huge turn to behaviorism, where all that mattered was the stimulus and the response, and people were refusing to talk about any internal operations at all. So people are still very suspicious of phenomenology, of self-reported experiences. And I think for the right reason, because often, yeah, we just don't know how we think. We're like, I think it's words, or I think it's not. Sometimes we'll make a decision, right, like we were saying, kind of very quickly.
01:20:30
Speaker
And then when we have to explain what we did and how, we're going to have to rationalize it. And so maybe that's actually not how we arrived at the decision, but post hoc we come up with an explanation that might not correspond to reality. So I think we have to be careful when taking people at their word. But to me, when people report striking differences, like, oh yeah, I think in words all the time, versus never, not at all,
01:20:59
Speaker
It seems like there's something there and so I would love to use neuroscience to get at that question more deeply. Yeah, it's a tricky one. I mean, it strikes me that even the process of asking someone, do you think in words?
01:21:11
Speaker
is itself almost necessarily linguistic; you have to use language to communicate it. Because as we say, this is the way that we pass ideas around. And so maybe there's just that kind of arrogance in the language network, which is going to intercept that question and say, oh, yes, it's me, I do all the thinking. But as you say, well, many people do report thinking in many other ways. So yeah, I'm really curious about
01:21:40
Speaker
what that shows. I mean, this must surprise you all the time, just how, you know, outwardly we sort of walk around and we move around and we breathe and all our organs are, you know, working in pretty similar ways, and yet internally, you know, we seem so heterogeneous, I guess. Does that sound about right, or am I overstating the kind of differences in the brains that we have?
01:22:09
Speaker
I don't know. I think, yeah, it just depends on your intuition about, you know, how much similarity and difference you would expect. Of course, our personalities are very different, right? Our likes and dislikes, our interests. So at the cognitive level, there are lots of differences between people, of course. And so I guess the interesting thing is that we have these huge differences in how we perceive our own thinking, but they don't necessarily manifest very obviously in differences in behavior. Right. So
01:22:39
Speaker
In addition to differences in inner speech, another common example is differences in mental imagery. It turns out that some people never experience visual images in their mind's eye. When they're asked to imagine a red apple, they will think about the concept of an apple and redness, but they will not see a red apple in front of them when they close their eyes.
01:23:07
Speaker
Yeah, and so that phenomenon has a name, aphantasia, and the name got coined in 2015, so very recently really. And this is a phenomenon that got discovered at various times over the centuries, and then forgotten again and rediscovered, because, again, people just tend to assume that everybody else has roughly the same inner experience as them, and so those differences just end up getting neglected.
01:23:35
Speaker
But it turns out that people with aphantasia, again, you cannot tell them apart very easily from people with vivid visual imagery. So it turns out that lots of things we do in the world, you can do whether or not you experience images visually. Similarly, whether you have strong inner speech or not, it turns out you can't really spot these people very easily out in the wild, because they don't act very differently. So that's the interesting thing: despite these experiences
01:24:04
Speaker
being so different, somehow we can still act in roughly similar ways and do the tasks that we need to do in the world. We might be using different strategies, it's very possible, but the end result is that actually those differences are very hard to see. Yeah, that is fascinating. I actually have a friend who is an aphant, I guess. Well, he didn't find out until a few years ago.
01:24:31
Speaker
There's no kind of outward sign, right? It just seemed completely normal. But then, oh yeah, I just can't visualize triangles, right? I know what a triangle is. I can reason about triangles. And actually often there's
01:24:48
Speaker
there might be some research which shows that they, in some ways, are oftentimes better at reasoning about certain things where one might think it requires a visual element. But yeah, he was a very good physicist, a very good colleague, but just thought in a different way, I guess.
01:25:12
Speaker
Yeah, I think we'll find some differences, right? Like, now that there's more awareness, and once we start doing more systematic research. I mean, there are already attempts to look at the relationship between aphantasia and episodic memory, and between aphantasia and, yeah, spatial reasoning, geometric reasoning. There the link is not as strong, or maybe not even there,
01:25:41
Speaker
even though people expected it to be, and there are various explanations as to why. But yeah, essentially, I think even though those differences aren't apparent, we'll find some eventually, and probably we'll just find out that different people are using different strategies to do the same thing. Some of them might require imagery or thinking in words, in inner speech, and some might not. I mean, all this is a reminder that, well, with LLMs,
01:26:09
Speaker
we might end up producing artificial intelligences which outwardly look very similar to us, but we should be mindful that inwardly they could be very, very different. And I think that's very true, and it's already happening.
01:26:29
Speaker
There are all those cases where people were screenshotting ChatGPT responses, especially right after it came out a year ago, showing it responding to some very complicated prompt and doing it correctly. And people were so impressed, being like, oh, you know, if you put, like, a nail
01:26:52
Speaker
on a table, like, you know, or on a chair, is a construction like that stable or not? Or, you know, what kind of things can you put on top. And it looks very impressive. And then it turns out that if you change the problem just slightly, it just starts spitting out total nonsense. And the same thing happened with
01:27:13
Speaker
Um, a social reasoning problem, kind of predicting what the other person would think, which is a classical problem from psychology. And so the claim was that, you know, now LLMs can reason about people and what they do. And then, again, it turned out that if you change the prompt to be slightly different from the problems that were already available on the internet, then model performance drops drastically.
01:27:37
Speaker
So it's very easy to fall for this seemingly impressive performance, seemingly impressive understanding. And so luckily for us, in this case, there are ways to design even behavioral interventions that can maybe help us figure out what's actually going on and what strategy is actually being used. Yeah, yeah, that's a very important problem for sure.
01:28:04
Speaker
I found the social reasoning example really... I just loved it. So I think, if I remember correctly, one of the ways that you fool them is just by inserting a few words in the middle. So the classic one is something like, you know, Susie hides Bob's apple, it was in the closet, where does he think the apple is? And the thing will say, oh, not in the closet anymore.
01:28:31
Speaker
But if you change it to say, well, Susie hides Bob's apple, it was in the closet, and she tells him that she hid it, then the LLM still says, oh, he thinks it's not in the closet. And I think, yeah, so clearly, either in the training set or in the fine-tuning, it's gone so far in getting them to kind of give the appearance of having some kind of theory of mind and being able to solve those problems.
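A sketch of how you might run that kind of minimal-pair check yourself, with `query_model` as a hypothetical placeholder for whatever chat API is being tested:

```python
# Pair a classic false-belief prompt with a minimally edited variant and see
# whether the model's answer changes the way a person's would.
def query_model(prompt: str) -> str:
    # Placeholder: swap in a real LLM call here.
    return "[model answer]"

classic = (
    "Susie hides Bob's apple in the closet while he is away. "
    "Where does Bob think the apple is?"
)
perturbed = (
    "Susie hides Bob's apple in the closet while he is away, "
    "and then she tells him exactly where she hid it. "
    "Where does Bob think the apple is?"
)

# A model leaning on memorised patterns tends to say "not in the closet" to
# both; a model actually tracking Bob's beliefs should flip on the second one.
for name, prompt in [("classic", classic), ("perturbed", perturbed)]:
    print(name, "->", query_model(prompt))
```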
01:29:01
Speaker
But with a little bit of tweaking, it becomes apparent that they don't. And it seems to be becoming harder and harder to fox these systems. And you point out that one of the real difficulties is that, without knowing what has gone into the training of OpenAI's models,
01:29:26
Speaker
it's hard to know how much genuine intelligence has emerged and how much is simple pattern recognition and recall. But of course, the paradox here is, well, they have some of the most advanced models, so maybe there's something genuinely interesting going on, but it's a black box, so we can't really say. But I think it's encouraging that
01:29:52
Speaker
other folks, you know, Mistral perhaps, are being more open about what's going into their models. So, you know, maybe it's easier to test on the very latest things and have confidence that, I don't know, some problem was not appearing verbatim in the training set. But even then, yeah, it could be some very similar problem. Yeah, for sure. And like, it's not
01:30:20
Speaker
always bad if they have been fine-tuned on this problem and if they've seen examples before. Humans learn lots of things. We can't do math right off the bat; we learn from examples. It's easier for us to do the problems that are familiar to us than the novel ones. That's fine. It's just important to know, because that helps us figure out what is the mechanism that they're using, or potentially using, and what are the cases where they might break.
01:30:49
Speaker
We don't necessarily need to expect these models to be like amazing zero-shot thinkers on totally novel problems all the time. It's just that, yeah, knowing what goes in the training data and knowing what problems they've been fine-tuned on just helps us assess them more accurately. Yeah. Yeah. Yeah. That's a good point. It's unfair to demand of LLMs that they don't learn anything or they don't benefit from fine-tuning. But yeah, but we want to know if they are
01:31:19
Speaker
if the way that they're responding is that they've abstracted patterns or they're just regurgitating. Yeah, I found it really interesting to go through your work. I'm usually optimistic, I guess, or maybe that's the wrong word. I really admire how LLMs are
01:31:46
Speaker
have taken off, and it surprised me how quickly they've advanced. But one thing I've enjoyed about your work is kind of reminding me, well, actually, maybe they're not as far along in some ways as they appear to be at the surface level. Like, there still seems to be some nuts to crack here to bring them closer to human thought. What's your take? Do you think, you know,
01:32:15
Speaker
I don't wanna ask the typical how-far-away-is-AGI question, but, you know, are you of the opinion that just cranking through more data is gonna continue to produce results, or should more be invested in this kind of modularity approach? And if the latter, well,
01:32:42
Speaker
are all the things that are taking place, you know, on the right track, or do we need to look more closely and learn more from the human mind, perhaps? I think we definitely should recognize all the impressive achievements that we observe in LLMs today. And one way that
01:33:06
Speaker
my colleagues and I have been thinking about it is through the formal and functional competence lens. On the formal competence front, learning the rules and patterns of natural language, these models have been incredibly impressive. Even a couple of years ago, they almost reached the ceiling for English, at least, the language where they have the most data.
01:33:29
Speaker
And they did it without the need for any fine-tuning, just learning to predict words in a massive corpus of text. It turns out that is something that gives you most of the grammar and the knowledge of idioms and all kinds of patterns that characterize a language.
01:33:49
Speaker
And that wasn't trivial at all. That was the subject of debate for linguists for decades. Is it possible to just learn from data or do you need a rich set of internal rules that can help you figure out what's grammatical and what's not?
01:34:04
Speaker
So that is incredibly impressive scientifically, and on the engineering front too: different language processing systems in the past have struggled so much because you just can't encode language with a few simple rules, or even not-so-simple rules. There are just so many exceptions, irregular verbs and this and that.
01:34:25
Speaker
And the fact that these models have mastered that is so impressive, and people kind of forget and start talking about AGI right away, but that's an impressive achievement. Being good at language is already very impressive. And then we get to functional competence: their ability to reason, be factually accurate, know what's true and what's false, and be actually helpful. And so that's a whole other host of problems, where they actually
01:34:48
Speaker
seem to be spotty, right? Like they have achieved a lot because of pattern recognition, but then it turns out that that performance is not robust and it breaks. And so that's where it gets more complicated and more controversial. And that's where we argue modularity will be helpful. Again, looking at the human brain as an example.
01:35:13
Speaker
One distinction we make though is that the modularity doesn't necessarily have to be built in by design. This built-in approach we call architectural modularity where we have a language model and let's say a Python code interpreter and we put them together and they're clearly different and they're doing different things. That can be promising, but then of course you need to know what the right modules are, you need to set them up in the right way,
01:35:41
Speaker
An alternative approach that might work for certain cases is what we call emergent modularity, where you start out with one network, but you don't necessarily specify what the parts need to be doing; you let the network figure that out over the course of training, and you can have different parts self-specialize to do different things. That might require some changes to the architecture to be able to promote this kind of specialization.
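To make the architectural-modularity idea concrete, here is a minimal hypothetical sketch: a placeholder `language_model` handles ordinary prompts, while anything that looks like bare arithmetic gets routed to an exact evaluator, in the spirit of the language-model-plus-code-interpreter pairing just described.

```python
# Crude "architectural modularity": route arithmetic to an exact module,
# everything else to the language model. `language_model` is a placeholder.
import ast
import operator
import re

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str) -> float:
    """Evaluate simple arithmetic without exec/eval."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def language_model(prompt: str) -> str:
    return f"[LLM response to: {prompt}]"   # placeholder

def answer(prompt: str) -> str:
    text = prompt.strip()
    if re.fullmatch(r"[\d\s\.\+\-\*/\(\)]+", text):
        return str(safe_eval(text))          # exact module, no token weirdness
    return language_model(text)

print(answer("1618 + 24"))
print(answer("Where does Bob think the apple is?"))
```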
01:36:08
Speaker
It might require changes in the objective function; maybe next-word prediction alone is not necessarily going to be enough. And it might require changes in the training data, kind of like what's happening with fine-tuning today, where you are feeding it specific problems that you're asking the model to do in specific ways. So you might, you know, selectively boost the social reasoning and the formal reasoning and the factual knowledge; there might be
01:36:37
Speaker
specific things we need to do. But there is a lot of promise in these approaches. And the paper where we introduced the formal and functional competence distinction is something we started working on in 2020, around the start of the pandemic. Language models were around then,
01:36:55
Speaker
but not nearly as advanced. And as we were writing the paper, and in fact, after the initial preprint version came out, that's when we started seeing the field, the developers shifting away from this simple scaling up approach. That's not the approach that's common anymore. People have started to shift towards specialized fine tuning, using very targeted data sets to improve performance on specific domains.
01:37:21
Speaker
coupling an LLM with external modules, right? All of those things that we kind of suggested might be good because they're more brain-like. RAG became big, right? All of those things are something we've seen over the past year. It's very encouraging that now the AI field is also recognizing that it's not just about scale, that you do benefit from different components working together. Yeah. Yeah.
01:37:46
Speaker
Yeah, I think very exciting to see what comes next, both in your work and in the field of LLMs, which it seems like maybe someone's listening to what you're suggesting because all these things are happening. Yeah, I don't know if you have any final comments or predictions. Warnings of Doom often come up with
01:38:16
Speaker
in these discussions, but this has been surprisingly positive. I just think that science is important and we just need to use good methods and not run after the hype and be realistic in how we evaluate the strengths and limitations of these models. There are strengths, there are limitations, so being too far on just the positive and just the negative is not necessarily the most productive.
01:38:44
Speaker
We just want to be able to disentangle them effectively. Thank you so much, Anna. This has been really insightful. Yeah, thank you.