Anders Sandberg on ChatGPT and the Future of AI

Future of Life Institute Podcast
209 plays · 2 years ago
Anders Sandberg from The Future of Humanity Institute joins the podcast to discuss ChatGPT, large language models, and what he's learned about the risks and benefits of AI.
Timestamps:
00:00 Introduction
00:40 ChatGPT
06:33 Will AI continue to surprise us?
16:22 How do language models fail?
24:23 Language models trained on their own output
27:29 Can language models write college-level essays?
35:03 Do language models understand anything?
39:59 How will AI models improve in the future?
43:26 AI safety in light of recent AI progress
51:28 AIs should be uncertain about values
Transcript

Introduction to the Podcast and Guest

00:00:01
Speaker
Welcome to the Future of Life Institute podcast. My name is Gus Docker. On this episode, I talk with Anders Sandberg from the Future of Humanity Institute at Oxford University.

ChatGPT: Development, Risks, and Benefits

00:00:13
Speaker
We start out by talking about ChatGPT, which is the newest language model from OpenAI. But we also talk more generally about what we can learn from how AI has developed about the future risks and benefits of AI. I really enjoyed this conversation and I think you will too. Here's Anders Sandberg.

Experience with ChatGPT and its Surprising Capabilities

00:00:34
Speaker
Anders, welcome to the podcast. Thank you for being here. Well, thank you so much for having me.
00:00:40
Speaker
All right, I assume you've been playing around with ChatGPT, the new language model from OpenAI. Have you tried it? Of course I have. It's a very fun, very shiny toy right now. What has surprised you about its capabilities?
00:00:59
Speaker
I was probably most blown away when I read that you could actually use it as a fake Linux terminal to make it pretend that it's actually a computer. And you can move around in different directories, install software, surf the web, except of course it's not the real web. It's a kind of a dream web that it's imagining.
00:01:20
Speaker
And I saw people tweeting about it, so I tried it myself. And I got curious, what are the files I'm finding on this computer? So I found this folder containing letters. So I found the CV of the owner of the account. So I read John Doe's CV, where he's talking about his skills as a programmer at XYZ Corporation.
00:01:44
Speaker
It was a weird feeling because in some sense, GPT is making up a very plausible file to find on a generic computer from indeed a very generic person. At the same time, I felt like maybe I shouldn't go and look at the private files because they're probably private. Where is this information coming from?

Understanding Large Language Models

00:02:03
Speaker
Maybe we could explain how GPT language models work.
00:02:08
Speaker
So generally, the large language models are glorified versions of the text completion we have in our smartphones. You basically try to predict what is the next word or even the next letter coming from a previous sequence of characters.
00:02:24
Speaker
And it's seemingly obvious that you can do this with a simple statistical method: you just take a lot of text and calculate the probability of the next word based on the previous ones, except that that doesn't do much for grammar. And indeed, these kinds of random Markov chain models have been fun sources of nonsense poetry since the 1970s in computer circles.
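The simple statistical method described here can be sketched in a few lines. This is only a minimal illustration, assuming a word-level bigram model; the function names `build_bigram_model` and `generate` and the tiny corpus are made up for the example and are not from the conversation.

```python
import random
from collections import defaultdict


def build_bigram_model(text):
    """Map each word to the list of words observed right after it."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, length, seed=0):
    """Sample up to `length` words, each chosen only from the previous word."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        options = followers.get(output[-1])
        if not options:  # dead end: this word never had a successor
            break
        # Duplicates in the list make frequent successors more likely.
        output.append(rng.choice(options))
    return " ".join(output)


# Hypothetical toy corpus, used only for illustration.
corpus = "the cat sat on the mat and the dog sat on the log"
model = build_bigram_model(corpus)
print(generate(model, "the", 8))
```

Because each word depends only on the one before it, the output is locally plausible but has no grammar or long-range context, which is the limitation the neural language models are described as addressing.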
00:02:46
Speaker
What the language models do is they add a more complex artificial neural network that actually is aware in some sense of previous words that have been in a sentence. If I start talking about animals in a sentence, then various words that might be ambiguous might be more or less likely, given that the context is animals. If I switch the context and start talking about programming, then suddenly other words become more likely.
00:03:15
Speaker
This is all there is to it. It's basically predicting a sequence of tokens based on a large corpus of text that you train it on. And now it just predicts the next token. And then you can, of course, add more to that. And it generates a fairly likely text.
00:03:30
Speaker
This is in itself really amazing because some of these texts actually do seem to contain real meaning. And then when you do it interactively, like in ChatGPT, you're writing something and you're getting a plausible response. And it's very hard to shake that feeling that there is intelligence here.

AI Models: Reflected Intelligence and Misjudged Predictions

00:03:48
Speaker
I give an instruction and the text responds with following that instruction, even though what's actually going on is not so much that there is a being that understands that Anders wants me to do these things. But rather, it's a text that is being updated by two actors. And the game is very much like, if there is an instruction earlier in this sequence, then that should be applied later on.
00:04:15
Speaker
So there is no real understanding in one sense. In another sense, it's very clear that I can ask these systems which colors are more similar to each other, and I get reasonable responses.
00:04:28
Speaker
This is weird because, of course, these systems have never seen a color. They have only ever experienced a lot of text. But that text comes from humans that generate, of course, a lot of color appreciation, a lot of color names, a lot of color comparisons. And given that it's more sensible to say that green is closer to maybe turquoise than red is, then the text system will do this too.
00:04:55
Speaker
The problem is this kind of reflected intelligence confuses us very, very easily. Yeah, I think it's important to remind ourselves how these models actually work.
00:05:08
Speaker
Sometimes it's confusing. I had a chat with ChatGPT over the weekend and had a discussion on moral philosophy with this model, and it felt real to me. It felt as if the model understood what I was saying. And then I asked it to correct the point that it made because it was wrong, and it acknowledged that it had made a wrong point.
00:05:31
Speaker
But then it repeated the point over and over again, which breaks the illusion. So it's also amazing to think how broad the knowledge in these models is. So I know something about moral philosophy, but you could also ask it about engineering, bridges, or art, or poetry, whatever it is. So in one sense, as you say, these language models show understanding, but in another sense, it's brittle understanding.
00:06:02
Speaker
It's also not sensitive to inconsistencies, which I think is very interesting. I saw somebody that had a wonderful argument with ChatGPT about whether it could speak Danish. And the model was responding in Danish saying, no, I don't understand Danish. I'm trained on an English corpus. It was obviously wrong, obviously inconsistent, and it didn't care because that was the most likely way that the conversation
00:06:32
Speaker
could continue. It's interesting to think about these abilities that to us seem like a qualitative leap, but they arise out of basically the same type of model that just has more data or has more parameters or has longer training times, more expensive training runs. But to us, it seems as if the models are learning new things. Yeah, for example, that we can now hold longer conversations.
00:07:00
Speaker
Yeah, do you think this trend will continue and we will continually be surprised? I think it's a very good bet that we will be surprised because our ability to predict the capabilities of the AI systems has been demonstrated again and again to be really bad. It goes both ways. People have both been too bullish about how easy it would be to actually program many tasks and discovered the hard way that computer vision is a really deep problem.
00:07:30
Speaker
But solving some problems that are easy for young kids is actually very hard for robots, while it's not terribly hard to write software that does symbolic math and is very much more powerful than any human could ever be.
00:07:46
Speaker
So that's Moravec's paradox: the five-year-old is kind of outperforming the best robot while rather simple programs are outperforming the professor. But at the same time, we've also been wrong about things not being doable. So in the current era of large language models, there are a lot of people who know a lot about language and AI that have been making very confident predictions about their failures and shortcomings and that this cannot be dealt with.
00:08:15
Speaker
And usually the story is a few months later, a new version shows up and it totally blows those problems out of the water. And then typically the human expert gives another interview where he proclaims some other problems being really a good test to show that this is not true intelligence and that it cannot possibly be solved.
00:08:35
Speaker
And my bet would be that six months later, a new version arrives and does even that, even though the underlying basic point that, yes, the language models don't really have understanding in our sense is valid. It's worth recognizing that, yes, they're not intelligent in the sense we use about having a goal, having
00:08:57
Speaker
plans to implement this goal. The language models are just predicting strings of tokens, but they're doing it so well that even people in robotics have started to use them for internal communication and planning. Because you can, of course, use these strings of tokens to also make up a good plan. How do I make a cup of tea? Have the language model describe the steps, send that to another language model that controls the robotic arm.
00:09:23
Speaker
So then you combine the instructions from a language model with some computer vision, and then you have the beginnings of a robot that can actually navigate in the world. Yeah. So if we look back, if we think say 10 or 20 years ago, we perhaps would have thought that we would have pretty high functioning self-driving cars by now.
00:09:49
Speaker
But that didn't turn out to be the case, at least not yet. But we didn't predict, or at least I hadn't seen predictions of, language models with this kind of capability, as we're seeing now. Can we use this information to predict how we will be surprised in the future? Or is that too much to expect?

AI Applications and Limitations in Critical Fields

00:10:10
Speaker
I think we can use it as a kind of analogy. So generally, history doesn't repeat itself, but it rhymes. So the autonomous cars were a beautiful example where we got a kind of transition. Back in 2006, when DARPA did its first grand challenge, the cars were not doing very well. I think none of the cars actually managed to get to the goal post, even though they were driving in an open desert. And most people were laughing and saying, yeah, that's how bad it is.
00:10:40
Speaker
People who really paid attention said, oh, this is actually getting somewhere. Next year, they actually all went and they reached the goal post and progress seemed to be very fast. And indeed, shortly they were driving in urban traffic. So that led us to believe that, oh, we're going to get autonomous cars very soon now.
00:11:00
Speaker
Then it turned out that there was another problem. Yes, of course you can make a car that drives relatively well in a city or at least on a highway. Dealing with unexpected circumstances and making it safe enough that we can entrust it to drive our children.
00:11:15
Speaker
That is a very different thing. So right now, people are a bit more pessimistic about autonomous cars because they're good enough to demonstrate that they can be unsafe. And actually getting that extra level of safety, so we would say this is perfect to use, is very, very hard.
00:11:33
Speaker
Meanwhile, the language models were kind of flying under the radar, because most of us didn't even think it was very important. I said they were glorified cell phone text predictors. And in one sense, that's exactly what they are, except that they're so good at predicting these sequences that you actually can use them to solve problems. GPT-2 could, with some effort, play chess, since there are enough chess games in chess notation.
00:12:01
Speaker
You still need to do a bit of extra work to just get legal moves. I bet you can do much better with ChatGPT. This is not because it's doing a proper chess evaluation, just that the database contains so much. And generally, the big surprise to somebody like me who learned about neural networks in the 1990s, before they really worked well, before they were cool, before you got a six-figure salary for knowing this, is that
00:12:28
Speaker
the internal structure is not fundamentally dissimilar from what we were tooling around with. It's just that you have much bigger data sets and much bigger compute to actually do the training. So many of the things I solemnly told my students about back in the 90s like, oh, yes, you need to avoid having too many degrees of freedom in the network because you will get overtraining and that's going to be very bad,
00:12:51
Speaker
turns out to be not really true these days. Instead, you want to have a lot of parameters. You want to have billions of parameters, because you're training on a much bigger data set. But there are even these surprising properties where overparameterized neural networks seem to be grokking. That is actually the term people are using in the field now, that they find the underlying pattern of a domain
00:13:17
Speaker
and then can make very good, accurate predictions. So grok is from Heinlein's science fiction novel, The Moon Is a Harsh Mistress. So a boy reared by Martians has this weird Martian word for truly understanding something, to actually be that thing. And it seems like in some situations, the underlying complexities of reality deep down have certain patterns that you can learn once you get enough data. This was not something we could predict that neural networks would be good at.
00:13:46
Speaker
And indeed, in the 90s, my advisor told me that deep neural networks were kind of a dead end. We didn't get many more results than the original tests of the 80s. And that was totally true in the 1990s. And this stopped being true in the 2010s because we had enough data and enough compute to run big networks.
00:14:07
Speaker
Now, this story about a mixed bag of predictions seems to suggest that over the next decade, we shouldn't be too surprised if it turns out that there's some new applications or architectures that become very powerful. And they might come from almost any practical application. Also that old methods can be used for other things. So for example, the convolutional deep neural networks that were used in computer vision.
00:14:33
Speaker
You can do one-dimensional versions of them to make sound synthesis and language synthesis. So we can totally expect that as a spin-off of the working computer vision, we're going to get much better in the voices and probably artificial music. That's a likely prediction.
00:14:50
Speaker
But which jobs are going to get disrupted? Roll a die, flip a coin. You don't really know. When Michael Osborne and Carl Frey did their famous study on the automatability of jobs that was published in 2016, they claimed that 47% of all jobs could be automated in the foreseeable future.
00:15:14
Speaker
They gave probabilities for different jobs. And it turns out that for artists and illustrators, there was only a 4% chance of automation. But of course, 2022 saw this rise of AI art where DALL-E 2 and Midjourney and Stable Diffusion really scared the art community because it's pretty clear that at least some of the illustrator jobs are totally getting automated.
00:15:38
Speaker
Meanwhile, the insurance underwriters that they gave a 99% probability of being automated are doing fine. Why? Well, actually, their job is much more about business relationships than actually taking numbers, filling them into spreadsheets and calculating a premium. So it turns out that our ability to reliably predict the consequences is rather low.
00:16:02
Speaker
And that leads to a strategy. It's kind of recognizing that, yeah, we are going to get surprised here. And that in itself might mean that we need to hedge our bets quite a lot. Do you think the failure rate of language models will hold them back
00:16:19
Speaker
from being used in the world to solve actual problems? Like we mentioned with the self-driving car, there the failure rate is too high for them to actually drive our kids around. You could imagine a language model lawyer, but if, for that language model lawyer,
00:16:36
Speaker
every 100th clause it produces is complete nonsense, then we can't really trust it in the way that we perhaps can trust humans. Could there be similar problems where language models will confabulate or hallucinate, say, untrue things, and therefore they can't be used as productively as we wish they could?
00:17:00
Speaker
I think it's very likely in many domains. So for the automated lawyer, it's not a problem if every hundredth paragraph is gibberish, because real lawyers can also occasionally say gibberish and it doesn't matter. If a lawyer says something that is untrue, on the other hand, uh-oh, now they are in real trouble.
00:17:23
Speaker
And the problem is, of course, language models are true bullshitters in Frankfurt's philosophical sense. They don't care about being true or false. They just generate an output. So this makes them very unsuited, as they currently are, for that kind of job. I tried out ChatGPT on chemistry experiments. I asked it, what happens if I mix the following chemicals? And besides a short lecture that I shouldn't be doing that because it's unsafe, which was totally correct,
00:17:53
Speaker
it gave a somewhat correct answer for two chemicals. And then I changed it to another chemical. And again, I got a short lecture that that's an even more unsafe experiment, which was correct. And then it gave a totally erroneous answer. And it was really convinced about this erroneous chemistry. That would not be very useful if I actually wanted to do proper chemistry. You need to tie the output to actual facts in the world much more strongly.
00:18:21
Speaker
But on the other hand, I tried using it to come up with scenarios for role-playing games. I pointed out that classic Dungeons & Dragons is in a fantasy setting, but maybe we could do other historical settings. So at first I had it come up with good names for spells in a classical Greek setting.
00:18:39
Speaker
Then I asked it, could you come up with some other interesting historical settings? It suggested 1815 France, the post-Napoleonic era, and we were discussing the possibilities of having undead, emperor-loyalist soldiers. This was creative work. It was not super creative. It was not coming up with something
00:18:58
Speaker
that was totally unthinkably weird. But at the same time, it was useful. I could tell it, let's add a complication. Let's add the Marshal of France, currently the King of Sweden, to the storyline. And that worked out quite well. This is creative work. There is no right and wrong. It's only entertaining and less entertaining. And this is where I think it can be very powerful.
00:19:21
Speaker
If I were to use one of these models as my personal assistant, I really need to know that it does what I tell it to do. It needs to actually demonstrate that it sent off those emails and doesn't just hallucinate, "Of course I sent off the emails." You actually want to be able to check that it's reliable enough. But different domains have different kinds of reliability. I can totally see people using this to generate a large amount of text where we don't care too much about reliability,
00:19:51
Speaker
a lot of copy, a lot of bureaucratic reports, which are probably going to be read by another language model distilling it down to a shorter form, so the research bureaucrat will be certain that I have done the right thing for the grant I got. And it might be that now we're just using these models to send an enormous amount of verbiage that no human ever reads. But maybe if that saves our human nerves, that might be a good idea.
00:20:20
Speaker
It is weird that these language models will output text that is completely false with the same confidence as they will output true information. So as a user, if we imagine a user using this as a kind of search engine to try to find true information,
00:20:39
Speaker
then you have to be a good judge of whether the AI is hallucinating or whether it is regurgitating true facts that it has learned online. I can see that holding the models back because in a sense the models can't be trusted to tell the truth.
00:20:57
Speaker
Well, the reason is, of course, what are they trained on? They're trained on reams and reams of human text and discussion and conversation. And humans are not super reliable. Even the scientific literature is full of falsifications, errors, and misunderstandings. So the reason an experienced scientist is somewhat reliable at their job is that they have built an internal model that is fairly consistent and is quite often tested against reality.
00:21:26
Speaker
The problem with the text models is, of course, they don't really care and they don't currently have a method to test against reality.
00:21:35
Speaker
So we need to add more things to make them really useful. So I would expect the current crop of text models are going to be just generating text where truth is not the most important

Future Enhancements: Reliability and Trustworthiness of AI

00:21:46
Speaker
part. The next generation that might be powerful is, of course, the ones that link to extra semantic information about the world, where they actually check the mathematics if they make equations and make sure that it's a correct calculation.
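One very small version of "checking the mathematics" can be sketched as a post-processing step over generated text. This is a minimal sketch under strong assumptions: it only handles integer claims of the form `a op b = c` with `+`, `-`, or `*`, and the names `CLAIM`, `OPERATIONS`, and `check_arithmetic` are made up for this example rather than taken from any real system.

```python
import re

# Matches simple claims such as "12 + 30 = 42" or "7 * 6 = 41".
CLAIM = re.compile(r"(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)")

OPERATIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def check_arithmetic(text):
    """Return (claim, is_correct) for each integer equation found in text."""
    results = []
    for match in CLAIM.finditer(text):
        left, op, right, claimed = match.groups()
        actual = OPERATIONS[op](int(left), int(right))
        results.append((match.group(0), actual == int(claimed)))
    return results


# Hypothetical model output containing one correct and one incorrect claim.
for claim, ok in check_arithmetic("We know 12 + 30 = 42 and 7 * 6 = 41."):
    print(claim, "OK" if ok else "WRONG")
```

The point of the design is that the check is done by ordinary deterministic code outside the language model, so a confident but wrong calculation can be flagged regardless of how plausible the surrounding text sounds.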
00:22:02
Speaker
Now, automated fact checking would be really powerful. Imagine if you had an assistant that constantly pointed out, when you read something, whether it is at variance with known facts or not. That could be very powerful. Although a lot of people would absolutely not want to hear that because a lot of known facts they believe they know
00:22:22
Speaker
are not true. Indeed, it turns out that quite a lot of facts that even experts believe are not true. There is actually a disturbing uncertainty about many of the things we as a civilization believe. So it's by no means given that even these fact checkers are going to find the actual truth. Sometimes they might even act against it because, well, the official view on what the state of the world is might actually be incorrect. But done in the right way, I think this can be very, very powerful.
00:22:53
Speaker
The real problem is, of course, a lot of people are just going to use it naively or recognize that it's unreliable, but it's five to five o'clock and they want to get home and they need to write 10 pages of text. Let's have a language model generate it. Let's skim over it and yeah, it looks good enough and let's send it off. So we might very well end up in a world with a lot of unreliable verbiage, which we already live in, of course, but now we get much more of it.
00:23:21
Speaker
The funniest part is, of course, that you also get a lot of weird lies from these models because they are trained on what humans output. And if you ask a human, are you a robot? Most people will argue, no, I'm not a robot, and will give various arguments. That is, of course, part of the training data. There are a lot of examples online of that. So, of course, the language models tend to respond, unless you specifically train them not to respond like that, by pretending to be human.
00:23:49
Speaker
This can sometimes be very amusing and sometimes downright creepy.
00:23:53
Speaker
We have a lot of these accidental lies because we have a system that's not a human, trained on data generated by humans. And before long, we're going to have very different kinds of systems and entities that might actually need to be trained to recognize what kind of entity they are, what claims they can and cannot make. This is complicated. And it's an interesting issue because it's not just programming, it's prompt engineering. And to some extent, even the sociology of understanding what the system is for.

Risks of AI Training on AI-Generated Data

00:24:24
Speaker
Imagine if these language models begin generating a lot of text because it's so easy for them to generate a lot of text at the request of a lot of humans. And then further imagine that this text is uploaded online and included in the training set of the next language model. Then you could see how the ball could get rolling on a very weird
00:24:45
Speaker
hallucinatory model where the AI is learning from data that previous AI models have created. And so maybe a new field is hunting for authentic human text.
00:25:00
Speaker
Oh, yeah. I think this might actually be quite important. There is already some work by the people developing image generation models that they should have watermarks so they can recognize that they should not be part of the training data. There are some national governments that are starting to say that, yeah, you need to start adding watermarks to what is being generated. And part of that is, of course, not polluting the well.
00:25:25
Speaker
Because one scenario I envisioned is a Spamocalypse. For example, today, if you want to get a recipe online, it's very likely that the top recipe you get will have a preamble of about five screen pages about a lovely holiday in Sicily and those lovely lemons before you get to the lemon meringue recipe at the bottom.
00:25:45
Speaker
Why? Well, of course, it's being monetized. There are those impressions as you scroll past the ads at the side that actually make the site pay for itself. Now, writing that verbiage about a holiday that nobody should really read that much, that could be done by a language model.
00:26:04
Speaker
But also, recipes can't be copyrighted. They're not creative text. They're a list of ingredients. So actually, you can totally steal that from another site, but you will probably want to rewrite it a little bit so they don't harass you too much. And I can totally foresee that, of course, an unscrupulous web designer would say, yeah, I'm going to use a language model to make recipes and preambles. And then I get the ad clicks, and then I'm making money.
00:26:31
Speaker
Then, of course, the next day they are going to be doing this for something other than recipes. So the internet might get filled with this auto-generated content, pretending to be human, and also, of course, becoming part of training data, which might mean that now the language models are getting very confused about what ingredients actually are in apple pie.
00:26:53
Speaker
Because if you get random noise in there at some point, maybe carrots should be part of apple pie. The model doesn't know. It just includes it. So you could get an erosion of our shared epistemic system here if this is not handled well. So obviously, you want something that checks the facts against reality. And you might also want a flag that this was probably auto-generated.
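The feedback loop discussed here, where generated text becomes the next model's training data, can be illustrated with a deliberately tiny simulation. This is a toy sketch, not a claim about how real language models behave: the "model" is assumed to be nothing more than the empirical word frequencies of its corpus, and the function name `retrain_on_own_output` and the example corpus are made up for illustration. It shows one narrow effect, rare words dropping out over generations, rather than the noise-injection case of carrots ending up in apple pie.

```python
import random
from collections import Counter


def retrain_on_own_output(corpus, generations, sample_size, seed=0):
    """Repeatedly 'train' on a corpus, sample from it, and reuse the sample.

    The 'model' is just the empirical word distribution (an assumption made
    to keep the toy small). Returns the number of distinct words present
    at the start and after each generation.
    """
    rng = random.Random(seed)
    vocab_sizes = [len(set(corpus))]
    for _ in range(generations):
        counts = Counter(corpus)  # "training" on the current corpus
        words = list(counts)
        weights = [counts[w] for w in words]
        # "Generation": the next corpus is sampled from the fitted model.
        corpus = rng.choices(words, weights=weights, k=sample_size)
        vocab_sizes.append(len(set(corpus)))
    return vocab_sizes


# Hypothetical corpus: a few common words and many rare ones.
human_text = ["apple"] * 50 + ["sugar"] * 30 + [f"rare{i}" for i in range(20)]
print(retrain_on_own_output(human_text, generations=10, sample_size=100))
```

In this setup the vocabulary can never grow, because each generation only ever sees words the previous one produced, so anything that happens not to be sampled is lost for good.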
00:27:16
Speaker
But we might also want to think about the whole systems we have about recipes or scientific papers. These epistemic systems are kind of fragile and the incentives for maintaining them well are not always well aligned. I don't know if you grade undergraduate essays at Oxford, but does this worry you that you will have essays handed in that have been drafted by a language model or perhaps
00:27:43
Speaker
you know, handed in directly from a language model. Do you think this is an actual threat or are they not advanced enough to fake an Oxford undergraduate yet?
00:27:53
Speaker
I think they're not good enough to fake an Oxford undergraduate, he said with some pride, but that is right now, and I fully expect that I'm going to see some of these essays before long. In fact, it would be a good thing if the philosophy faculty sprinkled in a few auto-generated essays just to check that the faculty is actually alert enough.
00:28:16
Speaker
The problem, I think, with auto-generated philosophy essays is that they're based on what is in the big corpus, what everybody knows, or at least what somebody has argued. So it's not going to break new ground. Normally, you don't expect undergrad essays to do that, and that is the big problem. So once you get up on the master's level and PhD level, you might actually want to demand that this should be unique stuff that nobody has actually studied before.
00:28:45
Speaker
And that might be possible to even check for. But if you just have somebody explain how Epicurean ideas affected Machiavelli, yeah, I think the language models are going to do a decent job on this. So at that point, the obvious solution is we need to do a proper philosophy tutorial where I sit across from a student and ask them, so can you explain what Epicurus and Machiavelli ever had to do with each other?
00:29:11
Speaker
That might be the only way. And it might very well be that the student says, yeah, but shouldn't I be allowed to use a language model to partly make my arguments? After all, over in engineering, we're allowed to use pocket calculators.
00:29:25
Speaker
And I think there is some merit to it. It might be that the right kind of models might actually improve things. I would love to have an argument validity detector for my own text that reads the text and checks: does this make formal sense? And if it actually warned me when I'm using too much hand waving, that might be useful. That would be a tool just like a pocket calculator that we should want to have.
00:29:50
Speaker
But the earliest applications are, of course, going to be cheating. I have no doubt whatsoever that a lot of very bad scientific papers are being generated right at this moment and being sent to a lot of journals for review by people who really want to pad their CV. And we need to deal with that. We need to find better ways of actually checking that papers are useful, that they are true, that they have validity, that the experiments were actually done, and that's going to be tough going.
00:30:18
Speaker
This reminds me or this mirrors a debate about, you know, you have a, say, a third grade student learning how to add or subtract how to do basic math. And then the discussion is around, well, I will always have a calculator. And they're right. They will always have a calculator in their pocket. We will always, I'm guessing, have these language models available and they will only get better.
00:30:42
Speaker
Why should we artificially handicap our own thinking by not using them? And perhaps the answer there is that, well, we are training human ability. So we are training critical thinking or ability to do math mentally and so on. And if we outsource our learning to these tools, well, then we will never gain the human abilities.
00:31:07
Speaker
That could be one answer. Of course, the skeptic might say, yeah, but why do I want to have that ability? And my answer to the math student would be, it's actually very useful to check the sanity of your answer if you can do a quick math in your head. Quite often, that is an extra safeguard. That's very useful. To the philosophy student, you might say, yeah, but don't you actually love wisdom? Don't you actually want to figure out for yourself what is true?
00:31:36
Speaker
After all, you're not going to become rich by spouting off about Machiavelli and Bentham. You actually need to have some proper understanding. Because the language models are dealing with the surface level of a message.
00:31:51
Speaker
They have a very good model of what a text would be like. But that text is supposedly reflecting some underlying understandings and concepts. Now, it seems like the language models are good enough that they almost imitate this deeper level well enough. And indeed, quite a lot of the language we use every day is probably not smarter at all than the language model. We're just generating words that fit with the situation. I bump into a chair and I apologize to it.
00:32:21
Speaker
That's not because I'm thinking that the chair is an individual that I need to apologize to. It's just a reflex because, well, I bump into somebody or something and then the usual response is to apologize. But getting that deeper level, that semantic information, those meanings, that is tricky. And we might actually want to have systems that tie more closely into that, especially when you want to have truthful language models.
00:32:48
Speaker
With GPT-3, you could actually improve its truthfulness by asking it to give a truthful answer.
00:32:53
Speaker
Because you need to point out that we're interested in the part of language that deals with true statements rather than the big part of language that's arbitrary. But you might also want to hardwire in things that it actually checks the numbers it's giving against known data in the world. That would be working on the deeper semantic level. Of course, we might also train on a lot of things and hope that maybe these models will grok the deeper semantics of the world, in which case we're actually approaching some better understanding of the real thing.
00:33:23
Speaker
But in many cases there are many deeper levels. We might be having a conversation about one topic, but actually the real reason we're having the conversation might be a political game in our organization, or something very different. Actually understanding what's really going on can have quite a lot of levels.
00:33:42
Speaker
Some of which are unsaid, some of which might even be unknown to the participants. They don't really know why they're engaging in some kind of social game. But there are reasons for that.

AI's Role in Enhancing Creativity

00:33:53
Speaker
We really want to have models that might be able to act on those levels. And the deeper they get, of course, the closer we get to real intelligence.
00:34:01
Speaker
But I think the language models are going to be a bit like the pocket calculator. Very good for making text look good. I make a sketch of what my argument is and then they fill out the details and make a nice rhetorical flourish. I'm proud to admit that I actually did this a few months ago with a paper. I couldn't come up with a good
00:34:19
Speaker
end point for the paper. It was a simple one about plastic recycling, nothing fancy, but I couldn't come up with a good rhetorical flourish, so I had GPT-3 give me a few alternatives. I chose one and added that for the first iteration of the paper. It didn't contain any science, it was no research, it was just a nice send-off of this is why we think this is a useful method and it's going to help solve the waste problem, but expressed in a nice way.
00:34:47
Speaker
But you might want to have the deeper levels too. You want that fact checker, you want the math checker, you want to have maybe a system pointing out to you: by the way, this argument you're giving is very related to that book you read last week. Maybe there's a very solid correlation here. Do you think that there is anything deep to this notion of understanding or will it kind of dissipate when we find out that the future language models can
00:35:13
Speaker
in fact simulate what we would deem to be a deep understanding of a topic? When you talk about whether a language model understands something, do you think we mean the same thing by understanding?
00:35:29
Speaker
I don't think normal human understanding and language model understanding are the same thing. They work in slightly different ways. So when I say that I understand, let's say, some equations in astrophysics, that typically means that I can use them. I can use them to calculate things. I can explain what happens if you change the equations a bit. I might know something about the range of parameters where they're valid, and so on. I might be able to tell you a funny story about how they were discovered.
00:35:59
Speaker
Now, this is not just making a simple prediction about the output. Language models are very good at making these predictions of output, but they generally don't have that causal connectedness. Generally, what current machine learning is weak at is understanding causal effects: I do something, something happens.
00:36:21
Speaker
They're thinking all in terms of correlation. And this quite often works well. As they say, correlation does not imply causation, but that's something that we've added. But it certainly winks suggestively. If things are correlated with each other, they usually have something to do with each other.
00:36:40
Speaker
Quite a lot of stupid errors happen when you assume one causes the other. Quite often there is something hidden beneath or behind that, which causes both to happen together. If you can discern that, then you have understood the situation and can do much more interesting things, like changing that underlying factor in a useful way. This is something I think language models right now are unlikely to be good at. But again, since these are general sequence prediction tools,
00:37:06
Speaker
it might be very risky to say they can never ever do this. It might very well turn out that with the right kind of training or the right kind of applications, they can actually start finding causal structures.
00:37:19
Speaker
I think Judea Pearl and many others would say, no way, we are not buying that, and give very, very clever arguments for why this cannot happen. And it wouldn't surprise me if it generally doesn't fly, but there might be domains where it actually works well enough that you can get a cheap understanding of causation, and that's good enough for those domains.
00:37:40
Speaker
It might also be that there are other very important domains where this absolutely doesn't work and you get totally misled. We need to learn those different domains and the boundaries between them.
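The point about a hidden factor causing two things to happen together can be sketched with a small simulation. This is only an illustrative toy, not anything described in the conversation: the variable names, the noise levels, and the `simulate` helper are all assumptions. A hidden variable `z` drives both `x` and `y`, so they correlate strongly in observational data, yet forcing `x` to a value (an intervention) leaves `y` unchanged.

```python
import random


def simulate(n=20000, seed=0, intervene_x=None):
    """Draw (x, y) pairs where a hidden cause z drives both variables.

    If intervene_x is given, x is forced to that value regardless of z,
    which mimics "I do something" rather than "I observe something".
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)            # hidden common cause (hypothetical)
        y = z + rng.gauss(0, 0.5)      # y depends on z only, never on x
        if intervene_x is None:
            x = z + rng.gauss(0, 0.5)  # observational: x also depends on z
        else:
            x = intervene_x            # interventional: x is set by hand
        xs.append(x)
        ys.append(y)
    return xs, ys


def mean(values):
    return sum(values) / len(values)


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    xs, ys = simulate()
    print("observed correlation:", round(correlation(xs, ys), 2))
    _, ys_do0 = simulate(intervene_x=0.0)
    _, ys_do2 = simulate(intervene_x=2.0)
    print("mean y when x forced to 0:", round(mean(ys_do0), 3))
    print("mean y when x forced to 2:", round(mean(ys_do2), 3))
```

A system that only sees the observational pairs would learn that `x` predicts `y` well. Only the intervention reveals that changing `x` does nothing, which is the gap between correlation and causal understanding discussed above.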
00:37:52
Speaker
What about language models' ability to generate lists of actions? So for example, give me a list of the things I need to do, the actions I need to take to get a cup of coffee. Well, okay, get the cup, boil the water, put the coffee in. Is that a very rudimentary understanding of causality?
00:38:14
Speaker
I don't think it's an understanding of causality. It's an understanding that humans explaining how to make coffee and tea make it in this order. Humans generally, I assume, have an understanding of causality when they're in the kitchen. And then they recount how they did it. And then they don't explain how they concluded that you needed to pour in the water after bringing the cup.
00:38:37
Speaker
It's just always there, so then you get the correlation structure that the system learns, and now it looks like it has a causative understanding. And it's even general enough that if you say, I don't have a teacup in my kitchen, can I use a glass? It will probably say, yeah, you can use a glass: bring the glass, pour in the water.
00:38:58
Speaker
You do get something that is very similar, but it's a bit deceptive because it's actually not based on understanding the causative structure. Now, a lot of our language is full of assumptions about causality. So many that I think it locks down what the language model
00:39:13
Speaker
can say in such a way that it actually borrows our sense of causation. And I think this is generally what happens with the language models. There is so much human output that you can actually borrow all of this and use that reflected brilliance to look like you actually understand the things.
00:39:30
Speaker
So you can steal human common sense and give it back to us so that it looks like you have common sense in AI. Exactly. Yeah, yeah, yeah. How many of us also look very intelligent? Actually, we just read a lot of stuff and listen to podcasts. But then in a conversation, we bring up some nice bon mot and some cool facts. And it sounds like we are very, very wise. But actually, yeah, it's the rest of mankind channeled through us. I think that's a very common experience.
00:39:59
Speaker
We should talk, so you've hinted at this or talked about it, but we should think about how these models will evolve, how they'll get more grounded, how they'll gain semantic understanding, perhaps how they will become multimodal. So will they incorporate images or video and so on? What future directions could these models go in?
00:40:23
Speaker
So the real issue is not so much what you want to add, but just how much can you add and do you have enough compute for it? So making a multimodal language model seems totally obvious. Just add a camera and a microphone and now you get a data stream, which in many ways is similar to just getting the text data stream.
00:40:46
Speaker
There are some issues about noise, about training, but basically we're talking about patterns of data and finding an architecture that's good at handling it. Probably you might say, yeah, I'm going to put in some model like OpenAI's Whisper to actually parse the language because we already worked rather hard on that. So instead of just having unfiltered sound, my multimodal model is going to look through the camera and hear what people are saying.
00:41:13
Speaker
Although having the raw sound stream might also be useful, because the chirping of birds and the barking of dogs is also part of that world. And then presumably it would do good sequence prediction and be able to predict, in a scene, what's supposed to happen next.
00:41:31
Speaker
But the problem here is more like, OK, can you get enough training data? So right now, we have been using the idea that we can get enormous amounts of training data, get it suitably labeled, and then produce a nice output. The problem is, of course, getting cheap labeled training data is tricky. I think we're running out of that thing. People used to say that data is the new oil. And right now, I think we're approaching a kind of weird oil crisis.
00:41:59
Speaker
Having a really good curated repository of good training data is not trivial to get your hands on. Would this data have to be human labeled, or could we have AI labeled training data for AI with some quality control mixed in? Could we set up this training set so that it wouldn't pollute the model and degrade performance, but would instead perhaps expand performance, improve performance?
00:42:24
Speaker
I would bet that you could do interesting things here because you could start by having a model trying to label the data, a human responding, and the expensive and rare human interventions can then train an internal critic that helps retrain the actual labeling model.
00:42:42
Speaker
This is something that has been discussed a fair bit in the AI safety and the AI alignment world, because ideally you would want robots and AI systems to always do what humans want them to do, or what makes sense. But humans are so slow, and it's so expensive to ask humans for advice. So instead what you do is you try to train on a human giving a bit of scarce advice
00:43:07
Speaker
so you get an internal model pretending to be that human, giving advice of the same style, and then try to see: can this train well? And it seems to work surprisingly well. I don't know whether this could work in these applications, but it's certainly worth trying and I would be very surprised to hear that nobody is doing it right now.
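The loop described above (a labeling model proposes labels, a human reviews only a few, and an internal critic imitating the human corrects the rest) can be sketched as a toy. Everything here is hypothetical and chosen for illustration: the integer "items", the deliberately biased `labeler`, the `human` oracle, and the nearest-neighbour critic, which is just a simple stand-in for a learned model of human feedback.

```python
def true_label(item):
    """Ground truth the human knows (hypothetical task: item >= 1000)."""
    return item >= 1000


def labeler(item):
    """A cheap labeling model with a systematically wrong threshold."""
    return item >= 1400


def human(item):
    """Expensive, scarce human judgement: always correct in this toy."""
    return true_label(item)


def build_critic(reviewed_items):
    """Record, for each reviewed item, whether the human agreed with the labeler.

    The returned critic guesses the human's verdict on a new item by copying
    the verdict from the nearest reviewed item.
    """
    verdicts = {r: human(r) == labeler(r) for r in reviewed_items}

    def critic_agrees(item):
        nearest = min(reviewed_items, key=lambda r: abs(r - item))
        return verdicts[nearest]

    return critic_agrees


def corrected_label(item, critic_agrees):
    """Keep the labeler's answer unless the critic predicts human disagreement."""
    proposed = labeler(item)
    return proposed if critic_agrees(item) else not proposed


def accuracy(predict, items):
    return sum(predict(i) == true_label(i) for i in items) / len(items)


items = list(range(2000))
reviewed = list(range(0, 2000, 20))  # the human looks at only 5% of the data
critic = build_critic(reviewed)

labeler_acc = accuracy(labeler, items)
critic_acc = accuracy(lambda i: corrected_label(i, critic), items)

if __name__ == "__main__":
    print("labeler alone:", labeler_acc)
    print("labeler plus critic:", critic_acc)
```

In this toy the critic fixes most of the labeler's systematic error while the human is consulted on only one item in twenty. Real systems would use a learned model rather than nearest neighbours, and real human feedback is noisy, so the numbers say nothing about practice.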
00:43:26
Speaker
How have your opinions about AI safety changed because of how AIs have actually turned out? So perhaps in the 90s, we didn't have that many examples of actual AIs. And perhaps now we do. I don't know if that's an accurate picture, but that's kind of my summary of the history here. Now we have some examples of simple AIs.
00:43:53
Speaker
How have your opinions changed about how we can make these models safe? So back in the 90s, very few, if any, people were really thinking about AI safety. It's well worth noticing that people in the 1950s were claiming that, oh, within a generation, we're going to get human level AI and we're not concerned about safety at all.
00:44:18
Speaker
There's this very famous research proposal where they, this is in, I think, 1956, and they propose that we give a small group of researchers a summer and they will make significant progress in artificial intelligence research. That's the start of the field with a very overconfident prediction. But go ahead. Well, yeah. The Dartmouth conference that this was about was also accurate. They did actually do a surprising amount of useful stuff over the summer.
00:44:45
Speaker
because there was nothing before. When there is nothing, any improvement is infinite improvement. But they were very optimistic about getting powerful AI. And even some people like Alan Turing and I.J. Good were aware that, well, once you get self-improving AI, that's going to be very dramatic. But none of them really seriously talked about the safety issues.
00:45:10
Speaker
And part of it might be that they still had that distrust, because when you see how badly your actual system works,
00:45:17
Speaker
Well, you have a hard time believing that it could be powerful enough to matter. And it's a lot of cheap talk. In the 90s, I was part of a transhumanist movement. We were all very bullish about the imminent technological singularity. And this is going to be great. And we had members of the mailing list argue that we need to speed up the singularity. So we should really be working hard to make an artificial intelligence system that could act as a seed AGI to improve itself.
00:45:45
Speaker
And it's actually from those conversations we got a lot of early safety thinking because some of the people involved started realizing, wait a minute, if we're taking this absolutely seriously and believe that we could get this exponential growth of something becoming smarter and smarter, that means that we are essentially creating a god and its values are going to be set by whatever we happen to program. And at this point, any programmer realizes, oh, I mess up all the time.
00:46:14
Speaker
Oh, if I mess up programming something godlike, that's scary. That led to a conversation about how do you actually make something that is safe? And from some of these discussions, you actually end up in the earliest part of the AI safety debate.
00:46:29
Speaker
But it's worth noting this was not part of the mainstream. This was among the weird futurists. Meanwhile, the AI researchers in the 90s, they were mostly interested in making sure that the industrial robots don't squish people. That had been a concern since the 1970s, and it's a valid, relevant concern. But AI being used for any super important jobs in the world, they knew the field well enough to think that this is not very likely to be a problem right now. We can wait.
00:46:58
Speaker
And I was probably very much in that world, even though I was one of those optimistic futurists, AI seemed fairly far away. I was much more optimistic about many other technologies,
00:47:10
Speaker
And again, it turns out that futurists are very bad at predicting which technologies arrive early. So life extension is going on nicely, but a bit too slowly for my taste, nanotechnology got misdirected by the sociology of research funding, while biotechnology is charging on, but so unpredictably we can't tell anything. New space is way ahead of what we would have expected, while AI for a long time didn't do anything, and then suddenly just jumped and became
00:47:38
Speaker
much more of an urgent issue. And that made me concerned about AI safety. I've always been much more optimistic than many of my friends in this field, perhaps just because of personality rather than facts and good reasoning. But it seems to me that, yeah, we're getting systems that can amplify human performance tremendously. That's already enough to be a cause of serious concern.
00:48:02
Speaker
It also looks like we are bad at predicting when their performance can increase, and even what they're doing once they have higher performance. These are very good reasons to work very hard on designing them better, finding better ways of validating and testing them, and understanding the consequences for society.

Ethical Divisions and Human Value Alignment Challenges

00:48:21
Speaker
The problem right now seems to be that the field is a bit divided between AI ethicists that are looking at the standard ethical issues that you get from these systems
00:48:30
Speaker
and are rightly concerned about bias in algorithms and how power accumulates, but they tend to totally dismiss concerns about the power of AI systems themselves and the transformative power. While the people working on the transformative AI side, they are to some extent dismissing the everyday ethical issues because they seem to be so small compared to the really explosive power of a powerful system.
00:48:56
Speaker
So unfortunately, we have ended up with two communities that don't talk to each other enough, and they don't get along enough to actually be helpful. Because it's pretty obvious that if you can make an AI system that can understand human social morals well enough not to be racist, then that is also a helpful thing for figuring out a system that can understand human social morals enough to be somewhat value aligned with humans.
00:49:21
Speaker
And that might not be an issue of just trying to change the training data. It might require some rather deep architectural discussions. It might also be that we need to think a lot about governance and long-term strategy, especially if AI is going to become very transformative. I think many of the current discussions about the AI effects are going to be blown out of the water simply because just too much happens. As we saw with AI art in 2022, that shocked
00:49:49
Speaker
really shocked an entire community and led to a big debate. And we're going to see this probably every year for the next 10, 20 years where different groups that never expected in their whole life that AI would matter to them, suddenly it erupts there and it becomes paramount. There is this core technical issue in AI safety of aligning an AI model to a set of values, any set of values, where the goal is just
00:50:17
Speaker
For any set of values, can we make it so that the AI pursues these values? For example, we could call it the values of humanity as opposed to some random values. But there's also the philosophical side of deciding which values we should have the AI pursue. And you quickly get into difficult discussions. If we're thinking about, now say we have a language model that we have trained and
00:50:47
Speaker
improved over time so that it does not output racist content. Perhaps the same techniques could be used to have a language model that does not criticize various governments, for example. And then you start to see how there could be worries about authoritarian governments using this to read the internet and pick out
00:51:12
Speaker
What I'm saying is that we have conflicting goals here, and some of them will collide with each other. And there are super difficult questions about which values to give to these AI. Yeah. We don't know what the true values we ought to have are, if any. It's an ongoing debate. I think we made some progress over the last 2,500 years. But yeah, it's a complicated issue.
00:51:42
Speaker
Similarly, what kind of rules do you want to build into the AIs? That is also a real important thing. It's a matter both of power and utility. For example, many of the image-generating systems don't make naughty pictures. And you can kind of understand that OpenAI don't want to generate that. Because they're a company, they don't want to mess around with the issues of political propaganda and pornography.
00:52:11
Speaker
So they carefully trained DALL-E not to do that, which had other interesting side effects. Because if you remove naughty pictures from the training data, that means that you actually remove a large number of predominantly female people, which means that you get a gender bias, which they corrected for. So in this case, we have a bit of an understanding of a particular bias, but it probably also had other effects that we don't even notice because we don't even understand what's going on.
00:52:40
Speaker
And you can imagine, of course, making AI systems that don't help you do dangerous things, but what is dangerous? And this goes back to the more deep philosophical debate about, for example, freedom of expression.
00:52:53
Speaker
many outrageous ideas have turned out to be true. So it's actually a good thing to be able to express outrageous ideas. And it might very well be a good thing to have AI art systems that can make both sexual and political pictures. Indeed, if you go to a fancy art gallery or art museum, typically the artworks that the curators and the reviewers talk the most about are the ones that probably the AI program would not allow to be made.
00:53:22
Speaker
So it might very well be that we want diversity here. And that actually goes back to the interesting value alignment problem, because it seems like one very valuable approach to this is to recognize that we are uncertain about this, and the AI ought to be uncertain about it too. We might say we want you to make people happy.
00:53:40
Speaker
And then hopefully the AI system recognizes that the humans don't quite know what they mean by happy, and so it is also a bit uncertain: actually, they might not like my own proposal, because their literature is full
00:53:55
Speaker
of dystopias that pretend to be utopias and stories about AI programs doing bad things when they're given good commands. I should be careful. So you actually want it to learn many of these things: axiological uncertainty, where we are uncertain about the values; normative uncertainty, where we're even uncertain about what morality to follow; uncertainty about the world, our place in it, etc.
00:54:19
Speaker
What does this uncertainty do? Well, it softens things. If you're absolutely certain as an AI program what you should do, you will just do it. And that seems to be very dangerous. AI programs that are good at handling uncertainty are probably not going to go haywire quite as strongly.
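The softening effect of value uncertainty can be illustrated with a small expected-value calculation. This is a hedged sketch only: the three candidate readings of "make people happy", the action names, and every number in the table are invented for illustration, and maximizing credence-weighted value is just one simple way to act under moral uncertainty.

```python
# Hypothetical value of each action under three candidate readings of
# "make people happy". All numbers are made up for illustration.
VALUE_TABLE = {
    "happiness_is_pleasure":    {"drastic_plan": 10, "modest_help": 4, "ask_first": 3},
    "happiness_is_autonomy":    {"drastic_plan": -20, "modest_help": 5, "ask_first": 4},
    "happiness_is_flourishing": {"drastic_plan": -10, "modest_help": 6, "ask_first": 5},
}


def expected_value(action, credences):
    """Average the action's value over value hypotheses, weighted by credence."""
    return sum(credences[h] * VALUE_TABLE[h][action] for h in credences)


def choose(credences):
    """Pick the action with the highest credence-weighted value."""
    actions = ["drastic_plan", "modest_help", "ask_first"]
    return max(actions, key=lambda a: expected_value(a, credences))


# An agent that is absolutely certain about one reading of the instruction.
certain = {
    "happiness_is_pleasure": 1.0,
    "happiness_is_autonomy": 0.0,
    "happiness_is_flourishing": 0.0,
}

# An agent that spreads its credence over several readings.
uncertain = {
    "happiness_is_pleasure": 0.5,
    "happiness_is_autonomy": 0.25,
    "happiness_is_flourishing": 0.25,
}

if __name__ == "__main__":
    print("certain agent picks:", choose(certain))
    print("uncertain agent picks:", choose(uncertain))
```

With these made-up numbers the certain agent commits to the drastic plan, while the uncertain agent prefers modest help, because the drastic plan is very bad under the readings it cannot rule out.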
00:54:36
Speaker
Now, this still leaves us with another big AI safety problem. That's AI programs doing what humans tell them to do, but the humans happen to be malicious or misguided or stupid. And there are a lot of those people out there, and if their power gets amplified, we're going to see a lot of trouble. But at least we have some ideas about the value alignment problem and uncertainty;
00:54:56
Speaker
they go well together. And one of the most interesting things about language models is that the text we see, that is just the thing that is being generated. It's the most likely words following each other a bit randomly. But there are many alternative ways the text could have gone. We might want to have tools to actually see the alternative versions of a text and be able to backtrack and say, actually, let's follow this other branch of thinking or this other branch of style.
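The idea of seeing the alternative versions a text could have taken can be sketched with a toy next-word table and a small beam search that keeps the few most probable branches instead of a single sampled one. The `MODEL` table and its probabilities are invented for illustration, and `top_branches` is a hypothetical helper, not an interface of any real language model.

```python
import heapq

# Hypothetical next-word probabilities for a tiny "language model".
MODEL = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "idea": 0.2},
    "a": {"cat": 0.4, "dog": 0.6},
    "cat": {"sleeps": 0.7, "runs": 0.3},
    "dog": {"sleeps": 0.2, "runs": 0.8},
    "idea": {"sleeps": 0.1, "runs": 0.9},
}


def top_branches(model, start, length, k):
    """Return up to k most probable continuations of `length` words.

    Each result is a (probability, tokens) pair. Only the k best partial
    texts are kept after every step, so this is a beam search rather than
    an exhaustive enumeration.
    """
    beams = [(1.0, [start])]
    for _ in range(length):
        candidates = []
        for prob, tokens in beams:
            next_words = model.get(tokens[-1], {})
            if not next_words:
                # No known continuation: keep the finished text as it is.
                candidates.append((prob, tokens))
                continue
            for word, p in next_words.items():
                candidates.append((prob * p, tokens + [word]))
        beams = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return beams


if __name__ == "__main__":
    for prob, tokens in top_branches(MODEL, "<start>", length=3, k=3):
        print(round(prob, 3), " ".join(tokens[1:]))
```

A tool built on this idea could show the runner-up branches next to the generated text and let the user backtrack to one of them, which is the kind of interaction described above.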
00:55:23
Speaker
I think we actually can develop better tools for interacting with these systems. So it's not just a question about training data or having bigger computers, but also getting better training data and better tools for interactively working with them. Some of the best images I've seen generated by machine learning, AI art,
00:55:41
Speaker
are done by having them make a few pictures. And then a human paints over a section and says, over here, I want the house. Or this part of the uniform should be a bit more science fictional. And that face, make it sterner. And then you iterate. You have an interactive work. This is where I think ChatGPT is such an interesting tool. Because it's not just generating a text one time. You're actually having a conversation. You nudge it in different directions.
00:56:08
Speaker
It's still by no means perfect. It's very easy to get it into a totally inconsistent state. But that interaction is also a good way of conveying: no, this direction is the wrong path. We might want to go down another path. I agree that having AIs be uncertain about values is an interesting research direction. But I worry that it's simply easier to make an AI that has a more hard-coded or precisely specified utility function.
00:56:37
Speaker
It might be easy, but it's also likely to fail in annoying ways. So imagine that you're running your authoritarian government and you're programming your AI to follow your official ideology. Do you now get an AI that you are aligned with? Probably not, because most authoritarian rulers actually don't behave according to their own ideology.
00:57:01
Speaker
This comes back again and again to bite us. Indeed, NSA probably felt that Snowden was a very trustworthy individual because he was going on about the Constitution all the time, and we are so aligned with the Constitution and its values, which turned out that Snowden disagreed with. Similarly, that AI with the black and white utility function
00:57:21
Speaker
might very well say, my authoritarian government is not obeying its sinister ideology well enough. I need to make it do that, which might be bad news for the leadership of that government. So a well-defined utility function quite often backfires in interesting ways. I imagine that the surviving authoritarian governments would also say, yeah, we want a bit more uncertainty. It should
00:57:45
Speaker
recognize that there's some nuances and subtleties in our sinister agenda, so it shouldn't just be treating it black and white. That's one direction to go. True, true. Anders, thank you for talking with us. Thank you so much. That's it for this episode. On the next episode, I talk with Anders about his upcoming book, Grand Futures, which is about what humanity could do at the very limits of physics.