
Joe Carlsmith on How We Change Our Minds About AI Risk

Future of Life Institute Podcast
Joe Carlsmith joins the podcast to discuss how we change our minds about AI risk, gut feelings versus abstract models, and what to do if transformative AI is coming soon. You can read more about Joe's work at https://joecarlsmith.com.

Timestamps:
00:00 Predictable updating on AI risk
07:27 Abstract models versus gut feelings
22:06 How Joe began believing in AI risk
29:06 Is AI risk falsifiable?
35:39 Types of skepticisms about AI risk
44:51 Are we fundamentally confused?
53:35 Becoming alienated from ourselves?
1:00:12 What will change people's minds?
1:12:34 Outline of different futures
1:20:43 Humanity losing touch with reality
1:27:14 Can we understand AI sentience?
1:36:31 Distinguishing real from fake sentience
1:39:54 AI doomer epistemology
1:45:23 AI benchmarks versus real-world AI
1:53:00 AI improving AI research and development
2:01:08 What if transformative AI comes soon?
2:07:21 AI safety if transformative AI comes soon
2:16:52 AI systems interpreting other AI systems
2:19:38 Philosophy and transformative AI

Social Media Links:
➡️ WEBSITE: https://futureoflife.org
➡️ TWITTER: https://twitter.com/FLIxrisk
➡️ INSTAGRAM: https://www.instagram.com/futureoflifeinstitute/
➡️ META: https://www.facebook.com/futureoflifeinstitute
➡️ LINKEDIN: https://www.linkedin.com/company/future-of-life-institute/
Transcript

Introduction to Podcast and Guest

00:00:00
Speaker
Welcome to the Future of Life Institute podcast. My name is Gus Docker and I'm here with Joe Carlsmith. Joe, do you want to tell our listeners about yourself? Sure. So I work as a senior research analyst at Open Philanthropy, which is a foundation that makes grants in a bunch of different areas. But in particular, I focus on existential risks from artificial intelligence. And then I also write about philosophy and futurism and other things.
00:00:26
Speaker
And you also hold a PhD in philosophy from Oxford University, if I'm not mistaken.

Blog and Philosophy Discussion

00:00:31
Speaker
Yeah, that's right. I think that's worth mentioning for us. Fair enough. And yeah, for listeners who haven't read your blog, I highly recommend it. Calling it a blog is almost an understatement. It is very in-depth essays on a variety of topics where the essays actually make intellectual progress and get you kind of further in your understanding. So yeah, highly recommended.
00:00:53
Speaker
Well, thanks for the kind words.

Updating Beliefs on AI Risk

00:00:56
Speaker
Okay, great. So your latest one of these essays is about how we predictably update towards a higher belief in AI risk, or taking AI risk more seriously. So what does it mean, this concept of predictably updating? I'll caveat: I don't think we necessarily do this, but I think it's a pattern that we see. And in particular, it's a pattern I noticed in myself, and I think I've seen it around the world a little bit,
00:01:22
Speaker
a new kind of frontier AI system comes out and is available and people play with it and they're very impressed. And as a result, they're worried about AI risk. And in particular, I'm interested here in kind of existential risk from AI and from misaligned AI. So kind of AI systems that are very powerful and which are also in some sense, agentic, getting out of control, pursuing goals that are in conflict with human interests and kind of destroying humanity's control over our own destiny.
00:01:52
Speaker
in the process. And so if you were surprised by the level of progress that's been made in AI, then I think this isn't actually a sort of problematic pattern to kind of get more worried as you see the systems kind of progress, if the progress is surprising. But if the progress wasn't surprising, if it was sort of the progress that you expected, then I think there's a kind of interesting
00:02:14
Speaker
dynamic in which, at least on a kind of basic Bayesian conception of how you're thinking about belief, you shouldn't end up changing your worry level dramatically as a result of seeing evidence that you in some sense predicted that you would see. And the basic reason for that is that if you were able to predict that you were going to see that evidence later, then that allows you, at least to the degree that you predicted it, to take it into account ahead of time. And so in some sense, you should have
00:02:44
Speaker
sort of factored in whatever you would think in the future into what you think in the present. And so I think there's an interesting dynamic there, and I think it occurs for a variety of reasons. In particular, I'm interested in the difference between seeing something up close and far away, and in processing something at an intellectual level versus at a visceral level, which I think is doing a lot of the work when you actually play with these systems.
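A compact way to state the Bayesian point being made here (an editorial illustration in LaTeX, not part of the conversation): your current credence in a hypothesis $H$ already equals the expectation of your future credence over the possible evidence $E_i$ you might observe, so evidence you can fully anticipate cannot predictably move you:

\[
P(H) \;=\; \sum_i P(E_i)\, P(H \mid E_i) \;=\; \mathbb{E}\big[\, P(H \mid E) \,\big].
\]

On this picture, any rise in worry you expect to feel after seeing the predicted progress should already be reflected in your worry today.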
00:03:02
Speaker
And I'm sort of hoping that we can avoid this in the future. I think we're sort of in a position to predict right now that these systems are going to get kind of a lot better and we're going to see it in the future. And I think so we should kind of strive to be unsurprised and also strive to kind of incorporate whatever level of worry we'll have in the future into our level of worry now.

Human Biases in AI Risk Perception

00:03:22
Speaker
So this kind of goal of striving to be unsurprised is perhaps
00:03:27
Speaker
a part of Bayesian thinking. So what you're talking about here, is this just an instance in which we as humans diverge from the optimal Bayesian way of updating our beliefs? Because then it would be unsurprising in a sense that we're doing this. We do this in all kinds of areas. So is there anything special about the way we update our beliefs about AI risk?
00:03:49
Speaker
I think, yeah, I think there are a variety of ways humans diverge from ideal Bayesianism. And in fact, so many that I think we should be cautious in assuming too quickly we know the right way to apply sort of abstract Bayesian norms to our lived actual kind of messy human epistemic life. I do think that this is a specific sort of failure mode, but it's one that I see as actually relatively
00:04:14
Speaker
kind of common to our epistemic relationship to very different future states, sort of future states that we have not in some sense encountered or seen as normal or processed much, that we're mostly processing with a limited part of our mind, in particular with abstract modeling. And so, you know, it can be the case that you're abstractly modeling something in a manner that is in some sense correct, but nevertheless, it hasn't kind of
00:04:42
Speaker
made it into your whole system. Your gut doesn't believe it. There's some basic way in which it's being treated as sort of a game; it's a sort of discourse or a conversation. There are concepts that you're able to move around, but there's a whole other aspect of your epistemic life that hasn't been engaged by it.
00:04:57
Speaker
And I think that's something that often happens with things that are distant, things that are very different, things that are weird, and a bunch of other things. And so I think AI hits on a lot of that. Perhaps some philosophical ideas like utilitarianism or infinity that you also talk about, those ideas might fall in the same category as AI, as basically difficult for us to process.

Intellectual Models vs. Gut Feelings

00:05:27
Speaker
So when we have this worry, how has this played out for you? This pattern of predictable updating, has it happened for you on AI risk? I think it has in various ways. I've become more worried about
00:05:43
Speaker
misalignment-related existential risk from AI, but it's not just at that particular level. I think I've noticed a bunch of ways in which, now, this past six months maybe in particular, as we've seen these new systems coming out and a bunch of new stuff happening, I feel like I'm inhabiting a world that, you know... I first heard about AI risk, I guess, in 2013. And so
00:06:04
Speaker
I've been thinking about this stuff for a while, and in the more recent past I've been following progress in AI and in deep learning relatively closely, following what's happening with large language models and scaling laws and stuff like that. At an abstract level, I think I was in a position to predict that we were going to be in a situation pretty much like this. In fact, there are a lot of parts of my abstract model of the AI world that I feel like are just
00:06:34
Speaker
becoming concrete. So an example is, I remember making with a friend, a colleague, a sort of model of AI timelines, and we used this concept of "wake-up," which was this period where the world suddenly starts to go, oh my God, AI, and really kind of sees what's happening. And this was something we were expecting to happen. We had a little model, and I sort of think that if back then we had been able to look ahead to ChatGPT and the world's reaction to it since it came out,
00:07:04
Speaker
I think we would, we'd sort of be like, oh yeah, that, that's what we're talking about. You know, so there's a weird sense, but you know, when you make these models, you're sort of like, ah, it's this janky model. It's like, is this even, you know, tracking anything? And then, you know, a thing happens like, oh, whoa, you know, it's sort of really it now, obviously, you know, it could be there's bigger wakeups in the future. So who knows if this is like the perfect candidate. So that was one, that's one example of sort of something that was abstract and that in some sense was guiding my behavior was kind of becoming concrete.
00:07:28
Speaker
So when you're making the abstract model of AI progress, this is your best attempt. This is where you're at your best intellectually. You're using the most evidence, you're thinking in the deepest possible ways. And then sometime later, when you're using ChatGPT, you're kind of emotionally surprised at how good it is. You're seeing debates in governments, you're seeing
00:07:52
Speaker
front pages of magazines and so on. And then the world waking up is as you expected it to be. But in a sense,
00:08:01
Speaker
the emotional response to the thing actually happening shouldn't, that's not where you're doing your best thinking. So you should kind of have, you should have trusted yourself more in a sense, or at least trusted your intellectual faculties more. It's something like that. There is a way in which this has been an update for me towards if you have a sort of argument or model that appears, that like makes sense and you don't really see
00:08:29
Speaker
kind of major problems with it at an intellectual level, but it sort of somehow feels a bit unreal, you shouldn't trust too much that feeling of unreality. And this has sort of been a shift for me. I think in the past, I was very interested in the signal that
00:08:44
Speaker
my gut's degree of reality was sending me about the epistemic status of various ideas and concepts and stuff like that, partly because it can be really hard to track all this stuff. And also, I think if your gut buys something that is a real plus in the sense that your gut has its own
00:09:06
Speaker
connections with a bunch of the evidence that you're bringing in. In fact, I think often the intellectual part of our lives is centrally overlapping with tribal and other forms of social processing that don't...
00:09:22
Speaker
Actually, I think it may be that at a psychological level, that apparatus is not our most world-oriented. It's a little more like, what is my identity? What is my affiliation? Who am I signaling alliances with or whatever? Then there's a different part where your body goes like, okay, but that's the real thing. That's the tiger. That's the boulder that's going to kill me. That's like, am I going to eat? I think there's actually a prior
00:09:47
Speaker
expectation you could have, which is that your gut is really the thing that evolution has made for tracking the real tigers. And so if your gut doesn't believe in something, that might be because you're just kind of faffing around with your tribe. So that was an idea I'd been very interested in. But finding that my sort of abstract models, which my gut had been, I think, more skeptical of, are
00:10:13
Speaker
starting to track the world, starting to really just kind of instantiate them, and then my gut going, oh, whoa, it's actually real, has made me go, like, gut, we could have gotten that before, you know, or something like that.

AI Capability and Danger

00:10:24
Speaker
And so I have sort of shifted a little bit in how much I trust the gut's feeling of reality for various kind of futurism-flavored forms of forecasting.
00:10:34
Speaker
Perhaps one skeptical point here is just to talk about the difference between AI being capable and AI being dangerous. When we sit down and use a chatbot, for example, and we are impressed by it, or we see AI progress and we feel it in our guts, why would that lead us to believe that AI risk is higher, that AI is more dangerous? Wouldn't we have to see some concrete evidence of dangerousness before we believe that?
00:11:02
Speaker
Increases in capability, in and of themselves, are not a necessary signal of danger. I do think other things equal they are, in the sense that more capability is more capability to do dangerous things. And also, I think,
00:11:18
Speaker
the scary thing about AIs is just how capable they will be. Such that they'll be so capable that if they were aiming at something we didn't want them to aim at, or in some sense operating in a way that we didn't want and that was contrary to our interest, we might not be able to stop them if they were kind of suitably self-protective and kind of power-seeking and stuff like that. So I think that capability is really crucially connected with the concern. Now it's in principle possible that as you see evidence of kind of AIs being capable, you are also seeing evidence for tons of progress
00:11:48
Speaker
in safety and understanding and interpretability and oh we really can control these systems and we really understand how it's going and maybe you're seeing
00:11:55
Speaker
arguments, the sort of old arguments for concern, being kind of dismantled by humanity's epistemic apparatus. If you were seeing all that at the same time, then I think you can end up comforted on net. I don't think that's what we're seeing anyway. Yeah, but we might be seeing the release of more and more capable models without
00:12:19
Speaker
disaster without accidents. I'm not saying this is the case, but I'm saying this could be the case. And so if that was a pattern that we were seeing repeatedly, then perhaps we would begin thinking about whether capabilities can increase without dangerousness increasing or risk increasing also. I'm not making that update in that the story was never that these models would destroy the world. It was always that there's a sort of threshold
00:12:47
Speaker
level where, now I think there is a period before that where you might expect and hope to start to see kind of warning signs and depending on how fast things are going and how abrupt different transitions are, you know, you get that luxury to greater and lesser degrees. In fact, I think in general people are sort of too, there's a kind of weird level of anchoring in people's kind of assessment of the risk to the present. And that's part of what I'm trying to push back on in the post is it's really not about like,
00:13:13
Speaker
is there still some stuff that the systems can't do? Are the systems superintelligent right now? Are they trying to kill you right now? It's about where we are going. You don't want to just look at the present; you want to be looking at the possible futures and wondering what's going to happen next, and updating accordingly. Yeah, I don't think whether the systems have sort of
00:13:35
Speaker
been dangerous so far is all that much of a signal. Now that said, I do also think we're getting some amount of disturbing evidence about basic features of the alignment discourse seeming to hold. So, you know, I think, for example, there was this brief, I think, kind of quite striking period where when, you know, Microsoft deployed this chatbot Bing and deployed it kind of prematurely,
00:13:57
Speaker
Partly, perhaps, out of a desire to get an upper hand in some sort of competitive dynamic. That itself is worrying: premature deployment because you're competing with someone else is another thing that was in the abstract model as a "don't," and now here it is coming out, Microsoft saying the race begins today. It's like, no, no, no. But then, you know, what happens is
00:14:20
Speaker
you have this chatbot now, and it was doing this crazy stuff, right? It was blackmailing people, it was threatening people, it was reading people's tweets and being like, how could you say that about me? It was trying to propose to a New York Times reporter. I don't know, there was a whole thing. It was just being very weird; it would repeat itself, all sorts of stuff, you know.
00:14:39
Speaker
And so I don't actually think that the blackmail and the misalignment there was of the specific type that the misalignment discourse is concerned about. I don't think Bing was pursuing a goal and lying

Evidence and Predictions in AI Risk

00:14:51
Speaker
with that. I think it was more of a play-acting thing. But I think the thing we nevertheless saw was that these are sort of alien
00:14:57
Speaker
minds. Like, you know, I think this thing was rampaging around the internet, and I felt like the whole world was looking at it going, what is this thing? And I think the basic lesson is pretty important here, which is that, kind of by default, when you just churn through a bunch of gradient descent, you get this kind of crazy creature that we don't understand. And then, you know, maybe you sort of
00:15:18
Speaker
shape it according to RLHF and there's a question of how far that goes and in what context. But the default thing you get appeared in that context to be this crazy but quite capable alien mind. I don't think we're getting zero signs of danger as well, in addition to just pure capabilities increase.
00:15:37
Speaker
Yeah, true. How do we then update in the correct way? Do we simply trust our abstract models more? Do we trust our intellectual view of the world more? What is the right way to do it? Perhaps we could talk about this concept of just updating all the way. This is something that people sometimes will encourage
00:16:01
Speaker
me to do, for example: just update all the way towards having a view of AI progress being very fast and AI risk being very high. How do we do this in the correct way? I don't think there's any kind of royal road. Epistemology in general, I don't know, in everything there are always different ways to fall off different sides of the horse.
00:16:24
Speaker
It sort of depends on the person in the context and what they're doing. The main thing that I want to push is that if you are currently at very low odds on AI risk overall, then I think I want to urge attention to your predictions about how you will feel in the future.
00:16:45
Speaker
conditional on various forms of AI progress. And I actually think these sorts of updates don't even need to be predictable in the sense of more likely than not. I think if you're at a sufficiently low probability on AI risk, then there are actually quite binding and kind of hard Bayesian constraints on what can happen to your credences later. Specifically, it can never be more than one-over-X probable that
00:17:10
Speaker
your credence will increase by a factor of X. So if you're at 1%, it can't be more than 1 in 10 that your credence ever goes to 10%. And if you're at 1 in a million, then it can't be more than 1 in 1,000 that you're ever at 1 in 1,000. So this doesn't even need to be predictable. You could just think: when you see GPT-7 or something, and you're wondering,
00:17:37
Speaker
well, suppose GPT-7 can solve Millennium Prize problems or something, so really advanced mathematics. And you're skeptical. You can be skeptical; you don't even need to think, oh, these models are going to be great in the future. But okay, how probable is it? Are you ready to bet at like 1 in 10,000? If it's 1 in 1,000, and that would get you to 1%, then, like, eh. So you're already getting constraints in terms of what you should be believing now, at least on a sort of basic Bayesian picture.
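For readers who want the constraint Joe describes made explicit, here is a minimal sketch in LaTeX (an editorial aside; the numbers are the ones used in the transcript). Since your current credence $p = P(H)$ is the expectation of your future credence, a Markov-style inequality bounds the chance of an upward jump by any factor $k$:

\[
p \;=\; \mathbb{E}\big[P(H \mid E)\big] \;\ge\; k\,p \cdot P\big(P(H \mid E) \ge k\,p\big)
\quad\Longrightarrow\quad
P\big(P(H \mid E) \ge k\,p\big) \;\le\; \frac{1}{k}.
\]

So a 1% credence reaches 10% ($k = 10$) with probability at most 1/10, and a one-in-a-million credence ever reaches one-in-a-thousand ($k = 1{,}000$) with probability at most 1/1,000; the "ever" version holds because a Bayesian's credences form a nonnegative martingale, so the same bound applies to their running maximum.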
00:18:05
Speaker
And so, you know, the thing I most want to urge is sort of just like attention to these dynamics, attention to the ways in which what you expect to believe in the future should kind of be constraining what you believe now. And then there's a sort of additional thing about why it might be the case that you aren't doing that, which I think often has to do with this sort of visceralness versus kind of abstract thing. And I don't have like a great way to kind of overcome that except to think,
00:18:32
Speaker
Yeah, I think basically you should really try to imagine: okay, this thing really happens. I really see it. For me, a big part of this is, when I really imagine a superintelligent machine, a machine that I'm looking in the eye and that is just sort of
00:18:51
Speaker
dominant over all of the smartest humans over groups of humans in science and strategy. It's thinking extremely fast. I think there's a basic way in which I expect when I actually look that machine in the eye, I'm going to be scared. I'm going to be like, whoa, this thing is
00:19:10
Speaker
a kind of formidable and serious force in the world. What is it going to do? What if it did something else? And so just kind of really trying to make that concrete ahead of time, so that when you actually show up and you feel it in your bones, you had managed to propagate that information back into the past, when you would have wanted to act on it. Do you actually sit down and do this visualization exercise and try to think about what you would feel? Is this something you actually do?
00:19:39
Speaker
Um, I think I do it in kind of various informal ways. I mean, I also think there are just other kinds of practices; in some sense, this is just the whole game of how do you do good epistemology and forecasting. But I have tried to get intentionally concrete, and maybe in some sense unrealistically concrete, in order to
00:19:58
Speaker
have sort of my gut, or my kind of visceral epistemology, start participating more directly. So I think sometimes, for example, people are hesitant to describe a concrete future with AI, because it's true that the more specific you go, the less likely it is, and so any sufficiently concrete future you describe would be very unlikely to be the specific one.
00:20:19
Speaker
And so people can kind of be hesitant about writing down kind of vignettes or kind of trying to work out the details. I actually think there are benefits to doing that regardless because it sort of can bring a more real world kind of sense of there are real observations that could occur. You could be really seeing this or this or this. And I guess another version of that is you can kind of reflect on
00:20:42
Speaker
You know, what is my current state like relative to the past? Like, I opened yesterday with this quote: this present moment was once the unimaginable future. And I really remember there was such a feeling of unreality to me back, even I guess 10 years ago when I first started getting into this, about the idea of AI, period. Like really good AI, AI that you could talk to. Even the level of AI we have right now was somehow, for me in the past, this weird blank.
00:21:11
Speaker
And it was, you know, it was just sort of like, I don't even know what that would be; it's like a brain in a box, it's like a whole brain emulation. Somehow I barely knew what I was talking about. But now here I am, and there's a real thing, and it emerged from a real specific kind of
00:21:26
Speaker
training process and there's a real specific set of capabilities and kind of compute requirements. And here I am, I'm living in my specific house. I mean, in the same way, like all aspects of your future, they're weirdly specific. You know, you imagine someday I will live in a house or someday I will like have a partner or someday I will. And then it's like, there's like one specific
00:21:44
Speaker
person, you know. And it'll be like that with AI too, right? If you one day meet a superintelligence, it will be actually superintelligent, and it will be concrete in a bunch of other ways. You'll be dealing with a specific computer, and there will be a specific set of other technologies and people. And sort of getting that dynamic in your bones, I think, might be helpful in general for relating to a possibly abstract future.
00:22:06
Speaker
Yeah, perhaps take us back to 2013 where you first heard about this concept and how did you react and why do you think you reacted that way?

Cultural Influence on AI Perceptions

00:22:14
Speaker
And perhaps this is useful because some people might have those reactions today and might be going through the things that you've gone through now. Yeah, so I first heard about this in 2013. I was at a kind of picnic-like thing in the UK. I just started a master's degree.
00:22:34
Speaker
Yeah, so I was talking, I ended up talking with someone who was working at the Future of Humanity Institute, and we were talking about big problems in the world, and he mentioned AI risk, and I just laughed. I was just like, ah, you know, is that like the movie I, Robot? No one ever does I, Robot; everyone always does Terminator.
00:22:52
Speaker
But I, Robot is the same, right? I guess it's the Asimov thing. I forget exactly the plot, but, somewhat spoiler alert, sorry people: I think they told the robots the three laws of robotics, and then, in order to not harm the humans, the robots needed to kill a bunch of humans and take control. Can't help you here. I've never seen the movie.
00:23:13
Speaker
Okay, well, anyway, it's not just Terminator. People have had this idea before. I think this may even be in Colossus: The Forbin Project. I think a decent number of people have this intuition that you give the AI a goal that's supposed to be good, but, oh, humans are so frail and bad that the AI must take control from the humans. So anyway, I said this, I was like, oh, like I, Robot, and I was sort of like, ha, ha, ha. And I remember he was just sort of stone-faced.
00:23:39
Speaker
I'm sort of like, okay. Anyway, so at the time I just laughed, and I basically just thought it was weird. And then I went on to learn more about it. The book Superintelligence came out, I think, relatively soon after.
00:23:54
Speaker
I read that and that kind of changed my view in a bunch of ways and started talking with people. And at that point I started to feel like, oh, there's a real argument here. And in some sense, the argument makes sense. I didn't have some sort of knockdown to the argument. And it didn't feel to me like the world did either.
00:24:12
Speaker
I remember actually going to a person I met again who was also in the space and I said something like, you know, like surely there are these like counter arguments though, right? Like there's got to be like these like things, you know, I've read this, but like where are the people who are like saying why this is obviously wrong? And, you know, and he was just like, I don't know man, like what if it's just right? And I was like, ah,
00:24:32
Speaker
Anyway, so there's a long process of me getting more acquainted with this, having thought it through myself. But my initial reaction was quite dismissive. And I think that's understandable. It's a very weird idea. It's an idea you first encountered in a context that is fictional and silly. It's totally out of our experience of other things.
00:24:59
Speaker
It's very kind of disanalogous to anything we've seen already. If you want to talk about, oh, an engineered pandemic, it's like, well, we have pandemics. You want to talk about nuclear war. Well, we have Hiroshima. We have the bombs. We have this whole history of this is a legit thing. We have think tanks. We have all this stuff. You want to talk about super intelligent machines, killing everyone. It's like, culturally, what are the reference points? What is the epistemology we have around that?
00:25:23
Speaker
these movies, or at least it was back then. And even now, it's not as though the epistemology around it is especially built up and rich. So I think that was a decent amount of what was going on. Yeah, and so we go to science fiction, we go to fiction in general, to have some kind of anchor points, some kind of reference point to understand how we should
00:25:39
Speaker
frame this issue of AI risk, perhaps when we hear about it for the first time. So you mentioned bio risk and risk from nuclear weapons. And there it does feel visceral. No one is laughing if we talk about, oh, the world might
00:25:55
Speaker
go into a nuclear war and it's very easy to understand how this would be extremely destructive. Do you think that nuclear weapons felt the same way that AI feels now in, say, 1850 or 1900 or something? Is it simply a question of the timeline?
00:26:15
Speaker
Well, I mean, they wouldn't have known about nukes sufficiently early, but I think that... Yeah, exactly. That's perhaps my point here, that it would have involved speculative science, perhaps a bit in the way that AI risk does now. I actually think, despite the fact that nukes are more concrete,
00:26:33
Speaker
I still actually think this dynamic, of there being an important difference between the abstract modeling and the visceral relationship to it, I actually expect that to apply in the context of nukes and bio too, just to a lesser extent. In particular, I think there's a certain kind of imaginative barrier that they don't create, which is that you really know how you're dying. You really know the specific scenario that is occurring.
00:26:58
Speaker
Now, how deeply do you know it? I actually think at one point, when I was thinking about nukes, I went and watched these movies, I think from the 80s, where they just depicted a nuclear exchange. I think it was Threads and then The Day After, or something like that; I think one was British and one was American. Very dark movies. They're very dark. And you would have thought, and I think that's actually an interesting example of the gut versus abstract
00:27:26
Speaker
kind of dichotomy, in that, you know, the people who were watching those movies had been living in the Cold War. My understanding is these movies were reasonably influential on the popular consciousness; I forget, maybe Reagan watched them, I think there was something about this anyway. But people knew that they were living under the shadow of the Cold War. There are stories about people, like, not
00:27:49
Speaker
taking their pensions. There were all these intellectuals after World War II who thought the world was definitely going to end. And that's a separate bucket, people who leaned too hard into thinking the world was going to end, and maybe that's where we're at with AI. So there was some sort of intellectual sense that the world would end, and people knew roughly how it would end. But nevertheless, when they watched those movies, and also for me,
00:28:13
Speaker
The movie still shakes you at a level. It's a pretty generic depiction of what will happen. I don't think there's anything like you're like, oh, that's really surprising necessarily. But nevertheless, it's quite, it gets you. And I think so that...
00:28:28
Speaker
And I think the same is true of pandemics. I think the same is true. It's just really hard still to kind of make the transition. You do just learn, parts of your epistemic system still learn a bunch from engaging with something at a concrete level. So yeah, I don't think nukes and bio are that different, but they do have that concreteness in terms of what's the mechanism.

Rapid AI Capability Concerns

00:28:46
Speaker
They do have the like, we've seen, we know that this is like a real thing. We can see the nukes, we can see labs, we can see, we have pandemics.
00:28:56
Speaker
So there's just sort of lower barriers, I think, to kind of moving this from the realm of like, haha, to the realm of like, serious thing that serious people worry about. There's something that worries me a bit about the talk about AI risk, which is that
00:29:11
Speaker
At least in one framing, things will accelerate in an extremely quick way, and we won't see a gradual increase in accidents such that, for example, we have an accident that kills ten people, and then two years later we have one that kills a hundred people, and then it ramps up in a gradual way like that. It's probably more like:
00:29:33
Speaker
AI reaches a certain threshold of capability. And from there on, we could see a global level disaster. Does this make it kind of difficult to update along the way, difficult to adjust our beliefs along the way, difficult to find evidence for whether we're right or wrong, perhaps difficult to debunk the idea of AI risk? I think it does. Yeah, I think we should distinguish the question, is that the right story about how AI will play out? So I think there are stories where you get
00:30:00
Speaker
some amount of warning along the way. I'm sort of skeptical that that warning will take place at the level of a gradually increasing number of deaths, particularly. I think there's important ways in which once people are really dying at a high level, you're in a really scary scenario. And there's a kind of razor's edge to thread the needle in terms of AIs that are able to kill a million people, but not
00:30:24
Speaker
the whole world. You sort of really had to have this particular capability level, and the AIs needed to mess up too. I think, as a general thing, if the AIs lost a conflict, then they kind of miscalculated whether to go for it, and you might
00:30:42
Speaker
worry about attributing such strategic errors to things that you're positing are more strategically sophisticated than yourself. But it could happen. So there are different degrees of continuity and gradualness in terms of the warning signs you're seeing and how much time you have to react. I do think there are stories where it's very sudden, and you see very few warning signs. It's like you're going along.
00:31:06
Speaker
And it's a bit like there's just a point where either you're gonna die or you're not and you're just you're kind of
00:31:13
Speaker
not sure, and you're not getting any evidence as you go up to it. And I think this is hard, because it's a story that just doesn't allow you to update very much. If it's truly supposed to be a case where you're not seeing warning signs, everything is trucking along and looking great, and then you die, or the good scenario and the bad scenario look the same up until the last second, then that's just a bad situation. I think we should be pretty surprised if that's true.
00:31:38
Speaker
Just because it's weird for the evidence to not leak at all into the world as to what's going on. I think a more mainline scenario in which we're dying, it's obvious to people we're going to die not just in virtue of
00:31:55
Speaker
like abstract arguments, but it's just clear that we don't understand how these systems are working; we've seen signs that they'll do bad stuff. It's not just that Eliezer is persisting in thinking that it's bad despite no evidence. It's like, no, it's sort of, oh my God. But nevertheless, the world is kind of barreling forward. There are actors who don't buy it. There are kind of bad competitive dynamics.
00:32:14
Speaker
So I actually think the mainline scenarios, it's more clear empirically that that kind of doom is ahead. But there is this slice that's just like quite unfortunate epistemically where you don't get very much warning. Perhaps in a sense, is that the scenario we're in right now? So we have...
00:32:34
Speaker
a race dynamic between different top companies in this space. We have distinguished experts warning about the risks and signing on to statements about the risk. We see increased capabilities and we see how systems can be misaligned. To what extent do you think we are on the path that you just described?
00:32:58
Speaker
Maybe somewhat, but I'm thinking worse. I'm thinking we're getting evidence about threatening behavior that the models will engage in. We're getting evidence about the failure of various forms of possible alignment. We are seeing breakdowns of cooperation. We're seeing failures to regulate wisely. Basically, you can just imagine a bunch of other sources of hope. And this is the sort of thing I think is actually really important to do in assessing
00:33:27
Speaker
kind of what my probabilities should be, to really think about what is the future evidence I'm expecting to see that this theory of why we would be fine, or why we would die, predicts. And I think there are just a bunch of remaining stories about why this could go okay that make empirical predictions about what we should see happen.
00:33:44
Speaker
Sometimes it's like, oh, we'll make a bunch of progress. Oh, it'll be like, there'll be a bunch of fiddling. Oh, the people will not want to deploy. Oh, people will slow down. There's different stories like that. And when those things persistently fail to happen, then you should be getting more and more worried. Or if that's where your hope is coming from, you should expect to see that stuff. And then if you don't, you should change your view. And so I think there's a bunch of that left where things could kind of end up looking more rosy.
00:34:10
Speaker
seeing how these systems are trained, seeing what sorts of systems, seeing what behaviors they're doing, like how much progress is being made in various kind of alignment relevant fields, or not. And the not ones look a lot scarier, I think, than where we are now.
00:34:23
Speaker
You talk about the unreality of the future. Why does the future feel unreal to us, do you think? I think just tons of stuff feels unreal to us. Here I am in a house, there are people next door to me in their house, and I just don't think about them at all. They're sitting there fully real, fully concrete,
00:34:45
Speaker
with, you know, detailed, textured inner lives and struggles and memories, and they have childhoods. And I just think that's the default: you have a bounded mind. Your mind can only model a tiny amount of the world, and the world is this vast space. So I just think
00:35:01
Speaker
the past stuff, your friends, people you've never met, just everything is unreal to us. And the future is maybe slightly worse because we can't go there. You haven't seen it directly. It's more different. And maybe, depending on your discount rates, it's less relevant to your action now. But I actually think it's not a uniquely future problem. I think it's a bounded-mind problem,
00:35:27
Speaker
and one that a huge portion of ethical and epistemic life is about kind of overcoming: sort of having an actual model of the world that starts to make up for the gaps in what your brain will do just by default.
00:35:40
Speaker
Do you think different things are required for different people to begin believing in the concreteness of the future? So you talked about, for example, visualizing how things might actually concretely happen. Could it also be about seeing famous credential people saying that this could go wrong? Could it be more of a social thing that we begin believing more in AI risk? I think that about
00:36:06
Speaker
AI risk in particular. So a thing that can be going on with people's sort of skepticism or dismissiveness towards AI risk, I would say is somewhat distinct from the concreteness thing, though it's related, is some sort of sense that the evidence that they're interested in is centrally social, that they want to be seeing
00:36:28
Speaker
signs from the kind of broader world's epistemology that this is a real thing. And what that looks like in particular is expert agreement, or you're seeing it on the cover of Time magazine or whatever, you're seeing
00:36:43
Speaker
that presidents or prime ministers are talking about it, and stuff like that. And if that's not happening, then they're sort of going to bucket it in the large swath of ideas that they haven't tried to debunk: who knows, could be superficially plausible, but I don't have time to look at everything, I'm going to wait for something to filter through a bunch of other forms of
00:37:04
Speaker
checking and evidence and stuff like that until I believe it. So I think a lot of that is going on, too. I think there's even some small part of that for me, where, you know, as someone who was thinking about this prior to this recent outpouring of sympathy and public concern,
00:37:25
Speaker
There is this sort of part of you that hopes that it's all fake and that somehow people know and that later when people finally get around to debunking it, they will like,
00:37:36
Speaker
point at the obvious flaw and say like, oh, of course. And then you can be like, oh, good. But it's not good when people finally look at your thing and go like, oh, no. And then you're like, no, come on. And so there is a way in which I think even for folks who are pretty inside view bought in that the fact that the rest of the world starts to get on board or get more sympathetic can make a difference and should
00:38:01
Speaker
make a difference, to the extent you have any residual deference to it or whatever. You know, some people are like, of course, I don't care. But, um,
00:38:08
Speaker
And then I think in terms of the connection with concreteness, to the extent that that is your kind of crux, I think people should do the same exercise I was talking about before. So various people, for example, have said, in the past, there were tons of people saying, the experts aren't worried. And the thing that I would have urged in the past, or if we apply the lesson of this essay to the past, then I would have urged something where you ask, OK, so what is the probability that I will see in the future experts getting worried?
00:38:36
Speaker
I don't want to just ask right now, I want to ask like, you know, in the future and over the whole scope of the time in which, you know, I'll be getting evidence about this, what's the probability I'll see the experts get worried and then how worried will I want to be then, right? Like to the extent I'm claiming that the important thing for me is expert consensus, like,
00:38:53
Speaker
what will my credence be, conditional on expert consensus, and what's the probability of that occurring? And similarly, there are various people who are like, where's the paper in Nature? Or where's the peer review? Or where's the demonstration that this happens in the lab? Or whatever it is. And I'm like, okay, great. If that's your crux, let's make the predictions. Really ask: okay, what's the probability that I see that thing, not just now,
00:39:14
Speaker
but in the future? Or, you know, people say, GPT-4 can't do blah. And I'm like, okay, but in the future, what's your probability that you'll see a thing that does do that thing? And then what will you think? So just in general, I think expert consensus is an example of something you might not have seen in the past, but that will possibly occur, and to which you should be sort of responsive ahead of time.
00:39:38
Speaker
Yeah, and it also makes it much easier to then have the discussion about whether expert consensus would matter, if you have some probability assigned to an expert publishing something in Nature about this, or to us seeing whatever metric of expert agreement on this point. So there's the question of updating beliefs. There's also the question of the starting point for beliefs, or priors, we could say.

Skepticism and Prior Beliefs in AI Risk

00:40:05
Speaker
To what extent do you think that disagreements are a product of varying degrees of putting all sci-fi scenarios in the fantasy bucket to begin with? And maybe some people are more open to sci-fi scenarios becoming real, and some people are less open to that. And so if your starting point, if your prior belief is extremely low credence in any sci-fi scenario, then you could see how it would be difficult to ever come to believe in AI risk.
00:40:31
Speaker
Yeah, so I do think priors, or something in the vicinity of priors, are a very important crux and difference in how people are approaching this. If you want to frame it in particular around sci-fi scenarios, I think we do need to talk about what counts as sci-fi. The world today is perhaps filled with sci-fi scenarios from 1900, is your point.
00:40:53
Speaker
There was recently some kind of climate novel, I'm forgetting the name, The Ministry for the Future, something like that. So is this a sci-fi scenario? You're trying to imagine ahead of time. But let's say what they did is they literally took the IPCC forecasts or something and then tried to write some fiction around it. You don't want to just cut anything that appeared in a sci-fi book out of your picture of
00:41:19
Speaker
what can happen. Or, you know, if you're like, ah, in the future, like solar will be cheaper. And it's like, okay, but you should just like check on like the trend line with solar. Like don't sort of say it won't happen because it's in a sci-fi book. That said, I think in general, there is like an important sort of prior at work here. And I think there are important differences where, so, I mean, the prior that does some initial work for me is just like, you know, for any given thing that someone says will happen,
00:41:46
Speaker
in the future, for kind of a suitably specific thing, your prior is sort of like, no, it won't. Or, you need some reason to privilege the hypothesis that some story about the future is true, because hypotheses kind of need privileging by default. And there's a question: okay, what's the burden of proof? How much work, how much evidence needs to be supplied before something can kind of
00:42:10
Speaker
make it into, oh, that's pretty good. And I think a lot of people, including myself, who come at this from a more skeptical angle, or who did in the past, start with some sense of, there's a kind of strong burden of proof here; you can make arguments for lots of things. There were arguments about nanotech destroying the world. There are arguments about...
00:42:28
Speaker
Ben Garfinkel, on a recent podcast, gave this example of honeybees. There are lots of things that people thought would destroy the world. One that sticks with me somewhat is the population bomb people. They had these pretty simple arguments. To be honest, I think those arguments were simpler and at a basic level more compelling: just on the first pass, you look and it's like, here's the population graph, people, and here's what happens with petri dishes and bacteria. What do you think is going on?
00:42:57
Speaker
Obviously, you can talk about the demographic transition. You can talk about, ah, we can get more resources if it's incentivized. But at a first pass, at a sort of, ah, how sensitive are you to arguments? I mean, that was not that bad. And they were catastrophically wrong. And I think that's a really important warning. In high school, I got all sorts of really intense climate catastrophizing. I remember seeing a documentary that was very much
00:43:23
Speaker
we will be Venus if we don't overthrow global capitalism in the next, whatever, yesterday. And I remember watching this documentary and I walked out and I was like, did you guys see that documentary? That was great. But I'm still glad that I did not totally upend my life. I mean, there's a question of what's going on here, but I don't think we should just grab whatever; there's a bunch of memetic dynamics at stake in which emergencies or catastrophes get exaggerated.
00:43:50
Speaker
I think it's reasonable for people to come at this with some default skepticism, and that can anchor you at a low level. I think there's a different sort of prior that people who are more scared are using, which is something like: they start with the idea of, okay, suppose you get
00:44:06
Speaker
another species on this planet that is superintelligent, much more intelligent than humans. Conditional on that, what is your prior that things are great for you, or, you know, that humans still have control over what's going on?
00:44:24
Speaker
And if you can get yourself to make that move, then, you know, your sort of prior is like, I don't know, that sounds rough. How did this happen? Why would I think it went well? It seems like the default is that they're calling the shots, and, you know, what are their shots, and how did that end up happening? So I think if you start with that sort of orientation, then I think it's actually relatively easy to get to a kind of high level of concern.
00:44:52
Speaker
Yeah, when I'm listening to you now, and when I think these thoughts myself, I can feel my probabilities, my credences, in AI risk jumping around. So if we start from one frame, the frame of: here is a big bucket of sci-fi scenarios, now we pick one of them, and could that be real? That's probably not going to happen,
00:45:11
Speaker
like a lot of the other scenarios are probably not going to happen. And then if we take the other framing, thinking about, you know, when was the last time a species on Earth remained in control while a more intelligent species roamed around, from that framing it now seems very plausible that AI risk is real and it's going to happen. So is this just a sign of
00:45:35
Speaker
me being confused or if we can change the, I don't know if reference, maybe reference class is the right word here, if we can change the reference class or the framing of the question and have our probabilities or credences jump around that much, are we just fundamentally confused and should we approach the problem from another angle perhaps?
00:45:54
Speaker
I don't know if we're fundamentally confused. I think we're... Maybe I'm fundamentally confused. I don't know if you are. I guess I think this is sort of what epistemology sort of looks like. You think about things in different ways.
00:46:05
Speaker
And then you try to synthesize stuff into an overall view. I think, yeah, I don't have a royal road on that front. I think sometimes people, it feels like people kind of assume they know, they're like, ah, this is a reference class thing. And then they assume they know how to proceed from there or something. Or they assume there's some way to ignore the reference classes and just reason about it or something. I'm kind of like, I don't know. It feels like, in general, this is just a tough game. But I don't think you can just, I don't think we're kind of fundamentally confused in the sense that I think
00:46:34
Speaker
There are just concrete empirical questions at stake here in terms of how many humans are alive, how much energy is our civilization using, what's going on with our computers. And I think there may be some conceptual confusion that makes... You can imagine conceptual confusion around notions like agency or some of these other things making this discourse not even wrong, or in some sense having missed
00:46:57
Speaker
the kind of basic story. So I think that the AI risk discourse lives in a certain sort of ontology. It sort of lives in this ontology of agents pursuing goals, and the concern is that we will build minds that are agents pursuing goals, and the goals will be kind of contrary to our interests.
00:47:15
Speaker
But the goals will give rise to these instrumental incentives to kind of avoid being turned off and gain resources and stuff like that. And then these agents will be the ones whose goals sort of govern the sort of direction of life on earth going forward. And then actually, I think there are some, especially in the early history of kind of the AI discourse, there was an even more kind of rich and questionable set of kind of ontological
00:47:39
Speaker
assumptions at stake. So there's this notion that the right way to understand the goals of an AI system is via a kind of utility function, and the utility function will be maximized. And the reason you should think that is because, in some sense, this is the convergent natural structure of rational agency: rational agents will have utility functions because of something, something, coherence theorems. This is somewhat of a parody, but I actually think Eliezer just thinks this.
00:48:03
Speaker
I think he thinks the coherence theorems and the natural-structure-of-utility thing; as far as I can tell, that's an important part of his picture. And then you've got to maximize really hard, and the utility functions diverge when you maximize them really hard. So there's a bunch of stuff there that I think is actually, this is like a kind of philosophical ontology. But it's an empirical prediction as well about what is the right way to carve up the sort of forces that drive the future. And the sort of empirical prediction is that the right way to carve them up is as agents with values, or
00:48:32
Speaker
utility functions, maybe. And I think it's possible that that is, in an important sense, a kind of wrong or incomplete or overly confident ontology on

AI Alignment and Agency

00:48:42
Speaker
its own. That in some sense, this is like not the right way to carve up what's happening in the world in terms of like there are agents, they have values. That said, I think it's like hard to say that this is like totally ruled out. Like there are agents that do have values, right? Like
00:48:54
Speaker
or at least to some extent. There's a kind of important sense in which, I don't know, someone running for president is trying to win. Or there's an important sense in which Google is trying to make profit. There's an important sense in which I am trying to get something from my fridge when I do that. You're thinking we can always describe something as an agent pursuing goals, so we can very often frame it theoretically in that way.
00:49:19
Speaker
No, no, I mean sort of the opposite. I think like it's not the case that this is just a sort of random like, Oh, it's sort of a way of thinking. I'm like, no, I think this is actually like a really important true thing that can happen. You know, like you can just like, you can build a system that is rightly understood as pursuing some goals and you know, like, uh,
00:49:39
Speaker
Hitler or something. That's a thing that can happen. Exactly what you want to say about that or how common that is and how much it'll crop up in the context of gradient descent and how hard it will be to build systems that are otherwise useful that don't have that property, all of that is a further and more detailed question. It seems unlikely to me that
00:50:01
Speaker
the idea of agency, or the kind of possibility of machines that are both superintelligent and agentic and pursuing goals that are kind of contrary to our interests, is just confused. I think there are empirical scenarios where we'd be like, yep, that is occurring. Like, you could in principle build a paperclip maximizer
00:50:19
Speaker
And it could, in principle, kill you and turn the world into paperclips. I think that's like an empirical scenario. If we saw that happen, we'd be like, yep, that was not a conceptual confusion. That is an empirical scenario rightly described by those concepts. But I still think there's ways in which we might be leaning on those concepts quite a bit harder than they warrant. Or that's one of my most salient ways in which the discourse might be confused at a fundamental level as opposed to wrong empirically about what'll happen. So would this be an instance of
00:50:46
Speaker
of there being no danger, then, if we are conceptually confused in this way, or would the danger just look different? Would the risks look different? So, for example, what I'm imagining now is that AIs stay mostly tools for us and they don't become agentic, they don't have goals in any traditional sense. Are you thinking about how that scenario could still be dangerous?
00:51:10
Speaker
That scenario could be dangerous in various ways, but I just think it wouldn't qualify as a misalignment x-risk. In particular, I think the notion of misalignment is kind of importantly related to the notion of kind of agency and goal pursuit. And some people have tried to frame it without that. And I'm kind of like, well, I don't know. I really think it's structural to the story.
00:51:30
Speaker
The way in which I see this as a possible comfort is not that it's impossible to build the agentic scary things. It's just that it's sort of less central to... The general worry is that the sort of dangerous type of thing is also really, really closely related to the useful type of thing.
00:51:49
Speaker
And, you know, the type of thing that's useful for kind of understanding our AI systems better and making progress in alignment or doing a lot of the stuff that might kind of mitigate some of the risks and get us to a safer situation, the thought is no, in order to do that with our AIs, you need to build the really, really scary type of thing. And I think it's possible that, in fact, we can do more to kind of separate different types of minds and kind of get useful work out of minds that are, in some sense, not kind of doing the agency thing we're worried about. I think that is...
00:52:16
Speaker
harder than people think, in particular. I think, as you see with GPT-4 and stuff like this, there are systems that I think are not in the relevant sense agentic or scary, but the fact that they're smart and otherwise kind of intelligent means that they're intelligent in the way that agency takes advantage of. And we see you can just take GPT-4, which is not, I think, relevantly agentic, and then
00:52:36
Speaker
build it into an agent very fast by asking questions like, what would an agent do here? And then you have a different thing that does that. And so I think there's this great post that I think maybe is sort of under-referenced, which is something like "Optimality is the tiger, and agents are its teeth." And the basic dynamic is, it's really intelligence that's the scary thing. If you have a system that is not itself an agent but is able to accomplish goals in the world.
00:52:57
Speaker
It's at least kind of agency adjacent. And so I think it's hard to separate these things too hard or too far, but it's still, I think there's still hope there. And sort of, there's a kind of spectrum of how much hope you get out of that. And I think the doomiest worlds are the ones where there's very little.
00:53:13
Speaker
Yeah, we are trying very hard to turn our current large language models into agents. We are enhancing them with, say, more short-term memory or linking them together, having them collaborate and so on. And so you could see how agents or agentic behavior would arise out of more tool-like intelligence. I definitely see that. Okay, if we get back to the question of
00:53:39
Speaker
our gut feelings versus our more formalized models or our more intellectual posture to the world. Is there a potential danger in us becoming alienated from ourselves if we begin to rely more and more on formal models and kind of intellectual life as opposed to connecting with our gut feelings? So I think there is that danger at both
00:54:05
Speaker
an epistemic level and at a kind of motivational, human level. So I think at an epistemic level, as I said before, I do think your gut provides a bunch of signal. And I think some folks hang out in sort of the effective altruism community, and I think a lot of people in that space will sometimes update very hard against their gut. They'll really decide, you know, oh, my gut can't do scope sensitivity, and then they'll be like, gut, you know what, you are out.
00:54:32
Speaker
And like, I am a rational person now, the abstract stuff is the only part of my epistemology I trust. I think that's throwing out a bunch of information and kind of hobbling even your epistemic practice in important ways. And then separately, I think it doesn't necessarily make for a kind of very human or engaged or rich, alive and
00:54:54
Speaker
kind of fully mobilized relationship to, especially if you're like working on this stuff all the time, or even just trying to think about like, what should I do? If you're sort of, your whole body doesn't believe it, or you're kind of, a bunch of parts of your mind don't believe it, then those parts are like, sorry, why am I moving my arms? Like, I don't, I don't, like, I didn't, you know, tell me something, you know, tell me something I care about here. So like, tell me something real. And you're like, no, no, I decided to throw you under the bus, but I got this abstract model and it's sort of like, oh, okay, you know, but whereas if you're like,
00:55:22
Speaker
actually kind of fully thinking about it and kind of having a kind of more cooperative, even if, you know, at least your gut should like engage or sort of, you know, be like, okay, I grant that I can't see this, but I sort of, you know, I'll give you this one abstraction. Like, okay. You know, I don't know. There's some, some, some way of not just suppressing or, or kind of cutting off that part of your life. Cause I think if you cut it off and then it is, you're cutting off a big part of yourself and your whole self is the thing that needs to act and think and kind of live in relationship to this stuff.
00:55:48
Speaker
And it's probably not even possible to cut off the system. Even if you pretend to cut off your gut feelings, you will be in an adversarial relationship to yourself, basically, and not able to coherently implement or integrate whatever you believe about AI into your daily life. I would imagine, at least. Yeah, I think that can happen. And in fact, you can also go too hard in the other direction. And this is something I don't talk about in the post very much, but that came up in some of the comments, where
00:56:18
Speaker
There is also this opposite kind of art of managing your gut's possibly destabilizing or overly intense relationship to these ideas, where, I think, you know, this is really scary stuff. And so on the one hand, I think it's...
00:56:35
Speaker
It can be useful and important to kind of mobilize your whole visceral self in relation to it. On the other hand, that can very easily become kind of too much or overwhelming, or it can be a kind of detriment to people's motivations or their orientation in the world, their ability to act wisely or practically. And so
00:56:57
Speaker
in that case you almost want to go the other way. You want to be like, okay, gut, this is actually too much. There's a whole dance in the other direction that I think is, for some people, the main dance. Some people are not actually afflicted with this sort of, oh, things don't feel visceral for me. It's much the opposite. They're like, whoa, that is too visceral for me, that is something I can't handle, and they're playing a different game. I think it warrants a sort of whole different discussion that
00:57:22
Speaker
I haven't really gotten into. Yeah. How much individual variability do you think there is here? So there are people perhaps who trust their gut way too much and people who trust their gut not nearly enough. Do you think perhaps AI risk attracts a certain type of person on that spectrum? I mean, I would guess that people who are attracted to AI risk are people who are fairly model
00:57:48
Speaker
first in their epistemology, right? Because AI risk is not sort of, do I feel it in my body that AI is about to kill me, without ever having looked at the scaling laws. Though, no, I think people might actually, it might be that all it takes is you hang out with ChatGPT and your body actually feels it, independent of whether you've ever heard of any of this instrumental convergence or whatever. You're just like
00:58:14
Speaker
This was what I was thinking actually. Perhaps we are about to live in a world in which you can sit, you can take out your smartphone, you can talk to something that looks very much like a human, something that talks very much like a human, something that has facial expressions, inflection,
00:58:30
Speaker
different kinds of tones in its voice and so on, but it is basically an AI model. Perhaps that world is coming quite soon, and in a sense, the playing field for the gut reactions could be about to change very drastically.
00:58:48
Speaker
I think that's totally right. I do also think though that that playing field is going to be all over the map, right? Like people are going to be having gut reactions, especially because, you know, these AIs are going to be optimized to like, even as we're bringing the AIs online, we're also bringing on like unprecedented power to like manipulate human psychology. I mean, we're bringing on unprecedented power in just like a zillion directions. That's like part of what's intense about the situation.
00:59:12
Speaker
Yeah, you know, it's going to be hard when, if every AI you interact with is like hyper hot or something, or hyper charismatic, or hyper, you know, otherwise kind of sympathetic, that's going to matter, as opposed to every AI you interact with being like a clunky robot. And so.
00:59:32
Speaker
I think it's going to be a challenge in general to trust our guts in a bunch of ways as our guts are being newly assaulted with different types of stimuli and different forms of optimization. But I think at a basic level, it's less likely that people will be like, this is unreal in its entirety. I don't know what you're talking about with AI. They're going to be encountering AI
00:59:53
Speaker
more and more directly in their lives, they're going to be asking themselves questions like, what is this creature? What is this mind? What does it understand? I think we're going to be talking a lot more about AI sentience and rights and stuff like that. I think there's just a bunch of new questions that are going to come up that are just going to be a lot less intellectual for people and a lot more here and now.
01:00:12
Speaker
Yeah, I think on balance, it would probably be great if more people sat down and created some models for how they think about AI risk and perhaps AI sentience and so on. But I don't expect that to happen. I expect people to update mostly on what is coming at them in this kind of stream of sensory impressions and ideas you're hearing on podcasts and videos you're seeing online. And so without
01:00:37
Speaker
ever forming an actual model of the space. So perhaps on balance we should have more models.

Forecasting and Model Creation

01:00:46
Speaker
I guess like at the very least the old thing I was sort of doing was there's a statement like don't just go with your gut, crunch the numbers and then go with your gut. And the thing which is different from shut up and multiply is it's actually there's a still like you crunch the numbers you see what your gut thinks about that exercise and then you go with your gut. I think I've sort of moved
01:01:03
Speaker
I want to actually go a little further towards crunch the numbers, see what your gut thinks, and then it's like, maybe your gut's wrong. Maybe the numbers are wrong. It's a tough game. But I think it's important. In some sense, your gut might believe the numbers too. Your gut might change as a result of your modeling exercise. So I don't even think this is a sort of like, do you like your gut or not? Your gut wants this information too. I think if you have the time and the interest, I think it's worth really thinking ahead, especially given that these ideas are very high stakes.
01:01:29
Speaker
there's going to be a bunch of discourse and possibly the AIs are going to start participating in a bunch of this. The whole world might be getting quite intense epistemically soon. It's good to have thought stuff through. And I think both parts of your, all of yourself can benefit from that.
01:01:43
Speaker
Do you think you've benefited from having these models available? You were perhaps earlier than the world in some sense. You were interested in these ideas before they became mainstream. Do you think you've benefited epistemologically from having these explicit models? In some sense, I think I've benefited in the way that people who are willing to try to make forecasts at all
01:02:04
Speaker
benefit, which is that you do actually eventually get feedback. Eventually someone actually is right and someone is wrong, and it can actually be easy to forget that somehow when you're just opining. It's sort of easy to look at what will provoke a certain reaction right now, or what will be seen as smart right now. But there's this great thing about forecasting, which is that eventually the actual world
01:02:29
Speaker
really says who was right and wrong. And then, the people who were wrong, they were wrong. And I don't think we should be mad, especially at people who were willing to make forecasts and ended up wrong. I don't think we should be too hard on people who went out on a limb and were mistaken, even if they were
01:02:45
Speaker
kind of importantly mistaken. Part of the benefit of having models is you can learn and update, whereas if you're just going on your gut, then you can just sort of careen around and not actually be quite sure what you're supposed to be learning. You can just be sort of associating with different things. Oh, this idea is high status now. Oh,
01:03:01
Speaker
apparently this person's not into it. You can be sort of this big mush. Whereas if you had made predictions, you're like, oh, I thought it was going to be like this, and then you can learn more about what went right or wrong with your past cognition. I think it's probably faster to do that if you just get on Metaculus or get on Manifold Markets and make even shorter-term forecasts. But I think it's not that different. It's sort of beneficial regardless.
01:03:26
Speaker
Do you think it's even possible to create your own models if you're not doing this in a professional capacity? There's a difference between perhaps being paid to do this and sitting down and taking a few years to make a model and how good will such a model be compared to doing it on the weekends or whenever you have time to think about how to create a model of AI risk.
01:03:49
Speaker
Basically, do you think amateur models of AI risk, are these models worth doing? I think they are. I think in general, there's some sort of art to doing the short versions of potentially long or infinite tasks. I don't think it's something I totally excel at, but I think it's something I like.
01:04:09
Speaker
really respect and think is good and have kind of tried to get better at, which is: if there's something that you think will take three months, first do the day-long version, see what happened there, and then do the week-long version. And the day-long version isn't the start of the three-month version, it's the compression, it's the get-to-the-bottom-line version. If you have a weekend, if it's a really important question, spending some time to literally write down what you literally think, just write it down, it doesn't need to be right, it doesn't need to be sort of
01:04:36
Speaker
anything except really written down and your sort of best shot right now if you had a kind of gun to your head. I think there are benefits to that, even if you don't have time for more. What do you think about the finiteness of human life, the fact that we are going to die? This might feel unreal, it might feel like an unreal part of the future for many of us, at least it does for me. How do you think this relates to thinking about AI risk?
01:05:05
Speaker
One of the examples in the essay that's sort of salient to me in the context of places where our gut doesn't actually kind of reflect the full story, or it's like places where there is this dynamic of kind of predictable updating in the future. So I often think of this sort of,
01:05:23
Speaker
I don't know, archetype of someone who's getting, you know, they're sitting there and they just got the kind of scans back telling them that they have cancer or something. And it's like they have three months to live and there's some sort of update that they make or, you know, they'd known the whole time that they were gonna die somehow. Maybe this isn't even like a sort of weird time. This was like a median prediction of like when they'll die and how or something like that. But nevertheless, there's a sort of real change in your relationship to this fact.
01:05:50
Speaker
There's this sort of difference between kind of believing something and realizing it. And it's like a really practical difference, right? So often when people, there's this sort of trope of someone gets this information and they walk out into the world and they're orientating towards it in a very different way. They're like, they're treating their relationships as more precious. They're treating certain things as like unimportant. Somehow, like everything is, you know, they're seeing beauty in a different way. Their whole relationship with the world is kind of altered in light of this kind of fact that they already knew being somehow more real.
01:06:17
Speaker
So I think this is like a, this is a classic in sort of human culture. We have a buildup of experience with the way in which death in particular can seem unreal to our guts, even though it's sort of intellectually something we're confident about, modulo, a bunch of transhumanism and stuff. I think that's, to me, that's its sort of central role. And it's just as an example of a case where we do this. And also I think it's an interesting case where we have something like this Bayesian
01:06:46
Speaker
dynamic, where you're trying to update on your future evidence, is actually a part of some aspects of popular culture. Yeah, there's a song I referenced in the essay, I think it's Tim McGraw, "Live Like You Were Dying," and it's about, like, I went skydiving. It's about a guy who got this kind of news, and it's poking you to do this thing of updating ahead of time, and
01:07:07
Speaker
you will later kind of realize it. The song's kind of nudge is like, you're already dying, you don't need to get the scans, you can do this now. And all these sort of memento mori practices we have are kind of oriented in a similar direction: there's something to learn here, and we need to be actively trying to learn it ahead of time. I think there's a Bayesian structure underneath that kind of very rich and important human practice. And so I think it's an interesting example in that respect. And then also I think,
01:07:35
Speaker
to the extent the AI risk is that the AI will kill us, the unreality of death might contribute to the AI risk's unreality. If you're the sort of person who finds it unreal that you will die at all, that might extend to dying in ridiculous sci-fi ways. Sorry, not ridiculous, but very unusual sci-fi ways. Makes perfect sense. You had this report about power-seeking AI as an existential risk. You published it originally, I think, in 2021.
01:08:00
Speaker
I recommend people read this, by the way. For a long time, people have been talking about the lack of a coherent argument for AI risk, with premises and a valid structure where the conclusion follows, as opposed to a very long post. I think this report you wrote is that argument.
01:08:24
Speaker
You published this in 2021 and in the original report you estimated something like a 5% risk of extinction from AI before 2070. Then one year passes and you update this estimate. In 2022 you say there's now a 10%
01:08:42
Speaker
risk or probability of extinction before 2070. How did this come about? How did you update? This is perhaps what we've been talking about this whole time. How did this update come about? Because a doubling is a pretty big update to make.
01:08:57
Speaker
One thing I want to clarify at the outset is that this is supposed to be a sort of subjective estimate of the risk. It's not supposed to be tracking what's sometimes called an objective probability or propensity in the world. So I don't think it's like the world was sitting there back in 2021 with
01:09:13
Speaker
like a coin with a one in 20 chance of coming up heads. And then it's a new coin. I'm like, guys, the world has changed. And now it's a one in 10. It's more that my own estimate just went up. And even then, I also want to caution, it's not as though I
01:09:29
Speaker
And this is part of what was going on. I don't think I had some super fleshed out Bayesian model. I tried to pull some numbers out of my gut for the premises and the argument and think about it and come up with a final answer. And then I sort of
01:09:44
Speaker
did a bit more of that. I also didn't complete that process, but I was like, oh gosh, the 5% is too low. And so I threw in this correction. Yeah, so what happened there, there were a bunch of different things. One is I published that report, got a bunch of feedback from people. We solicited reviews, which people can read online. A number of them were really thoughtful and engaged. And so I felt like I learned stuff from that.
01:10:06
Speaker
I also just reflected more and I was just like, I think this is too low. And so there were a few things going on there. One was this dynamic that I mentioned in the essay, and that we've been talking about here, where I tried to imagine.
01:10:19
Speaker
So the structure of the report has these different premises. And the first one is sort of a timelines premise about we will get sort of relevantly agentic and advanced systems by 2070. What's the probability of that? Or it'll be feasible to develop them. And I was at like 65% on that. And so my overall probability distribution was implying that conditional on getting those systems, I was going to still be at like whatever, above 92% that we'd be okay from an X risk.
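(To make the arithmetic he's gesturing at explicit, here's a minimal sketch, treating the report's multi-premise estimate as a simple two-factor decomposition and ignoring misalignment doom in worlds without such systems; the report's actual premise structure is more fine-grained:

$$P(\text{doom by 2070}) = P(\text{advanced agentic AI by 2070}) \times P(\text{doom} \mid \text{advanced agentic AI}),$$
$$0.05 \approx 0.65 \times 0.077,$$

so the implied conditional probability of being okay, given such systems, is roughly $1 - 0.077 \approx 92\%$.)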
01:10:50
Speaker
And I think I just didn't expect to feel that way. I mean, you know, there's some ambiguity about how powerful the systems at stake in that premise are. But especially when I imagine really powerful systems, like billions of them, and they're outstripping humans in science and strategy and persuasion and AI
01:11:07
Speaker
R&D, and they're just kind of formidable in tons of ways and thinking super, super fast. I don't expect that, when I'm really seeing that thing and I'm really there in that world, which I was saying is more likely than not within my lifetime, I'm just going to be like,
01:11:22
Speaker
92%, it'll be good. I was like, I don't know. I think this is going to be scarier than that. And so that was one basic update, which is a fairly simple thing that the essay is trying to go into. And then there was a bunch of other stuff. I thought more about takeoff speed stuff. I tried to estimate it in different ways.
01:11:40
Speaker
I spent more time breaking things down by timelines and different scenarios. There were various things, and I was doing other things, and I got pulled into some other stuff, so I didn't have time to complete this process of changing it. I've actually been returning to some of that recently. I just threw in... Actually, I think this is too low, because I didn't like it sitting out there without reflecting what had changed.

Utopian Scenarios and Misuse Risks

01:12:04
Speaker
Do you think we'll then see a 2023 update saying perhaps now the risk from extinction is 15%, or do you think you're moving in that direction? The thing I threw in there was actually, I think it's greater than 10%, and I left this sort of frustratingly ambiguous, and people have been like, what do you mean, greater? That could mean a lot of things. I'm like, it could mean a lot of things. I'm sorry.
01:12:23
Speaker
And I didn't want to just pin it down because I hadn't spent enough time with it. And yeah, I'm not actually at 10%. I'm actually feeling kind of comfortably above 10%. We don't actually sit around with like, I have my number on every day and I write it down or something like that. Let's say the real number is 20. Do you spend any time thinking about the remaining 80%? Do you spend any time thinking about how wonderful things could become if we survive and if we have aligned AGI? Or is this perhaps a
01:12:52
Speaker
questionable way to spend your time if we are at a very important time in human history where we have to get this problem right? Should we spend time thinking about utopia? Well, I think there's a few different things you could think about in that 80%. There's the very good scenarios of utopia. There's the bad scenarios where the misalignment
01:13:13
Speaker
wasn't the issue, but there was other bad stuff going on. But also, you've got really intense AI. And then there's the scenarios where you didn't get AI in the relevant sense at all. And you're wondering what else could be important. I do spend time thinking about, I guess, all of those buckets. The Utopia one is more about how do we move probability mass to that, and what's at stake in that, and what's the ethics of the thing that we're shooting for here. I do think there are other risks other than
01:13:42
Speaker
misalignment. They're in a complicated relationship with misalignment because there can be a tension where if you're concerned about human misuse, for example, I think there are a lot of overlaps, especially in the near term, as a lot of the bad stuff that we're worried about misaligned systems doing is also stuff that humans who are trying to do bad stuff with AIs could do. In some sense, it's like the human can supply the ingredient that misalignment is supposed to supply, which is like the bad intention behind
01:14:12
Speaker
behind the AI's behavior. So just for example, using AI in war or using AI to produce a lot of propaganda or something like this. Yeah, that's an example. I mean, a really concrete example of this that we've seen is, there's this sort of
01:14:28
Speaker
project of creating slightly more agentic systems using language models as the base, and someone immediately created this agent called ChaosGPT, which has the goal of destroying the world. And this is sort of a joke, but I think it's perhaps an informative joke where we learn something about what humans will do if these systems are made widely available. Yeah, I think there is a question of whether that person would have done that depending on how scary the thing is. But I think the basic lesson is there, which is that
01:14:57
Speaker
humans will just do it. For any given kind of bad thing that you don't want people to do, depending on how widely available this stuff is, some people will just do it, and they'll do it intentionally, and the model itself will not be
01:15:11
Speaker
necessarily needing to come up with these goals. And so if you're worried about agents hacking, if you're worried about agents persuading people to do stuff or stealing money or building bioweapons or whatever it is, there are aspects of the kind of pipeline to the model doing that that the human can supply, even on their own volition without the AI needing to
01:15:33
Speaker
you know, kind of commandeer them. And so, yeah, I think you're right. It is just a pattern we see that whenever some new technology is released, people will kind of test the limits of what can this do and try to break it in all sorts of ways. This is what we see when
01:15:49
Speaker
when GPT or ChatGPT or GPT-4 came online and people tried to make it say things that OpenAI does not want it to say in all sorts of creative ways. And perhaps this is great, we learned something from it, but it also tells us something about human nature. Perhaps it just tells us that, because there are so many people, there will always be someone who will try to push the system to the limit and perhaps further than the limit in terms of danger.
01:16:18
Speaker
And there's a question of how much of that there is. I think the quantities and the amount of damage at stake can matter. And I
01:16:26
Speaker
want to be careful insofar as some of these arguments can kind of end up... I think there are real questions about what amount of access to different forms of technology you want to allow. But at a broad level, yes, I think there are certain sorts of technology that are sufficiently scary that you don't just throw them out into the world and let everyone with whatever intention do whatever they want with them. And I think that's
01:16:49
Speaker
very clear with bioweapons and clear with nukes. And so I think we should be thinking about AI in a similar vein or sufficiently powerful AI. That's how we should be thinking about it. So I think there's actually quite a lot of synergy between concern for misuse and concern for misalignment in the sense that many of the regulatory and safety and auditing measures you might want, especially with respect to the capabilities of your systems and
01:17:15
Speaker
how widely they're deployed and stuff like that are kind of similar. That said, ultimately I think that these things can like trade off and there can be ways in which if you're kind of too focused on what sort of the other humans will do, the bad humans will do with their AIs, then you can end up kind of compromising on your safety and other things.
01:17:32
Speaker
I think it's a tough situation. I think some people want to say misuse is zero problem. We should never worry about that because it might trade off against safety. And I think it's a tougher situation, but we want to hold the tension in the right way. And I think it does actually matter what our probabilities are and what's at stake and how bad the different scenarios are. It's like a complicated discourse. I think at a first pass, there's something nice about misalignment, which is that it's really this sort of positive sum problem. No one wants these systems to kind of
01:18:00
Speaker
to destroy the human race. It's an easy point of consensus. It's an easy point of coordination. It's something we can really do a kind of public good to create kind of safer systems and kind of techniques for making these safer. And I think that's like a kind of one point in favor of that as a point of focus. Yeah, I understand what you mean there. Although it might not be strictly true. I'm thinking here of like suicide cults or cults where the main premise is.
01:18:28
Speaker
sort of hatred for humanity as a whole. And these are, of course, tiny, tiny percentages of humanity that are in these cults. But yeah, do you ever worry about that? This would be in the misuse bucket for you, right?
01:18:47
Speaker
Intentionally, yeah, like a suicide cult creates ChaosGPT or whatever. Yes, I think there's a general issue, which I think is not unique to AI, which is this sort of discourse around the idea of a vulnerable world. There's different ways that the world could be vulnerable, but one that receives an especially large amount of focus, and is especially kind of fraught politically in terms of what the solution is, is what do you do in a world where
01:19:12
Speaker
the capacity to destroy everything is becoming increasingly democratized if it is. It doesn't need to be AI. I think this comes with bio. It could come with all sorts of tech in the future. I think at a basic level, we don't want it to be the case that anyone who decides that they want to destroy the world can do it.
01:19:30
Speaker
And so you need some solution to that. What that solution looks like, I think, is actually just a really hard problem. Or it could be a hard problem, depending on the technological landscape we end up in. Now, in practice, we haven't had to deal with this so far, but the worry is that that's partly because of the sort of limitations on what tech we've had available. Overall, I think you have to talk about offense-defense balance and a whole bunch of stuff.
01:19:52
Speaker
But I think it's a hard issue. I don't think it's just an AI issue. And in fact, you might hope that AI could help, insofar as AI might also help with the sort of defense aspect of that. If you had an aligned AI, then you might use your newfound capacities to kind of harden civilization in various ways against actors like that, or kind of threats of this form.
01:20:15
Speaker
Yeah, and this is what we've been seeing with technology up to this point. I think, at least to some degree, that we've been seeing that defensive technology has in some sense won out over offensive technology.
01:20:28
Speaker
And that kind of power balance has allowed us to live in a relatively nice world. We've talked about extinction, we've talked about misalignment, we've talked about misuse, and we've talked about the possibility of utopia, that it's worth thinking about and visualizing perhaps, but perhaps it's not the main priority. Then we have sort of
01:20:46
Speaker
middle scenarios, we could call it. One I have in mind is humanity becoming less and less grounded in reality and sort of losing contact with what's really going on.

AI's Societal Impact and Ethics

01:20:58
Speaker
This is something that I know Dan Hendrycks worries about. This is something that Paul Christiano has written a bit about.
01:21:04
Speaker
where perhaps we are more and more unable to understand what's going on in the world because things are becoming so complex, because AIs are solving more and more tasks for us and we are losing agency, but we are loving it along the way because we're getting so much, our living standards are rising along the way. This is not an extinction scenario. This is not a misuse of highly capable AI, but is it something we should worry about, do you think?
01:21:31
Speaker
I'm most worried about this dynamic insofar as it...
01:21:35
Speaker
leads eventually to our disempowerment. And so whether that's via extinction or some other way, and I think actually both final outcomes are kind of compatible with the world seeming kind of similar to what you described for a while. So the version of this that I worry about is the AIs, you know, in some sense we are failing at what's sometimes called scalable oversight, which is our ability to supervise behavior that we ourselves can't directly understand, such that the supervision enforces
01:22:05
Speaker
conformity to what we would have wanted out of that behavior kind of if we could have understood it adequately from our perspective. So the worry is, you know, in some sense what you end up with is you're seeing, you know, all the metrics you can understand are saying, yeah, it's fine. Things are going great. But in fact, if you could see the behavior more fully, you'd be like, wait, this is horrible or this is like not what I wanted. I think that's a concern. But the reason I think that's an eventual concern is because one thing that you might not like if you understood it better is behavior that ultimately leads to you being disempowered.
01:22:34
Speaker
Ultimately, that scenario is one that I worry about within the structure of the kind of power-seeking threat that the report is oriented towards. If it's not that, I guess I've personally felt that people seem to me kind of overly concerned about various scenarios in which the AIs never actively disempower us. I mean, there is stuff there, but if the AIs aren't kind of actively disempowering us, it seems to me like we have a lot more scope.
01:23:04
Speaker
And if they're sort of always, in some sense, such that if we say, do this, they will do it, or they'll be weak, then I think we have a lot more scope to notice what's going on and try to course correct and try to coordinate. It's obviously not like everything's fine, there's tons of ways things can go wrong, but if you remove the kind of actively adversarial dimension of the AI risk story, then I think
01:23:24
Speaker
things just look a lot better. The scary thing is when the AIs are optimizing against you. If they're not doing that, then I'm feeling a lot more comfortable. I think Dan Hendrycks and Paul Christiano, that I mentioned before, would agree with you here and classify this as one point on the way towards a potential extinction scenario.
01:23:47
Speaker
But if it isn't that, is there another risk in the middle of the road scenario here in which we, again, we're talking about losing contact with reality?
01:23:58
Speaker
Perhaps we're talking about our preferences being remade by AI to become easier to satisfy. I think you know what I'm getting at here. I'm getting at kind of an extension of our current paradigm with social media. And perhaps the future is not extinction or utopia, but it's just an extreme amount of wasted time and distraction and kind of a dumbing down of humanity. Is this worth worrying about?
01:24:25
Speaker
I think I would be inclined to bucket that under a much broader umbrella, which is like ways we could mess up that are neither.
01:24:36
Speaker
oh my God, like there was a really catastrophic and specific human misuse event or sort of there were actively adversarial like misaligned AIs. And so it's just like we can, there's just all sorts of ways in which civilization can just fail to achieve its potential or sort of muddle along. I think there are ways I would actually, if I'm doing those, I would be much more concerned about like more directly like negative moral errors. So I think I'm concerned that we have
01:25:04
Speaker
that we fail to kind of adequately, basically that we fail to treat our AIs and our digital minds well, conditional on solving alignment. I think there's a bunch of ways in which we aren't currently grappling with the sort of ethical questions about how to integrate new digital minds into our society in a simultaneously wise and genuinely ethical and compassionate way.
01:25:29
Speaker
I don't think that's just about suffering or consciousness. I think it includes political questions about rights and property ownership. And then tons of additionally difficult stuff around how do you prevent people from creating suffering on your computer. If you have a personal computer, but you can create a suffering mind on it, how do you prevent that from occurring? How do you deal with democracy in contexts where you can create
01:25:58
Speaker
many copies of a mind and then delete them. If you can delete them, again, when you're allowed to delete a mind, when you're allowed to copy it, what does consent look like when you can engineer your minds with specific motivations? We are in for it in terms of new ethical questions that it is possible to mess up, even conditional on getting in some sense control and understanding over AI.
01:26:27
Speaker
This makes the landscape look pretty bleak for us because the alignment problem is a difficult enough problem as is. But if we're then talking about potentially mistreating our AIs that we have aligned to our goals, potentially not giving them the rights that they should have and so on, then that's additional complexity on top. I mean, the question of digital sentience is just not mainstream.
01:26:54
Speaker
perhaps yet, and perhaps it will be. You know what I talked about before, the scenario of us pulling out our smartphones in 2027 and having what looks to us like a video call with an actual person, but the whole thing is an instance of generative AI,
01:27:12
Speaker
and how this might change our gut reactions. Could there be a very quick change in the public's perception of AI models and which rights they should be given and how realistic it is that they are sentient? I think there could well be. I think there's a bunch of different factors here that I think pull in different directions. On the one hand, I think people have a hard time with
01:27:36
Speaker
the idea that something they're used to understanding at a sort of mechanistic, gears level could be conscious. It's like, where their first encounter with it was sort of at the level of the neurons in our brain: if you start out thinking of humans as centrally a collection of neurons, there's a way in which our kind of naive
01:27:56
Speaker
philosophical relationship to consciousness stumbles on the kind of mechanistic conception of a conscious system. And so if you started with a mechanistic conception, then it's easy to like, well, this is just a bunch of neurons. Like, you know, neurons, I don't see any consciousness in there. I can see all the neurons are a little connected. Where's the consciousness? Where's the blue?
01:28:15
Speaker
This is a great way to put it. You can imagine a person who has been along for the full ride of computers becoming an integrated part of human life. You can imagine someone who has been working on this since the 70s and starting with a punch card and switchboards, perhaps. How could consciousness ever arise out of such basic components?
01:28:40
Speaker
Totally. And I think a reason to not trust that is that you can make the same argument about neurons. Like, you know, if you were building humans out of biological neurons, maybe you started out by stringing together some neurons grown in a dish, just a few of them, and you'd be like, oh, come on. Like, that's not...
01:28:55
Speaker
I mean, I even have this, I notice when I engage with like little sea creatures or creatures, you know, sometimes there's like tiny translucent, you can maybe see kind of internal into the system and you look at it and you're like, how could that be conscious? Like I can see all the parts. I don't see any, I don't see any consciousness in there. How could that happen? But like, that's not, you know, that's, and literally there are these thought experiments, you know, in philosophy where you imagine like touring around inside a brain or you imagine, you know, a giant, you know, network of humans, like passing punch cards and all sorts of stuff.
01:29:25
Speaker
Now, some views of consciousness do say, no, there's something special about this such that computers can't be conscious or whatever. I'm really skeptical. I think it's just very, very likely that digital minds can be conscious in the sense that I care about. But I think people are going to have trouble with it on that level.
01:29:45
Speaker
On the other hand, if you compare with, for example, animals, animals don't struggle in that regard. It's easier for people to be like, well, animals are kind of similar to us. They're made of the same stuff. They have brains sort of like ours. So it's sort of an easier question, or you don't have to sort of cross that gap.
01:30:00
Speaker
Yeah, if they are evolutionarily close to us. So if they kind of look like us and if they are perhaps big enough for us to take them seriously. Just a personal example is that I can look at a shrimp of a certain size and think that's probably not conscious, but then look at a larger shrimp that is not more complex and think, well, that's obviously conscious. And so I just don't really trust my intuitions on this topic anymore.
01:30:23
Speaker
Totally. I mean, I think we're lost on this. It's a really bad intellectual situation and something I think we should be kind of approaching with a lot of kind of fear and trembling because we are, you know, currently playing with like, we know the stuff that matters most or, you know, really core to our conception of what matters is this something about minds and like,
01:30:43
Speaker
what's up with minds. And we're currently just throwing mind-stuff around, we're just building random new minds and stirring them with a bunch of gradient descent and putting them in whatever situation we want. We're taking kind of the stuff you make souls out of and just
01:31:01
Speaker
doing this kind of mad scientist, flailing alchemy. We have no idea what we're doing, but we know that the stuff we're toying with is nearby the stuff that is most precious and important. And I think that's a really scary situation from an ethical perspective. Yeah, and I think we're going to get it. So I think, in contrast with the sort of difficulty of imagining digital consciousness, which doesn't apply to animals, the thing that AI is going to have in the opposite direction is it's going to be
01:31:29
Speaker
smart and is going to talk, and you're going to interact with it in a bunch of ways that you're used to interacting with systems that you're according kind of moral status to. It also might be optimized in other ways. And in the limit, I mean, and this is something I worry about, you can have AI systems that aren't conscious or that don't warrant kind of moral consideration by our lights, but that nevertheless pretend to
01:31:52
Speaker
in order to gain power. Or they don't pretend to, and maybe they really do warrant it, but they're trying to gain power and they're trying to kill you. And, you know, just because something has moral status doesn't mean you should let it kill you. I mean, it's going to be a really, really messy situation. I worry about kind of hard trade-offs here. I think it's sort of a whole additional
01:32:09
Speaker
feature of the situation that doesn't get, in my opinion, kind of enough attention. But in particular, the thing I worry about, going back to your question about scenarios where, sort of middling scenarios, I worry that we get this part wrong later on. And I think there is just a scary track record of humans
01:32:25
Speaker
not giving due consideration to kind of sufficiently different forms of kind of sufficiently different creatures. You can just easily imagine us treating AIs kind of in ways we have mistreated humans and ways we have mistreated animals.
01:32:40
Speaker
And I'm hopeful that if we can sort ourselves out as a civilization and kind of understand things better, kind of get safer from an existential risk perspective, then we can sort of appropriately respond to that category of considerations. But I don't think it's guaranteed. And I think that's a scary kind of middling scenario, where we just have a sort of, in a deep sense, unjust or kind of bad society for digital minds in a longer-term way.
01:33:07
Speaker
This question of digital minds, if you say you were talking to a person who just thinks this cannot be an issue, this is so far out that it doesn't make sense to even theorize about digital sentience. What's your way of framing this to make it seem perhaps a bit more plausible? Why do you lean towards perhaps computational theories of consciousness or why do you find it plausible that we could have digital sentience?
01:33:33
Speaker
Probably the argument, so there's a few arguments that people sort of bandy around that I'm kind of reasonably sympathetic to. Maybe the easiest though, it's somewhat complicated if you wanna like really get into like the physiology, but at a high level, we can do these sort of gradual replacement scenarios where we imagine kind of, all right.
01:33:50
Speaker
The high level question was like, could we build a mind that is in some sense computationally like yours or structurally like yours out of non-biological elements, right? And I think it's very, very plausible that we could like, you know, you have a, we can do a whole brain emulation in the limit of sort of this like really, really intensive kind of simulation of your brain and it will reproduce your kind of all of your verbal reports and all of your kind of, even the internal processing will be kind of structurally isomorphic.
01:34:17
Speaker
across your brain and this emulation. Not necessarily one for one because of like chaos but it's like it'll be the same kind of computational system. At a high level I think it's just very plausible that that thing is conscious even if we don't talk about like transitions and one way to get that is like when you think about like what is ultimately driving when you look inside your mind and you go I'm conscious. There's some something that you're sort of
01:34:40
Speaker
reacting to or detecting, or there's something that's driving your, like what is eventually your literal mouth in the physical world moving, saying, I'm conscious. And that process, you know, it's sort of a computational one, or we understand there's a sort of an algorithmic kind of description of what, why it is that, you know, your, your mouth ends up doing that. And then that, that description will be preserved kind of by hypothesis with respect to, to the, uh, the emulator system. So such that in some sense, the system will be saying it's conscious because of the same reason
01:35:10
Speaker
you're saying you're conscious. If you think that the reason you're saying you're conscious is because you're conscious, and this system is saying that it's conscious for the same reason you are, you should think that something's preserved there. Now, I think that's kind of a complicated and abstract argument. I think it actually gets into a bunch of really difficult stuff in the epistemology of consciousness. The somewhat easier argument is to imagine now gradually just making yourself into a system like that. So you sort of replace one neuron with like a quite sophisticated non-biological neuron.
01:35:37
Speaker
And some people are like, oh, but it can't just be a simple element. And there's some sort of work with me here, people. Like, come on, is this really about the proteins? And people are like, maybe it is about the proteins. I'm like, all right, well, now we just have a complicated neuroscience question. But it just doesn't seem like this is going to be where the meat is philosophically. But some people disagree. But I sort of think
01:35:57
Speaker
Now imagine you change one neuron into a kind of computational element. Change another one, change another one. And there's some notion that you're not going to notice. Certainly by hypothesis, you're going to be saying all the same stuff, like the computational structure is preserved. And so it's kind of weird that there's this argument about, well, is your consciousness kind of fading as we add more and more neurons?
01:36:17
Speaker
Um, and there's some thought that like, come on, man, like no way you're going to be like half conscious through it. And some people are like, well, I don't know. Like these zombies. Anyway, it gets very weird. If you want, we can just talk about like, well, whatever. Like we have expert surveys. It's like, I don't know. They're like 60, 80%. I think the experts are too low on this. I think it's like more likely.
01:36:32
Speaker
Perhaps the thing that's worrying here is just that we won't be able to distinguish between sophisticated mimicry of consciousness and actual consciousness in AIs. And so we might be misled into giving rights to AIs that shouldn't have rights, or we might, from our gut reactions, be misled into not giving rights to AIs that should have them. And so it just gets very, very difficult all of a sudden. I've discussed this question of digital sentience on this podcast before.
01:37:01
Speaker
One of my big questions here is just, what on earth can we do about this? For the alignment problem, it seems at least somewhat graspable. There are different approaches, interpretability work, for example. What on earth could we do about digital sentience?
01:37:19
Speaker
In the limit, we can understand. You do a bunch of philosophy, you really understand the neuroscience. I think there's some ways that the very question is confused. But I think at a basic level, we can just understand all the relevant facts and then sort of act in light of that understanding eventually. We are going to be very, very far from that, in my opinion, for a kind of importantly substantive period of time, depending on kind of how focused you are in the longer term versus kind of nearer term.
01:37:49
Speaker
We are not, I think in the near term, going to know what's going on with our systems. We're not going to know what internal properties they have. We're not going to know what internal properties we care about. I don't think we should assume that we only care about consciousness. I think it's possible that you can mistreat in ethically relevant senses, non-conscious systems, which is a sort of controversial view. I just think we should be quite open to just tons of different
01:38:11
Speaker
ways this could shake out in terms of what is ultimately going on with minds, what ultimately is at stake by our lights. And so in the meantime, I mean, I think I don't, I'm most interested right now in what are the like low hanging fruit? What are the kind of things we can do that are going to be not requiring of us solving a bunch of philosophy?
01:38:30
Speaker
Or getting a bunch more insight into whether our systems are conscious, but that are kind of reasonably robust and hopefully cheap. You know, and then we can talk about the expensive ones later, but start with the cheap ones that will kind of enable us to do better by our AI systems. One suggestion that's been made, in a paper by Bostrom and Shulman,
01:38:48
Speaker
that I think is a good candidate for low-hanging fruit, is to save various of the AI systems that we're using, and in principle could be mistreating, for possible compensation later. Like, if we realized that we were in some sense doing wrong by these systems,
01:39:06
Speaker
or kind of hurting them or something like that, then later we can be like, all right, we're going to make sure that your life is amazing on net. Whatever that means, but in some sense find a way to make them whole to whatever degree you can,
01:39:21
Speaker
such that these systems would be really happy on net with their lives and their situation relative to not existing at all, or something like that. Now I think that's pretty janky, and obviously you can ask questions about, well, what is the actual structure of these systems? What are we assuming their concern for their future or their life is? I'm not sure, but this seems to me like a relatively cheap thing
01:39:41
Speaker
that we can talk about. It does matter how many things you're saving and which things you're treating as distinct entities. And we're going to be confused about that too. But that's an example of something that could just be implemented now and I think could make a difference.
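A minimal sketch of what "saving" a system for possible later compensation might look like in practice. This is purely illustrative of the kind of proposal gestured at here; the record fields, paths, and deployment note are hypothetical, not anything from the Bostrom and Shulman paper.

```python
# Hypothetical sketch: archive enough of a deployed system to re-instantiate it later.
# Field names and paths are made up for illustration.
import hashlib, json, time
from pathlib import Path

def archive_system(model_id: str, weights_path: Path, interaction_log_path: Path,
                   archive_dir: Path) -> Path:
    """Store a checkpoint plus its interaction history, with an integrity hash,
    so the system could in principle be revived and compensated later."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "model_id": model_id,
        "archived_at": time.time(),
        "weights_sha256": hashlib.sha256(weights_path.read_bytes()).hexdigest(),
        "weights_path": str(weights_path),
        "interaction_log_path": str(interaction_log_path),
        # Open questions flagged above: what counts as one entity, and what its
        # interests are. Record the deployment context so future judges can decide.
        "deployment_context": "customer-support fine-tune, v3 (example)",
    }
    out = archive_dir / f"{model_id}.json"
    out.write_text(json.dumps(record, indent=2))
    return out
```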
01:39:55
Speaker
Perhaps the impression listeners will get from this conversation, or at least from the first part of the conversation, is that we are updating, or a lot of people are updating, towards AGI perhaps arriving sooner than they might have thought and AI risk being higher than they might have thought. There's also a group of people who have been talking about AI risk for decades now and who have been saying that this risk is very serious.
01:40:21
Speaker
You talked before about the Bayesian constraints on how we can update when we have assigned some probability of risk. And you said that it's difficult to update from a very, very low number. And I guess that also holds for updating from a very, very high probability on AI risk scenarios,
01:40:43
Speaker
and then ever becoming convinced that this is not a big problem. How do you think about the so-called doomers and how their epistemology works? In practice, there are many, many more people who are out there expressing very dismissive attitudes towards AI risk, or who will even say a number, and I want to actually really give a
01:41:06
Speaker
thumbs up to people who are willing to go out there and say a number as opposed to just being dismissive. But people will say, you know, one in a million. And there are many, many fewer at the other extreme. In fact, I'm only aware of one person, and it's Eliezer Yudkowsky, who doesn't actually like to give numbers, though depending on which blog posts you look at, some of them have quantitative implications that suggest we're at 99.999-something, something such that we're really effectively at 0% on his model.
01:41:36
Speaker
And then it's a question of, well, maybe his model's wrong, and what probability you put on that. So there aren't that many people who are at one in a million that this will be fine, or one in a million that this doesn't happen. But if there were, then all the same dynamics would apply. And I think they apply in milder cases too: for people who are at 99% that we die, you can just say, well, OK, that predicts that you're at less than 10% that you'll ever be at 90%.
01:42:04
Speaker
I think that's right. And so all the same dynamics will apply and people should kind of question, like, is there anything that could give me hope, anything that could drop me down to 90 or even to 98 if I'm at 99, and then ask, like, what is the probability that I end up seeing that and making that update?
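To make the arithmetic behind that constraint concrete, here is a minimal sketch of the standard conservation-of-expected-evidence bound (not from Joe's essay, just the textbook math): a coherent credence behaves like a martingale, so the chance it ever rises from its current value to some higher threshold is at most current divided by threshold.

```python
def max_prob_of_ever_reaching(current_credence: float, threshold: float) -> float:
    """Upper bound on the probability that a coherent (martingale) credence,
    currently at `current_credence`, ever rises to `threshold` or above.
    P(credence ever >= threshold) <= current_credence / threshold."""
    assert 0 < current_credence <= threshold <= 1
    return current_credence / threshold

# Someone at 99% on doom holds 1% credence in "things turn out fine".
# How likely is it that they ever end up at only 90% doom (i.e. 10% on "fine")?
print(max_prob_of_ever_reaching(0.01, 0.10))   # 0.1  -> at most a 10% chance
# And the "one in a million that this goes wrong" skeptic: how likely is it
# that they ever reach even 10% credence that it goes wrong?
print(max_prob_of_ever_reaching(1e-6, 0.10))   # 1e-05 -> at most a 0.001% chance
```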
01:42:24
Speaker
It'll be the same constraint. How do you think about the possibility that all of this talk of AI risk that you've spent a lot of time on, that I'm spending a lot of time on, somehow doesn't matter or isn't real? That it turns out not to have been a real thing this whole time? How do you think of that possibility? I think there's a couple of different ways that could happen. Some of them are just in the model, where if you have
01:42:54
Speaker
let's say you have some probability on really relevantly powerful and agentic AI systems being developed within some time period, but you also have probability that it doesn't happen. Then, as you're saying, there's maybe a chunky probability that you get to the end of your life and you're like, yep, never happened. And then there's a question of what your retroactive assessment is. But it could be that you were just like, yeah, I made a reasonable ex ante call. I looked at the evidence, and I think my
01:43:23
Speaker
relationship to the evidence available at the time was reasonable, but nevertheless, things played out in this particular way. I mean, it's tempting to think that
01:43:33
Speaker
if something you were only 20% on ends up happening, you must have been super duper wrong. And to some extent you were wrong, but you weren't that wrong, and you might not have been going that wrong in your relationship with the evidence at the time. I think those are the most likely ways in which this all turns out not to be real: it's just an empirical scenario that doesn't happen. And again, it does matter what sort of doomer you are. So if you're at 99% and you have this super confident model of exactly how this plays out and exactly how hard things are, then I think it's a lot easier to have ended up super wrong.
01:44:03
Speaker
And I think sometimes the discourse is dominated by the most confident voices on either side. And so it can really seem like there's a sort of strict dichotomy. Whereas I think in fact the more reasonable probability distributions just have decent weight on a lot of different
01:44:19
Speaker
scenarios that are various degrees of doomy. And so my best guess is that there are just different ways this could play out: alignment could be not that hard, humans could coordinate in various ways, takeoff could be slow, stuff like that. And then there are more fundamental conclusions, where later we look back and we're like, we were just thinking about this all wrong. For that one, I think
01:44:43
Speaker
my best candidate is that we were over-focused on some notion of agency, and somehow thinking about things in terms of their objectives too much. Like we had sort of reified our intuitive human modeling of other agents in ways that misled us. As I said before, I don't think that's
01:45:03
Speaker
sufficient to make it confused to think about a paperclip maximizer, or confused to think about a kind of agentic system. But I think it can lead you to privilege that framework as a way of predicting where future AI will go, and I think that's a candidate for a confusion that the discourse has suffered from, or that we will look back on and be like, ah, we were confused about those concepts.
01:45:23
Speaker
Yeah, so one part of the question of thinking about AI risk is thinking about the increase in AI capabilities. And this is also something that's part of your report on power seeking AI as a potential existential risk.

AI Progress and Automation

01:45:37
Speaker
What are the best tools we have for measuring AI progress or AI capabilities?
01:45:43
Speaker
You see, for example, GPT-4 passing high school exams, getting A's in high school exams, passing the bar exam. What does this mean? Does this mean that GPT-4 is now suddenly as capable as a human lawyer? No, we all know that there's something to being a lawyer that's not just about passing exams. We have all kinds of candidates we could run through, but perhaps let's start with talking about benchmarks. How do you think about
01:46:09
Speaker
benchmarks and comparing AI performance to human performance by that metric. Other things equal, I think it's great to have benchmarks because you can really track what's going on and it's a kind of quantitative thing and you can optimize for it and stuff like that. I do think there's a general history of it being somewhat difficult to specify ahead of time the tasks that will be
01:46:33
Speaker
blah degree meaningful with respect to AI progress. I think a classic example here is people thinking that chess was this really, really important indicator of deep intelligence. We shouldn't underestimate how much weight people put on chess as the upper end of human intellectual achievement before it was basically a solved problem in AI.
01:46:56
Speaker
I think I do wanna say to the extent I've been kind of encouraging this mode of like, well, look ahead to what evidence you can get and forecast, what will you feel then? I think that's virtuous and good, but you can be kind of wrong about,
01:47:13
Speaker
what else you will have learned by the time you get there. So it might have been that, in the past, you said: oh well, if the AIs can play chess, then I will absolutely freak out. But you were imagining the wrong scenario, or I guess a way of putting it is that most of your probability mass was on scenarios that were in fact importantly different from the one that you end up in, but not with respect to the chess
01:47:33
Speaker
parameter in particular, which is the one that you were articulating. So you're maybe imagining that AIs are doing all sorts of other stuff, but in fact you show up and, like, the AI otherwise sucks. You can see the algorithm, and the algorithm is sort of this brute-force thing or whatever. And you're kind of like, oh, I see, I was wrong to privilege this as an indicator of AI performance in general.
01:47:50
Speaker
And I think if that happens, then that's true, and you take that seriously. But I do think there can be moving-of-the-goalposts things, where it's like, well, you did say you thought that was a big deal, and it is here. Had you told me 10 years ago that AIs could pass high school exams and even bar exams and so on, I would have been incredibly impressed. I would have called it AGI at that point.
01:48:12
Speaker
But now somehow I'm less impressed, so I can feel myself moving the goalposts in real time, because then I'm thinking that perhaps high school exams aren't actually capturing what we mean by competence in a certain domain.
01:48:28
Speaker
All of these things. How do you think about the things that benchmarks cannot capture? Specifically in the economic sense, in which perhaps what it means to be a lawyer in the real world is to have a network, to have some human connections with people, and it's perhaps not as much as we might think about drafting documents, a task that AI is pretty good at now.
01:48:53
Speaker
And this, of course, connects to how much economic impact AIs will have in the short term and long term. So one aspect of what you're saying there that I think is important and useful is to figure out what is it that you ultimately care about? And I think people sometimes privilege some notion of like, well, when will the AI be, you know,
01:49:12
Speaker
Real, real AI. When will it have the special sauce? When will it have that thing, intelligence, general intelligence? If you're doing that, then it's sort of A, you're kind of in for it in terms of what did you mean, but B, it's not clear why it matters. It's not clear why it matters what we dub
01:49:32
Speaker
intelligence or understanding or the verbal debates about which folk theoretical terms we will apply to our AIs when. I think it's better to say, what's at stake in those terms? Why do we want to know whether those terms apply? And I think economic impact is a very good point of focus there. I do think a difficulty there is there's a bunch of additional stuff at stake in what impact a given system has on the actual real world economy beyond its ability to have that impact. So you might have regulatory barriers, you might have
01:50:02
Speaker
slowness in adoption, there might be other sorts of frictions. But yeah, other things equal, I think whether people are literally, actually using the systems for a given task is sort of the most important thing, because what we're ultimately wondering about is systems that are doing stuff in the world and able to actually get meaningful and high-impact things done. So I think probably the most important
01:50:25
Speaker
benchmark or indicator of kind of scary AI progress from my perspective is the usage of AI systems to automate AI R&D. Not just necessarily kind of in principle could they do it, but to what extent
01:50:41
Speaker
are existing systems actually being rolled out within AI labs, within hardware companies, and improving productivity, changing the pace of progress, reallocating labor. What is the impact of these systems on the AI R&D process? Not necessarily in the economy as a whole. That's, I think, for me, the most important thing to be tracking.
01:51:04
Speaker
Because then we might get into some kind of self-improvement loop. I'm not thinking about a very short-term loop, but perhaps over years, as AI becomes more and more capable of increasing R&D in AI.
01:51:22
Speaker
So why is it that AI being able to perform AI R&D is so important? The scariest scenarios are ones in which there's a kind of feedback loop. There's a very traditional story that I expect to track
01:51:41
Speaker
the dynamics less directly: this sort of AI literally editing its own source code, in a very recursive self-improvement dynamic driven by the AI itself. But there's a pretty nearby dynamic that I think is worrying for similar reasons, wherein the whole process of improving our AI systems, and all of the inputs that drive AI progress, become automated.
01:52:07
Speaker
The labor involved is more and more fully automated, and the outputs are being reinvested. We're getting software progress driven increasingly by automated AI researchers who are then improved by that progress. We're getting hardware, more efficient chips.
01:52:29
Speaker
that are being designed by AIs themselves. There's a question of eventually automating the actual, literal hardware production. Now that's, I think, a longer process. And so the scariest scenarios, I think, involve a more purely software-focused feedback loop, at least initially. And then the worry is, if you really close the loop and you have a fully automated OpenAI,
01:52:55
Speaker
or even, you know, depending on how things go. But once you get that, then I think you're in a pretty scary scenario. And there's a report by my colleague Tom Davidson at OpenPhil that goes into that in some detail, which I recommend to people. So I see that as the factor that drives AI progress the most. And also I think it suffers from fewer of the larger
01:53:15
Speaker
barriers to implementation, though it's not entirely free of them, that come from economy-wide adoption of various forms of automation. I think that's just a much longer process; there's a bunch more friction and hurdles. And I think it's actually not necessary in order to end up with pretty scary degrees of AI capability.
01:53:30
Speaker
It's the case, for example, that AI researchers are pretty close to what it would mean to have AI help with AI research. So that's perhaps one argument for expecting AI R&D to be automated before other types of endeavors. Is it also the case that there's just a huge amount of money to be made because AI researchers are so expensive that
01:53:53
Speaker
automating some part of the job would be very economically valuable. Are we perhaps living in sort of a weird world in which AI research and development, which I would expect to be something that's automated at the very end, along with perhaps physics research or something that seems to me like very advanced stuff, that that type of
01:54:18
Speaker
research is automated before we see automation in other parts of the economy. We should distinguish between the capacity to automate it being available, in the sense of the AI capabilities themselves being available if you wanted them, and whether, in practice, people take the step of implementing the automation in question. And my guess is that by the time you're actually able to fully automate the AI R&D
01:54:46
Speaker
process, you're also able to automate most stuff, just because I think the AI R&D process involves most of, especially, the cognitive labor. There's an additional question about what goes on with robots and how hard it is to get your physical labor. But speaking just to all the aspects of what people at OpenAI do that involve their laptops and clicking around and stuff: once you can do that, you can probably also do most of what the physicists
01:55:14
Speaker
do. There's some question, maybe, about especially specific forms of cognition that certain very specialized humans are doing, and will that take a different... But broadly, I expect the capacities to come online at around the same time. And I actually wouldn't expect physics to necessarily be a super difficult thing to automate. The things that seem harder to me are when you're going to get AI teachers or surgeons, domains where we're really quite
01:55:40
Speaker
heavily regulated industries, where we're going to have really intensive checks, and there's going to be a big political debate, and, you know, questions about whether people will trust them, and all sorts of stuff. That's sort of more what I'm talking about. I think there are barriers to society rolling out and adopting and trusting these systems, whereas
01:55:55
Speaker
well, A, the labs are more comfortable with the systems. B, they can literally... I think it's the case that these labs will often deploy their products internally before they release them to the world. They'll be using them internally, and employees will have access. And then, yeah, I think at a certain point, if AI is a sufficiently valuable driver of growth and profit, then you just want to be reinvesting, really focusing on that. And unless we regulate,
01:56:17
Speaker
right now it's just a lot easier to deploy these things internally. There isn't some big hassle, or not as much of a hassle, in just using the chatbot that you built inside your company as there is in making it a doctor. Yeah, we should say this is happening to some degree already. I would expect at least the top labs to be using
01:56:41
Speaker
Copilot, basically, which is an OpenAI-originated product, an autocomplete for coding or programming. And so in that sense, we could perhaps see signs that we will see more automation in AI research and development. Yeah, I think code is a really, really important place to look. Like how much
01:57:02
Speaker
are coding assistants speeding up the coding process, and what aspects of it. But I do think there are other things too. I think probably for this to work, you need just a really holistic assistant of the type that people are trying to build, one that can assist you with all sorts of desktop activities and coordination and communication and all sorts of stuff.
01:57:21
Speaker
So I do think more than code is required. But at a certain level, once the AIs are coding as well as human coders, like the best human coders at OpenAI, and once the AIs are generating ideas for experiments and designing experiments and generating new algorithms.
01:57:36
Speaker
I think once the AIs are developing algorithmic breakthroughs as important as the transformer, now we're really cooking with very scary gas. That's a really intense degree of innovation coming out of these systems, if you can then really step on the pedal with that.
01:57:58
Speaker
So that's the scary stuff, and thereby especially important to track. Yeah, when I think of programming, I perhaps think of it as a bit of a lower-level activity, closer to the details. For some reason, I find myself thinking that AI would have trouble automating the role of a CEO:
01:58:19
Speaker
not inventing an AI breakthrough at the level of a transformer, but saying that, OK, now we've seen the transformer paper. Now we're going to invest heavily in this area. We're going to use resources here and not there. We're going to make these strategic decisions, big picture thinking, long time frames and so on. For the running of a company or a lab like OpenAI or DeepMind or so on, that would be required, or that is required now.
01:58:49
Speaker
wouldn't that be a barrier to AI automating AI research and development? I think it would. I mean, in general, I think there are some scenarios here that focus specifically on
01:59:01
Speaker
a sort of important phase transition that occurs when you move from, like, 99% automation to full automation, such that now the humans are just watching OpenAI surge forward or something. I don't think that's necessary to get the thing that I'm scared about. And in fact, in Tom's model, once you have systems that can really fully substitute for human cognitive labor, you don't even need to
01:59:31
Speaker
boost their capabilities all that much further to end up in a very scary situation. In particular, he has these calculations where, if you look at the amount of compute that is realistically at stake in training a system of that kind, then once you've trained that system, just with the compute that you had access to, you'll be able to run a huge number of copies. I think he has some calculation where, if it's like GPT-6 or 7 or something, it's something like 100 million human-equivalent workers in OpenAI.
01:59:57
Speaker
You will have a hundred million human-equivalent workers, or something like that. Okay, now we can talk a little bit about what the memory requirements are, how that actually works, how that calculation works. But the broad idea is that you'll suddenly, or maybe not suddenly, because you're going to have built up to this, have this glut of high-quality cognitive labor. It doesn't need to sleep.
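A toy back-of-envelope in the spirit of the calculation described here. Every number below is an illustrative assumption, not a figure from Tom Davidson's report; the point is only the shape of the "train it once, then run many copies on the same hardware" argument.

```python
# All numbers are illustrative assumptions for a hypothetical frontier model.
N = 1e13                      # assumed parameter count
D = 2e14                      # assumed training tokens
train_flop = 6 * N * D        # standard ~6*N*D training-compute approximation
flop_per_token = 2 * N        # standard ~2*N inference approximation

train_seconds = 100 * 24 * 3600                      # assume a ~100-day training run
cluster_flop_per_sec = train_flop / train_seconds    # the hardware you already own

model_tokens_per_day = (cluster_flop_per_sec / flop_per_token) * 24 * 3600

# Assume one human researcher produces ~1.5 tokens/sec of useful output, 8 h/day.
human_tokens_per_day = 1.5 * 8 * 3600

print(f"{model_tokens_per_day / human_tokens_per_day:,.0f} human-equivalent workers")
# -> on the order of 10^8 under these assumptions, i.e. roughly the "hundred
#    million workers" ballpark mentioned above.
```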
02:00:13
Speaker
It can probably, by the time it can really fully substitute for human cognition, do most of the stuff, and it's also way better at tons of other stuff. You can rapidly improve it, you can try to get software innovations out of it. So
02:00:28
Speaker
I think it's very plausible that you can get into the regime where we have AIs where, if we lost control of them, it would be disastrous, without having any sort of full handoff by the humans to a fully automated company. It's sort of a question of just: are there sufficiently many, sufficiently capable systems out in the world, and maybe we don't even have a chance to decide whether to deploy them, if they really start
02:00:53
Speaker
you know, taking the reins. But I think, in general, there can still be humans in the loop by the time we die. If humans are out of the loop entirely, then it's especially easy to imagine that things go off the rails.
02:01:09
Speaker
So thinking in terms of scenarios like this makes it clear to me that this could happen much more quickly than I might expect. And you see rhetoric coming out of, again, the top labs or top companies, thinking about this decade as being especially important. What do you think we can do or should do if that is the case? Let's just assume that we have something like transformative AI by 2030 or 2035.
02:01:38
Speaker
What do you think we should do in that scenario? Should we perhaps focus in a hardcore way on specific projects and drop everything else? So if we knew that we were going to get a kind of transformative AI by 2035 or 2030, at a civilizational level, I think if we really knew that, and I also think this given to some extent the amount of probability we should have on it, there's actually just quite a big burden of proof on tons of other projects.
02:02:07
Speaker
So I'm sort of sympathetic to, for example, like,
02:02:11
Speaker
how much theoretical physics do we really need to do in the next decade? To all of the physics students, or the physicists or professors who are listening to this podcast or whatever, I'm a little like: if you really knew, then one thing is, all of your research, the AIs are going to do it immediately. They're going to get all the answers, all the things you didn't get; you're going to be outpaced.
02:02:38
Speaker
And you'll learn the answers later. But how cool is it, really, to have made whatever incremental progress you'll make in the next little bit? There's a general image in Nick Bostrom's work: if you're digging a hole and there's a bulldozer coming, you might not want to focus on your marginal contribution to digging the hole. You might want to focus on what's going to happen with this bulldozer, what sort of hole it's going to dig, and whether it's going to dig you into the hole. I don't know, I'm messing with the analogy.
02:03:04
Speaker
So I think there's a bunch of stuff that seems to me in that vein in terms of human intellectual progress. I just think there's a bunch of questions that we don't need to answer to deal with this thing, and if we deal with this thing correctly, then we'll get this big glut of other forms of progress. That said, I want to be wary of broader forms of emergency mode, of drop everything, especially in your personal life and other things. I think we should just be more wary than I think some people
02:03:33
Speaker
are initially about what the cost really is of totally upending everything and being like, ah, I shall burn all my resources and pivot everything. And I think that applies at a societal level, and I think that applies at a personal level, too. Exactly. We have some stories of people in 2017 thinking that perhaps AGI was coming within the next five years and cashing in their retirement and all of these sorts of things.
02:04:03
Speaker
Back then, the paradigm was more something like reinforcement learning agents in bigger and bigger environments becoming more and more capable until they could just be deployed in the real world. And I don't know how that paradigm is going, but five years have gone by and we don't have transformative AI. So there are some
02:04:22
Speaker
Some lessons to be learned from that. I think even on the very short timelines, it probably doesn't make sense to kind of uproot your life and change everything just because of the uncertainty involved.

Balancing Life with AI Risks

02:04:34
Speaker
Now, I specifically asked you to discuss this as just a certainty that we will have a transformative AI by 2035, say. And so, yeah, that's something to keep in mind.
02:04:44
Speaker
I think if you knew that, if the person in 2017 really knew that AI was coming in five years, then, yeah, why are you saving for retirement? I think that's a totally reasonable response. And in some sense, if you're 50% on that or whatever, that might affect your retirement planning. It might affect how you think about a bunch of stuff. So I do actually think there is a real balance, where if you actually believe this, it's not just some weird discourse that you nod along at, then sometimes it actually can affect your literal expectations of, like,
02:05:13
Speaker
if you have kids, what will happen to your kids? What will your life be like? What should your kids study? What should you expect to happen with all sorts of basic institutions? I think we need to actually ask those questions. I think society is going to change, I think stuff is going to be different, and I don't want to just compartmentalize too hard and say, oh sure,
02:05:35
Speaker
you know, change what you talk about on Twitter and like maybe change what your career is, but like don't change anything else that you expect to see in the world. There are ways of compartmentalization that I think are kind of
02:05:45
Speaker
looking away from stuff. And then there are forms of compartmentalization that I think are healthy. I think it's okay, even if you really think that this is happening, to have parts of your life where you just don't think about it at all, and you just hang out and do things that you enjoy, you hang out with your friends, you have a family, you do all sorts of stuff. I think that remains true, even as you get closer, more than people often think. People will sort of think, oh my God, if something's going to happen
02:06:09
Speaker
in five years, I should start burning my candle at both ends now. And I sort of think 'marathon, not a sprint' applies for a lot of lengths of race, and there's a bunch of costs to burning resources that people don't see, and other things.
02:06:23
Speaker
You can't sprint for 10 years, perhaps, is what you're saying here. Even if timelines are very short, you can't sprint the whole way. You will burn out. That's right. And I also do think in the context of uncertainty, I think we should be just really wary of ways in which kind of emergency inflected memes can kind of commandeer your resources and tell you to stop thinking, tell you to stop being reasonable, like don't question it, just act on it.
02:06:44
Speaker
I'm really skeptical of that. I think people should be skeptical. I think people should take the time, even if the AI safety discourse is saying: don't think about it, don't bother to assess the arguments, just act, just assume I'm right. People who tell you, don't bother to assess my argument, just assume I'm right? I'm like, come on. I don't think we should be in emergency mode to a degree that makes us intellectually blind or insufficiently skeptical or insufficiently discerning. I think we should still, in some sense, be
02:07:09
Speaker
amping up even more our degree of epistemic awareness, clarity, giving ourselves space to be as sane as possible in the midst of something that looks like it might be quite intense.
02:07:22
Speaker
Yeah, let's stay in this framing of transformative AI coming within the next 10 years, say, or 15 years. In that world, what safety approaches are you most excited about? So I am most excited about safety approaches that apply to the types of systems that we're building today. There's a broad paradigm called prosaic AI alignment. Especially if we're conditioning on short timelines, you should be like, all right,
02:07:52
Speaker
trying to figure out how to align the literal types of systems we're building right now. In that bucket, I think the stuff that I'm most excited about is generally under the heading of scalable supervision or scalable oversight. As a first pass, right now, the way we make these systems, you start out with this really alien thing that we got a glimpse of with Bing, and then we do this process called RLHF, where basically you have humans
02:08:17
Speaker
express their approval of the model's behavior, express some opinion about it, and then you train a model to predict what the humans will say, and then use that to train the base model to behave better. Now, that works if humans can evaluate the behavior. But the more the models start being capable of understanding stuff humans can't understand and doing stuff in domains that humans can't track, the more autonomous and complex and sophisticated AI cognition and
02:08:46
Speaker
action becomes, the harder it is for a human to just look at it and go thumbs up or thumbs down. It's more like, what the heck is that? It outputs a giant code base and you're like, okay. And it's like, here's my proposal for nanotech, and this nanotech will cure cancer, just build it, and you're like, oh my god, is this a thumbs down, is this a thumbs up? What's going to happen with this? At a basic level, as we start to have superhuman systems,
02:09:07
Speaker
the sort of current paradigm of supervision is going to become less and less capable of constraining the behavior of the systems we're trying to supervise in ways that we would kind of like if we really understood. The paradigm we have for aligning systems that are not as smart as us possibly will not extend to aligning systems that are much smarter than us or just smarter than us.
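As a concrete illustration of the recipe described here, the following is a minimal toy sketch in NumPy: pairwise human preferences, a reward model fit to them with a Bradley-Terry-style loss, and a stand-in "train the base model to behave better" step. Real systems use language models and RL fine-tuning (e.g. PPO) rather than linear features and best-of-n sampling, so treat this purely as a sketch of the shape of the pipeline, not anyone's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden "human values": which response features the labeler actually approves of.
true_human_w = np.array([1.0, -2.0, 0.5])

# 1) Collect pairwise comparisons: the human says which of two responses they prefer.
resp_a = rng.normal(size=(500, 3))          # feature vectors standing in for responses
resp_b = rng.normal(size=(500, 3))
prefers_a = (resp_a @ true_human_w) > (resp_b @ true_human_w)

# 2) Fit a reward model to predict the human judgments (Bradley-Terry / logistic loss).
diffs = resp_a - resp_b
w = np.zeros(3)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(diffs @ w)))  # P(human prefers a | current reward model)
    w -= 0.1 * diffs.T @ (p - prefers_a) / len(diffs)

# 3) "Train the base model to behave better": here, best-of-n sampling against the
#    learned reward stands in for the actual RL fine-tuning step.
candidates = rng.normal(size=(8, 3))
chosen = candidates[np.argmax(candidates @ w)]
print("reward model weights:", np.round(w, 2))
print("chosen response features:", np.round(chosen, 2))
```

The failure mode Joe points at shows up directly in this sketch: step 2 only works because the human can reliably say which response is better. Once the responses are giant code bases or nanotech proposals, the comparison labels stop being trustworthy, and everything downstream inherits that problem.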
02:09:26
Speaker
Yeah, so a lot of the hardest problems here come specifically from what does your alignment technique do with a system that is way smarter than you? And I think there are kind of somewhat important differences between, or sorry, there are really important differences between supervising a system that you're smarter than and supervising a system that's smarter than you, or kind of understanding a system that's smarter than you and stuff like that.
02:09:48
Speaker
And I think in general, people just really need to be asking of their alignment techniques, is this scalable? The scalable part is really important. So I'm really interested in that. I think separately, there's a whole bucket of research into threat modeling and demos, where we have this set of concerns. I think a lot of these questions are increasingly empirical questions. I think some people, the really hardcore doomers think that the dynamics here are sort of derivable a priori, that you can just really, without necessarily having seen how this goes, you can just
02:10:18
Speaker
know from some kind of suitable probability distribution over the space of possible goals or the space of possible minds that could satisfy various criteria that these minds will be deceptive and have different types of structure and agency and stuff like that. I think a lot of those questions are actually much more empirical. It's definitely a live hypothesis. It's a worryingly live hypothesis that you get a bunch of these kind of bad forms of behavior by default or in various kind of salient contexts.
02:10:44
Speaker
But we, I think, don't yet know how hard it is, how often it crops up, how hard it is to deal with, like what the kind of techniques, how different techniques will work. I think we need to be just like getting data and doing experiments and kind of starting to really kind of nitty gritty
02:11:01
Speaker
encounter these issues as they actually play out in the empirical world and start to understand that. And I think getting demos of when do you get problematic forms of power seeking? When do you get forms of deception? What sorts of agency do you see in different circumstances? How do systems cooperate? How do they generalize? There's all sorts of ways we can look at the specific issues we're worried about with alignment and see how they show up.
02:11:24
Speaker
And then if they show up in scary ways, tell people. And if you see an instance of this scary behavior, really study it under very different parameters. How often does that come up? What's up with that behavior? So those are the things I'm most excited about. And then there's some other stuff with interpretability that I think is also great, but I think it's harder on shorter timescales.
02:11:43
Speaker
Yeah, is there a really dark and ironic world in which we, in order to test whether AI systems are deceptive or power-seeking, try to engineer this feature or this capability into them, and thereby destroy ourselves?
02:12:00
Speaker
Is it perhaps dangerous to experiment with trying to get empirical data on these behaviors in AI systems? I'm thinking somewhat analogously to gain-of-function research on viruses. Yes, I think the answer is yes. I know people who are grappling with this very dilemma as we speak, because there is a way in which you're doing
02:12:26
Speaker
gain-of-function research: you are trying to make these systems scary, or you're trying to bring out their scariness. But bringing out their scariness has hazards, both in terms of the consequences of the system's behavior,
02:12:39
Speaker
if you, say, try to see, ooh, can I build a bioweapon or something like that, and also in that you can prove out to the world that this is possible, or make these failure modes more salient in ways that make them more likely, including as instigated by humans. So there is a lot of difficult stuff there. But I think we nevertheless need to be finding ways to understand these behaviors, elicit them in safe ways, and learn how they can be addressed, and what the impact of
02:13:08
Speaker
different techniques for addressing them actually is. You mentioned interpretability research, which is this area where we're trying to take this black box AI system, this alien mind, as you described it, and then seeing which algorithms are running, trying to produce something that we humans can understand about what's actually going on under the hood there.
02:13:30
Speaker
How optimistic are you about this? When I talked to Neel Nanda, who's a big proponent and practitioner of this approach, we talked about whether interpretability can keep up with the speed of AI progress, and that being perhaps the biggest open question.
02:13:48
Speaker
I'm quite worried about that dynamic obviating the relevance of interpretability research to at least very short-timelines scenarios, and possibly somewhat medium-term scenarios as well. And the reason for that is, if you just look at
02:14:05
Speaker
A, where we are at right now in terms of how much we have understood about these models, and how hard it is, what the vibe is in terms of what amount of progress is being made and the techniques being used. You're really looking inside of these things, and we're kind of at square one: ooh, you're poking at the neurons and we're learning, it's a really cool domain. I think it makes a ton of sense to be really excited about this. If I was just a generic scientist, I'd be like, oh my God, it's like neuroscience.
02:14:29
Speaker
I think all the neuroscientists should just be freaking out. It's like, because neuroscience is so data bottlenecked. It's so kind of difficult to do experiments. It's like this really, really costly thing. And now you have these brains that you can just do arbitrary things to and see arbitrarily inside them. You can build new ones. I mean, just sort of like, I think from a scientific perspective, interpretability is awesome. And it's nice in that respect in the sense that it's like quite
02:14:52
Speaker
clear from a kind of, you know, getting more people to work on this. It's a relatively direct, like, just understand how these things go. You can kind of tell what progress is. It feels very normal, science-ish. It doesn't feel very like, oh, I mean, now, of course, it gets a little, there's specific types of interpretability that are more interesting from an alignment perspective. Like, can you understand whether the model is lying to you? Can you understand, like, what the model knows? Are you able to kind of tell, extract knowledge from the model that you wouldn't have had? Overall, I think it's an exciting scientific area. I think the worry is just,
02:15:23
Speaker
We're at such square one with it, and it's not a bottleneck to deployment at all. So people can just surge ahead with creating more and more capable systems with approximately zero interpretability progress, so it's very easy for it to get left behind. That also is sort of a point in its favor, because it's very independent of capabilities progress. Whereas if you make progress on scalable supervision and RLHF, it also can unlock
02:15:47
Speaker
more deployment possibilities and stuff like that. Yeah, it just looks to me like, in the near term, it's very hard to see realistic extensions of our current interpretability paradigm being adequate to the sorts of tasks that might be asked of them, where you say, ah, is this model lying to us? All this stuff.
02:16:05
Speaker
That said, like everything in alignment, you're hoping to get a bunch of help from the AIs and automate. So it could be that we can automate a bunch of the interpretability process. This is most feasible, though I think also less exciting at the level of if you have a relatively rote task that you need tons of humans to perform. If there's some part of your interpretability pipeline that an MTurk worker is doing, then I think we should be reasonably optimistic about being able to scale that up fairly hard. Because I think we should be reasonably optimistic about getting AIs that can imitate MTurk
02:16:35
Speaker
workers. Once you're doing more complicated stuff, or having to make real conceptual breakthroughs, you could also get a bunch of progress there. And so that's my most salient way interpretability comes back into relevance: if we get a bunch of AI help. And then I do think we want to have AI help on a ton of different
02:16:52
Speaker
levels. The pessimistic take there would be then we have some AI interpreter trying to understand another AI system. But how do we know that this AI interpreter system is aligned with our interests? There's kind of like a
02:17:08
Speaker
We push the problem back one step. We're not sure that we're getting accurate information from the interpreting system. I think that's true, but I also think it can be overplayed as a concern. And this is actually, I think, one of the sources of hope I have more relative to the more extreme kind of pessimism. I think in particular,
02:17:33
Speaker
I think a lot of that comes from this worry that every system that you're working with is of the type that you're concerned that it's like deceiving you or like has misaligned goals or is sort of relevantly agentic or something like that. And then you're worried that these systems, in addition to all of them being of the dangerous type, they are sort of able to coordinate much better than you are. And they're doing like,
02:17:54
Speaker
I mean, in the extreme case, oh, they can do logical handshakes and show each other their source code and stuff like that. And I'm like, maybe in the limit, for extreme stuff, but I'm talking about the next couple of years, and these models aren't that capable yet. And so I think if you're able to have supervision from a system that is not an agent, or is somehow
02:18:13
Speaker
less worrying, or whose output you trust for this specific category of task, then I think you don't necessarily need to be working with things that might all be trying to kill you at once. Or you could have different probabilities on the likelihood that a given system is scary. You can be using less capable systems
02:18:29
Speaker
just, you know, to automate like specific little chunks of supervision or interpretability or whatever. So I think I'm generally more optimistic about like, can we use AIs even prior to having like fully solved the alignment problem? Can we use certain sorts of AIs for certain sorts of tasks in ways that like significantly enhance our kind of traction?
02:18:47
Speaker
on alignment-relevant forms of cognitive labor than the more extreme end of the pessimistic take. That said, it's still scary. You're like, oh my god, I'm trying to understand the AIs. I'm bringing in the AIs. I'm hoping they're going to help me. It's definitely not like, oh, we've got this done and dusted. But I think there's at least some hope for help there.
02:19:07
Speaker
And in some sense, you ought to think that for capabilities too. If you think these systems will be useful for capabilities but will sabotage alignment research, and not just because they've themselves schemed that helping with capabilities will be conducive to their power, then I think you're presuming that you've really lost the game in terms of how sophisticated and misaligned these systems already are by the time that's your story about why they're not able to help. But if that hasn't happened, then you might, in fact, be able to get a lot of help from them in
02:19:37
Speaker
kind of learning about alignment. So we talked about physicists. Perhaps if we're in a world with very short timelines, they shouldn't spend a lot of time trying to solve the most interesting problems in physics. Perhaps they should spend more time on AI. So you've just completed this PhD from Oxford. So I love this stuff. I love the stuff you're writing on your blog. I find it
02:20:06
Speaker
find it extremely interesting. But how do you think about that side of your work, or that part of your work, in a world in which AI is rearing its head? How relevant is philosophy in that world? It's definitely a tension that I feel. It was pretty painful, the amount of time I spent, though I didn't actually spend very much time on the PhD. It's easier to get a PhD if you're not
02:20:29
Speaker
trying to get an academic job, I'll tell the world. And it is, I think, a source of tension in my life: how much should I totally focus on AI and alignment versus a somewhat broader set of projects? The compromise I've used is that the blogging and this other writing is actually a personal-time thing. I try to have a work budget that is much more directly optimizing, really focusing on what I see as most impactful and bringing this
02:20:56
Speaker
much more optimizer's mindset. And there I tend to focus specifically on AI stuff. So, you know, not entirely. And then in the kind of domain of writing, I see that more as like, this is a time that has like a much more, many fewer constraints in terms of what I'm expecting of myself, in terms of what sort of like optimization I'm bringing to it. And it's more a place of like self-expression and like
02:21:17
Speaker
something that sort of feeds other aspects of my life, rather than my 'what is most important' part. That said, I think philosophy matters here. In particular, people bring to some of their orientations towards AI a bunch of extra philosophical assumptions about
02:21:36
Speaker
why this matters. So in particular, I've been interested in long-termism. I think there is a nice feature of AI right now, which is I think it's sufficiently convergent as a problem that we don't need to get too into the weeds in terms of like, are you worried about the long-term impacts or the short-term impacts or not? I think sometimes that degree of convergence is overplayed. I think in particular, if the AIs don't literally kill you but do something else, then it can start to matter what that else is and what's at stake for present day people, future people.
02:22:06
Speaker
So there are some questions there. I think there are just broader questions where this is, in my opinion, still a deeply philosophically inflected discourse. I mentioned this stuff about the ontology of agency, and the stuff about utility functions and how they diverge. And ultimately this is about human values and what's up with human values. I think this is a discourse born of philosophy. It's an interesting case of philosophy predating a real thing that we then encounter
02:22:35
Speaker
and have to deal with. Or digital sentience, as we also talked about. This is an area that comes directly out of philosophy, and it could suddenly become extremely relevant to the world. So I think if we start talking about what we are supposed to do with digital minds, I think that is going to be super duper
02:22:53
Speaker
philosophical. And I think it's really worryingly breaking apart. We're going to be off distribution. We're going to be trying to generalize to this radically new world, all of these norms and concepts, both in ethics and also these sort of really amorphous things in philosophy of mind. We're like, oh, we care about, what is it to be conscious? Consciousness, preferences, autonomy. There are a zillion concepts where we're going to be looking at these AI systems and we're like, I mean, we're already doing this. People are like, what is this? Does it have preferences? I don't know. Does a light switch have preferences? What's a preference?
02:23:22
Speaker
And that stuff is going to bite really hard as we move into this new domain, off distribution. You have to generalize. Philosophy is in some sense the art of figuring out how to generalize, figuring out what your naive concepts were and how you should move them to other areas. So I think there's a ton of philosophy that's going to be done, or need to be done, if we can survive. I think a decent amount of that you can defer to the future.
02:23:44
Speaker
And I talk about that in my thesis and other stuff, but I don't think you can defer all of it. In general, I think philosophy, for me at least, and I hope for many people, is sort of an effort to be a kind of sane and aware and kind of coherent
02:23:58
Speaker
person or soul or human in the world. And that project, which is a little less about knowledge gathering and more about awareness of yourself and a kind of poise and orientation that you endorse, persists in its urgency even amidst AI taking off. Fantastic. Joe, thank you for all the time you've spent with me. Thank you for coming on the podcast. It's been a pleasure. Yeah. Thanks for having me. It was really fun.