
Liron Shapira on Superintelligence Goals

Future of Life Institute Podcast
Liron Shapira joins the podcast to discuss superintelligence goals, what makes AI different from other technologies, risks from centralizing power, and whether AI can defend us from AI.

Timestamps:
00:00 Intelligence as optimization-power
05:18 Will LLMs imitate human values?
07:15 Why would AI develop dangerous goals?
09:55 Goal-completeness
12:53 Alignment to which values?
22:12 Is AI just another technology?
31:20 What is FOOM?
38:59 Risks from centralized power
49:18 Can AI defend us against AI?
56:28 An Apollo program for AI safety
01:04:49 Do we only have one chance?
01:07:34 Are we living in a crucial time?
01:16:52 Would superintelligence be fragile?
01:21:42 Would human-inspired AI be safe?
Transcript

Introduction and Guest Background

00:00:00
Speaker
Welcome to the Future of Life Institute podcast. My name is Gus Docker and I'm here with Liron Shapira. Liron, do you want to introduce yourself? Yeah, sure thing. Great to be here, Gus. So my background is computer science and software engineering. I'm also an entrepreneur. I've started a couple of small tech companies. I've done a lot of angel investing, and I've been studying rationality and AI risk since 2007, when I was in college and stumbled on Eliezer Yudkowsky's writing on Overcoming Bias and LessWrong. Right now I'm part of the Pause AI movement.
00:00:30
Speaker
I spend a lot of time communicating the urgency of pausing AI before it's too late and everybody dies.

PAUSE AI Movement and Intelligence Definition

00:00:35
Speaker
So I try to communicate in a simple and accessible way compared to maybe some of the pioneers of the field who tend to use more jargon. And you're doing great work here on the FLI podcast, you know, getting that message out to the public. So I try to do my part.
00:00:47
Speaker
And for listeners who like what I have to say, I also recommend the recent episode you did with Holly Elmore, head of the Pause AI organization in the US. I've been part of the protests that she's run in San Francisco, and Holly and I have basically the same policy position. So you recommended that we start out by defining some terms and concepts that will make it easier to understand your position. The first of these terms is just intelligence. And there's this kind of surprisingly deep definition of that term that you're working with.
00:01:17
Speaker
Yeah, so when we talk about what is intelligence, my go-to definition is the optimization power definition, talking about goal optimization. Now, it's not because I'm trying to play semantics, right? It's like every definition of intelligence is fine. It's because I'm trying to talk about what's dangerous and what's important to predicting the future.

Human vs AI Intelligence

00:01:35
Speaker
If you want to predict where the universe is going to go, then sure, maybe it doesn't matter who can get a high SAT score, but it matters who is good at achieving the goals that they have. It matters who can outmaneuver whom. The optimization power definition, it's the idea that if you start with a desired end state, can you take any configuration of the world today and work backwards from the end state and figure out what actions to take today and do it more effectively than the next person?
00:02:03
Speaker
Can you win the war? Can you win the video game, except the whole universe is the video game? To the extent that you can do that, that's my working definition of intelligence. Yeah, does this imply that the process of evolution is a form of intelligence, if the process of evolution is optimizing for something?
00:02:20
Speaker
It kind of does, right? So once you have this framework of like, okay, intelligence means goal optimization, you can look back and you can say, okay, besides smart humans, what other processes have this kind of goal optimization? And to some degree, you can look at evolution by natural selection and you can see a consequentialist feedback loop.
00:02:38
Speaker
Right, so if you ask, like, hey, the eye, right, the eye in various animals, it sees really well, how did it get to that point? Well, the eye exists because it can see well, right? The eye is like a desired end state of being able to see well, and in all of the changes that got it to where it is, there actually was a reinforcement feedback loop.
00:02:57
Speaker
you know, like the teleology, the why of the eye, it's hard to explain it. If I were to try to explain to you why eyes work the way they do without being able to reference that they kept getting selected for being good eyes, then it would be extremely difficult to explain just on the level of physics, right? The goal optimization framework is very useful for explaining evolution by natural selection.
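One toy way to make the optimization-power framing above concrete, a minimal sketch in Python rather than anything described in the episode: a planner that starts from a desired end state and searches backward for a sequence of actions that reaches it from the current state. The state space, the actions, and all the numbers here are invented purely for illustration.

```python
from collections import deque

# Forward actions the toy "agent" can take, expressed by their inverses,
# because the search runs backward from the desired end state.
INVERSE_ACTIONS = {
    "add_1": lambda x: x - 1,                             # inverse of x -> x + 1
    "double": lambda x: x // 2 if x % 2 == 0 else None,   # inverse of x -> 2 * x
}

def plan_backward(start: int, goal: int, limit: int = 10_000):
    """Return a list of forward actions that takes `start` to `goal`, or None."""
    frontier = deque([(goal, [])])   # (state, forward actions from state to goal)
    seen = {goal}
    while frontier:
        state, actions = frontier.popleft()
        if state == start:
            return actions           # already in forward order (we prepend below)
        for name, inverse in INVERSE_ACTIONS.items():
            prev = inverse(state)
            if prev is not None and 0 < prev <= limit and prev not in seen:
                seen.add(prev)
                frontier.append((prev, [name] + actions))
    return None

if __name__ == "__main__":
    # e.g. ['double', 'double', 'double', 'add_1', 'double'] takes 3 to 50
    print(plan_backward(start=3, goal=50))
```

On this framing, "more intelligent" just means a better search over actions, so that more end states become reachable from more starting configurations; the worry in the conversation is about that capacity scaled up, not about any particular benchmark score.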

AI Values and Goal Alignment

00:03:16
Speaker
Yeah, so I think it's important to discuss whether humans or humanity in general, whether we are successful and powerful because of our raw intelligence, or perhaps we're successful because of our culture and the knowledge that we've built up over the centuries. Maybe these two kind of play together in some way. What do you think?
00:03:40
Speaker
Yeah, that's like a common way people push back when I say, look, the important thing to know about intelligence is that it's goal optimization. They're like, wait, wait, wait, what about culture? I mean, the word culture is kind of funny to me because it's kind of a grab bag. It carries with it certain connotations. It's almost like saying like, well, doesn't the human spirit help? It's like, well, can we unpack? Can we unpack that? Right. So if you force yourself to not say the word culture and you have to be more specific of like, okay, what exactly do you think that humans are doing that is powerful besides being able to optimize toward goals?
00:04:07
Speaker
And you might say something like, well, there's accumulated knowledge, and it's like, okay, great. Well, we have that too, right? We've, like, crawled YouTube. And you say, like, okay, well, there's inference, or there's communication, or there's working in parallel, right? It's like, I don't have to figure everything out, somebody else can. But it's like, okay, well, we have mixture of experts models, right? So it doesn't seem like humans are doing something deeply different from where we're on track with AIs.
00:04:31
Speaker
This will probably change as models get better, but models are inefficient with the amount of knowledge they're consuming in terms of their capabilities. Imagine a human that had read as much as a modern LLM and just what capabilities such a person would have. So models are not there yet.
00:04:49
Speaker
Yeah, I mean, there's definitely things that humans are still doing better than AIs for sure, right? I mean, the most obvious thing to look at in my mind is just like power efficiency, right? So the entire human brain is only running on like 12 watts, right? Like something like a laptop and the AI is using like megawatts, right? To get like a comparable amount of output. So, but that is just like scary to me, right? Because we know that the AI hasn't optimized on all the dimensions that it can optimize. And it's already spitting out essays in like two seconds that are like better than I could write.
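Just to put rough numbers on the efficiency gap being gestured at here, using the figures quoted in the conversation (roughly 12 watts for a brain, "megawatts" for the AI), which are ballpark rhetorical figures rather than measurements:

```python
import math

# Back-of-the-envelope comparison using the figures quoted above (illustrative only).
human_brain_watts = 12            # the ~12 W figure quoted in the conversation
ai_cluster_watts = 1_000_000      # "megawatts" taken as 1 MW for the comparison

ratio = ai_cluster_watts / human_brain_watts
print(f"Roughly {ratio:,.0f}x more power, ~{math.log10(ratio):.1f} orders of magnitude")
```

The point being made is that a system already producing useful output at that efficiency disadvantage has a lot of obvious optimization headroom left.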
00:05:17
Speaker
Yeah, yeah. Okay, so maybe we could imagine that because we're training LLMs on the internet, we are sort of pushing them into adapting or adopting, sorry, human values, because they're being trained on everything we've produced, and we are kind of incorporating our values into the products that are on the internet. You buy that?
00:05:39
Speaker
I think that that works as long as the AI is not too intelligent, right? So I think it works great today. And if you told me the AI is always going to be not as smart as the smartest human, then I think we're in a good regime because we're in a regime where you can reinforce it, right? You can upvote or downvote what it's doing and it gets it. And like everything's working great, right? So I'm not one of those people who are like, Oh my God, it's, you know, it's, it's making homework too easy to do. Like I don't care about those kinds of mundane problems. I just care about a positive feedback loop.
00:06:05
Speaker
that wipes out the earth in this generation or the next generation, right? Like, my concern is very large-scale. But to your question, it's a very common objection, right? Like, can't it just, isn't it on track to be friendly? Aren't we training it the same way that we train our children? And I think that there's an illusion that people have right now based on looking at what's happening now. Like, Eliezer Yudkowsky uses the analogy of, we're seeing baby dragons
00:06:27
Speaker
right, with the baby dragons running around. If they ever do anything bad, we can easily give them negative reinforcement and then, look, they fall in line, right? The baby dragon doesn't try to overthrow its master, but of course it couldn't if it tried, right? And then it's like, well, the grown-up dragons burn us all, right? So it's a very different regime
00:06:43
Speaker
and specifically, the thing to look out for is that goal achieving. Right, today, yes, they can give you a good chat response, but if it's me versus the AI trying to get the world to a certain state, it's going to be me, and the AI is still restricted in the domain that it can reason about and act on effectively.
00:06:59
Speaker
It's still restricted. And that's the key right now. People accuse us AI doomers of being like, look, you guys didn't predict GPT. GPT exists, so there's no doom. No, we said the doom comes when they can achieve goals better than humans can.

Concepts of AI Completeness

00:07:12
Speaker
Right? Until we're there, I'm not predicting doom yet. Why is it that AI would develop dangerous goals?
00:07:19
Speaker
This is actually a very deep question. It's like all roads are leading there. And we don't fully see it today. The evidence we have today of a convergence, where the AI is heading toward goal achievement, is you're seeing domain expansion. So you're seeing, like, for example, Sora: if you want to generate videos, well, they tacked on a language transformer as part of it. And the language transformer part actually makes it smarter at generating videos. So you are seeing bits and pieces of evidence today that every time you want to make a narrow AI that's sufficiently good in one domain,
00:07:48
Speaker
It ends up pulling skills that are just convergent goal achievement skills. So like in the case of like a self driving car, for example, you know, you get the first 99%. But a lot of people are remarking that that last 1% is an AI complete problem. Because if you want the self driving car to navigate every scenario,
00:08:05
Speaker
where there's a random guy in the road being weird, or like two cars trying to navigate a really narrow street and signal to each other. Well, suddenly you have to understand these signals. And so now you have to pull in communication skills just to drive properly. So domains tend to converge. And you can go even deeper. Maybe we can touch on this more later, but there's a deep analogy to Turing completeness. We've already seen this kind of convergence where all the different electronic devices that we've had have converged into our phones and our computers. So it's an interesting analogy we can get into. Do you want to get into it now, maybe?
00:08:35
Speaker
Okay. Sure. Yeah. So there's this concept that I've termed goal completeness, because I think it's a really good analogy to Turing completeness, right? Imagine the year is 1970 and I'm going around telling people, like, yeah, all these circuit boards are great. Like, you designed a circuit board just for the video game Pong. You designed a circuit board just for the video game Breakout. And Steve Wozniak is famously saving on chips, right? He's using the fewest possible electronic components to make these circuit boards. And like, guys, this is all great. But if you think that a video game can only use a circuit to compute linear motion,
00:09:05
Speaker
Your perspective is very narrow right now. What's happening is these are all instances of Turing machines. There's such a thing as computation. And when it's going to be the year 2000, when it's going to be the 21st century, what you're going to see is that all of these video games are implemented on a computer, on an operating system. And even their instructions are just going to be bits of software that you can configure. And even your microwave, even typing the keys on your microwave to tell your microwave how long to run, even that is going to probably run an operating system that's Turing complete.
00:09:31
Speaker
And not only that, but the gameplay of all of these things, the functionality that you want these video games to have, it's going to get more complicated to the point where it's Turing complete. You're going to be able to run the game Doom inside of other games because the games themselves will have Turing complete functionality. It's like I'm waving my hands in the year 1970 trying to point out this upcoming convergence. And people are like, no, no, no, these video games just compute linear motion. That's the perspective shift that I feel like I'm trying to fight today. Yeah. And how exactly is goal completeness analogous to Turing completeness?
00:10:02
Speaker
So the analogy is that people are looking at LLMs today, for instance, and they're like, look, this is a feed forward architecture. This has RLHF. This is only giving friendly chats. I don't see what the problem is. And if all you knew is today's LLMs and you just extrapolated that to like the next year, then yeah, we have a good thing going. But if you zoom out, the important

AI Alignment Challenges

00:10:22
Speaker
dimensions that you need to see going on right now are the breadth and depth of goal achievement. So where LLMs fit into context,
00:10:29
Speaker
in this larger context of intelligence advancement. The context is that we've never had goal optimization on a domain so broad as human language, right? So broad as passing the Turing test. That is a new breadth enhancement. It's equally scary when we see something like AlphaGo, where that's a depth enhancement.
00:10:47
Speaker
So we've never had a superhuman game player within the domain of Go. So whenever you see a depth enhancement or a breadth enhancement, eventually you're going to have something deep enough, broad enough where humans are underwater. It's now taken the optimization of the physical universe away from humanity. That's where it's going. But again, why is it that it would develop goals contrary to human goals or to human values?
00:11:12
Speaker
So the reason is basically just that we're not actually instilling goals into it right now. So with what you're seeing with LLMs, it kind of looks like it, right? Because we're able to give it feedback, right? So as long as we can always detect when it's not doing the right thing and then give it feedback, we can successfully train it to pass any test.
00:11:29
Speaker
Right? So the only problem is we're going to enter a regime where the tests that we're going to do are tests that it can just cheat at. And it can pass by tricking us, right? Like, using test-taking skills, essentially. Right? So if the LLM becomes superintelligent, well, the moment that somebody asks it a query like, hey, give me a script that'll help me run an online business and make passive income,
00:11:48
Speaker
suddenly that script might say, okay, here's a 20 megabyte script. This is going to bootstrap a new AI for you. It's going to download some other code for some open source AI. It's going to tweak it. It's going to get it running from the script. Suddenly, the upvoting that you've been doing, the RLHF that you've been doing to get the original chatbot to give you good chats,
00:12:05
Speaker
is that RLHF still going to extend to the new bootstrapped AI, right? To the twenty megabytes of obfuscated C code that it's going to write you, to the new AI that grows from nothing? Is it reflectively stable? Right, that's just one question that shows you the cracks in our RLHF regime. It's good for now, as long as we have a feedback loop where humans can keep
00:12:24
Speaker
upvoting answers; it's not going to be good when the AI is superhuman and the whole universe is, like, its domain where it can take smart actions. And by the way, this claim that I'm making, that we're not prepared to optimize it in the superhuman regime, this isn't like a weird doomer claim. If you listen to OpenAI, if you listen to Anthropic, they are explicitly telegraphing. They're saying, hey, nobody has figured out how to align the goals of superintelligent AI. We only know how to do it for subhuman AI. And we're currently working on it and we'll get back to you later, but we're going to keep developing capabilities. That's their position.
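The feedback regime being described, where humans upvote or downvote outputs and the model is nudged toward the upvoted ones, can be sketched at toy scale. This is a minimal illustrative sketch, not OpenAI's or Anthropic's actual RLHF pipeline: just preference weights over canned responses, updated from thumbs-up and thumbs-down signals, with the assumption the speaker is pointing at baked in, namely that the loop only constrains what the human grader can actually detect.

```python
import random

# Toy sketch of the upvote/downvote loop described above (not a real RLHF pipeline).
responses = ["helpful answer", "evasive answer", "subtly deceptive answer"]
weights = {r: 1.0 for r in responses}

def sample_response() -> str:
    """Pick a response with probability proportional to its current weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    upto = 0.0
    for resp, w in weights.items():
        upto += w
        if r <= upto:
            return resp
    return responses[-1]

def human_feedback(resp: str) -> int:
    # The whole regime rests on this function: the grader can only downvote
    # what the grader can detect. If "subtly deceptive answer" looked helpful
    # to the grader, it would get rewarded just the same.
    return +1 if resp == "helpful answer" else -1

for _ in range(200):
    resp = sample_response()
    weights[resp] = max(0.1, weights[resp] + 0.1 * human_feedback(resp))

print(max(weights, key=weights.get))  # converges to whatever the grader rewards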
00:12:54
Speaker
What do you think is most difficult, aligning an AI to some goals, which is kind of the traditional or how I see the traditional alignment problem, or finding out which goals or values to align the AI to? Those are two separate problems, but they're often, in my mind, at least mixed together in these discussions. And I think that the distinction you're making is basically the same thing as inner versus outer alignment. Is that right? Yeah, probably.
00:13:19
Speaker
Yeah, and I think that they're both hard problems, right? So let's take outer alignment first, right? Which is just the problem of, like, what are human values? And Eliezer blogged about something back in 2007 called the Open-Source Wish Project, where it was just like a forum where a bunch of people would go on and they're like, what's the ideal wish that humanity wants to make?
00:13:36
Speaker
Kind of like those genie stories, right, where you make a wish and then you don't want the wish to backfire on you, and the story ends and your last wish is like, I wish to undo all the wishes, right? I mean, the crazy thing is, I thought that was just a story, but that's becoming real life. That's what outer alignment is: just the problem of, can you make a wish
00:13:51
Speaker
that, if you actually got what you wished for, you wouldn't accidentally be in hell. And the funny thing about that is, these

AI Development Risks and Timelines

00:13:59
Speaker
days I don't even focus on the outer alignment problem, because I think the inner alignment problem is going to kill us first. I think we're not even going to get a chance to have a genie that does our wish.
00:14:06
Speaker
Because the inner alignment problem is like, hey, you have this wish, but what do you actually train the AI to do? Because the AI is not actually going to care about your wish. It's going to like do something else instead, right? That's the inner alignment problem. And I think that's actually where we're stuck today. I think even if we had a really simple wish, like Eliezer Yudkowsky uses the example of like,
00:14:23
Speaker
Can you just clone a strawberry, cell by cell? Can you just do that? That's, like, a very simple wish. It seems simpler than, like, maintaining the complexity of human value. But to successfully get an AI to do that, and not accidentally destroy all the atoms in the vicinity or anything like that, that is Eliezer's inner alignment challenge, and I think we're a long way from even being able to do that using current technology. So you asked me which problem is harder, inner alignment or outer alignment.
00:14:46
Speaker
It's a very good question. I don't know which problem is harder. I think they're both very hard. They're both very unsolved. They both don't have that many people working on them. But I think inner alignment is just going to kill us first. Can we get safety by having narrow intelligence? So we can imagine, for example, having systems like AlphaGo, systems that can fold proteins or model the climate or something like that.
00:15:09
Speaker
So systems that are super intelligent within narrow domains, and we kind of, we take advice from these systems, but we never develop a general super intelligence. Would that work? So in theory, the answer is obviously yes, right? Because if you look at something like AlphaGo, AlphaGo absolutely dominates humans on the Go board, right? Or Stockfish absolutely dominates humans on the chessboard. If our entire universe was a chessboard, then we would be way outclassed right now, right? So we already have domains where the depth
00:15:39
Speaker
is much bigger on the AI side than on the human side, and the only thing that saves us is, when you expand the domain out to the entire physical universe, that particular video game, the universe as a video game, humans are still the better players in that particular video game, for now. So, your question, you know, can narrow AI keep us safe? It still can, right? And you can keep pushing the boundaries. Like, if you look at AlphaFold, right, the protein folding,
00:16:02
Speaker
That's a little more dangerous, but it's probably okay. We could probably get by with just a protein folding AI because you probably don't have to put in an AI complete engine into the protein folding AI. The protein folding AI probably doesn't have to be Machiavelli advising the president how to take over the entire earth. It can probably just fold proteins. It's true that in principle we can stay narrow, and that's probably the policy position we should take.
00:16:25
Speaker
The problem is, now I've got to bust out the nuclear analogy, right? I know some people don't like the nuke analogy; I personally love the nuke analogy, I think it's a very good analogy. So I would analogize it to, basically, criticality in a nuclear pile, right? So you're in the scenario and you're like, look, I'm getting energy out of a subcritical reaction, right? There are still some extra neutrons coming out, even though it's not going critical inside. Great.
00:16:46
Speaker
But with nuclear physics we actually have equations that tell us exactly when it's going to go critical. With AI, we're very confused about when it's going to go critical. It seems to be going critical slowly before our eyes right now.
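The criticality analogy has a simple quantitative core: each generation of neutrons is multiplied by an effective factor k, so the population follows N(t+1) = k * N(t); below k = 1 the reaction dies out, at k = 1 it is self-sustaining, above k = 1 it runs away. A minimal numeric sketch, with illustrative numbers only:

```python
def neutron_population(k: float, n0: float = 1000.0, generations: int = 20) -> float:
    """Population after `generations` steps when each generation multiplies by k."""
    n = n0
    for _ in range(generations):
        n *= k
    return n

# Subcritical, critical, supercritical: small changes in k give very different outcomes.
for k in (0.95, 1.00, 1.05):
    print(f"k = {k:.2f}: population after 20 generations ~ {neutron_population(k):,.0f}")
# k = 0.95 -> ~358, k = 1.00 -> 1,000, k = 1.05 -> ~2,653
```

The point being drawn from the analogy is that for a nuclear pile this k can be computed in advance, whereas for AI capabilities there is no agreed-upon equation telling us where the runaway threshold sits.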
00:16:57
Speaker
We should also just mention that the plan of the major AI corporations is not to build narrow AI. The plan is to build general AI. The plan is to, as you mentioned, make leaps in generality, like we saw with GPT-4, for example, for text, for audio, for video, for anything in one model. I'm guessing that is just because generality is so powerful and it's so valuable to have one, say, one assistant that you can go to for all your problems, and that's the product they want to offer.
00:17:26
Speaker
Exactly right. And it's an interesting little tidbit of evidence that we're seeing, that it's pretty rare for companies to use fine-tuning now. I mean, my company actually uses AI and we don't even fine-tune it. We just have a couple-page prompt and it's good to go. So that's already a little piece of evidence that you don't need an explosion of narrow AIs. You just take a really good general AI and you plug it in, the same way that if you want a microwave oven, you just take an Android chip and you put it in the microwave oven. Why not? It works great.
00:17:52
Speaker
Do you think that's a stable situation? Wouldn't it be overkill at some point to run a giant model to solve a narrow problem? I agree that when AI expands out into the light cone and takes over everything and wants to perfectly optimize everything, at that point, it will use custom ASICs and custom hardware that has the minimal intelligence. But as humans with economies of scale just trying to get things out, we are just going to use these engines because they still only cost like a fraction of a penny.
00:18:18
Speaker
I was thinking of a much more trivial situation in which you have some application for the model where it doesn't make sense cost-wise to run a huge model, and instead you would run some specialized fine-tuned model to solve a narrow problem. I agree, but I think that the analogy is going to be very close to what we're seeing with computational hardware. If you want to mine Bitcoin, in cryptocurrency they started going lower level, but in almost every application you see, including a microwave oven, you just pay the penny and you get the computer chip. You get the goal-complete engine.
00:18:47
Speaker
How does risk depend on AI timelines? How much higher is the risk if we get to superintelligence this decade compared to if we get to superintelligence in the 2080s, for example?
00:19:00
Speaker
So the critical question for me is just, do we have the theory of AI safety before we have the AI capabilities? And we may never, unfortunately, have the theory of AI safety, right? Like, we don't even know how hard the problem is, unfortunately. But that's basically my criterion: do we have safety first? Unfortunately, capabilities are currently racing so much farther ahead than safety, not just because it's more fun and lucrative to work on capabilities. There are actually fundamental reasons why capabilities are easier than safety. Because there's kind of only one way to be smart. There's only one way to get capabilities, right?
00:19:30
Speaker
Whereas safety is the subtle, deep problem of what humans want. So capabilities naturally are going to race ahead of safety already. Actually, say more about that. Why is capabilities kind of naturally easier than safety?
00:19:44
Speaker
So any sufficiently smart AI is going to debug itself on the capabilities front, but not on the safety front. So what does that mean, debugging itself? So humans actually, we debug ourselves all the time because evolution, for example, evolution told us that the ground doesn't move, right? So we walk on the ground and we all know in our hearts, right? In our deepest brains, we know that the ground doesn't move.
00:20:04
Speaker
Now, as a proposition about physics, the ground doesn't move, that's a false proposition, right? It's in our intuitive physics, and yet it's false. So eventually we discovered, wait, no, the ground does move: we orbit the sun, the sun orbits the galaxy, and so on. So we have now gone in and debugged a false proposition that was installed, right, that's factory hardwired into our intuitive model of physics. And you're going to see the AI doing the same kind of debugging, right? It's going to debug its model of the universe.
00:20:28
Speaker
But there's no such thing as debugging preferences, right? Because the preferences just are what they are; there's no such thing as a wrong preference. Yeah, that's what you're optimizing for, not yet. Yeah, so when we research how to make AI smart, if we only make it, like, ninety percent smart, or whatever, some fraction smart, just by it doing reasoning, the act of reasoning itself
00:20:48
Speaker
tells you to go and fix stuff, right, to find bugs. Like, that all emerges naturally; it's an attractor state. Like, there's no such thing as a reflectively consistent agent that's wrong about ten percent, or that's wrong about fifty percent. You just start fixing bugs. Unfortunately, it's a very different type of bug when you just love to kill humans; from its perspective, unfortunately, that's not necessarily a bug.
00:21:08
Speaker
And you can actually do small-scale versions of this, where you get some output from a language model and you input that output into the same language model and tell it to correct its mistakes or optimize this code or whatever. And it kind of works sometimes. And we can imagine much more advanced versions of that, where the AI will be able to increase its own capabilities. Well, yeah, so I want to remember your question about timelines, right?

AI as a Unique Technology

00:21:32
Speaker
Capabilities are always going to be easier to develop than safety. And in fact, capabilities are what are going to foom. I don't think there's necessarily a safety foom on the horizon, but there will be a capabilities foom at some point where once you get sufficiently good capabilities, you turn around and you're like, oh, not only can I debug myself, I can make the next version of myself and the next version will run better. And these things all progress naturally just by virtue of having capabilities. But in terms of timelines,
00:21:55
Speaker
So, I mean, the ideal would just be to do, you know, a Manhattan project, like a very serious project to work on safety while banning capabilities. And like, and that totally sucks as a policy recommendation. Like it's not fun at all. I hate it, but I just think that the current path is like so bad, right? So it's the choice between two bad policies, basically.
00:22:13
Speaker
One might say, you know, you have this all wrong, but AI is just another technology. And we know that technologies in general have been great for humanity, right? We can give a bunch of examples. What is it that makes AI different from other previous technologies?
00:22:29
Speaker
Yeah, so this is reasoning by reference class or reasoning by pattern extrapolation. So the moment that you say AI is a technology and it's going to be just like other technologies. If you accept that assumption, well, that's a really nice assumption to have because technology does tend to go really well. So far, technology has been a huge force for good. Like I'm normally a techno optimist.
00:22:50
Speaker
The problem is that there are other broad statements you can make about AI that don't sound as optimistic as AI as just a technology. Like, I can say AI is just an increase in the maximum intelligence that's optimizing the universe, and it's like, the last time that happened, humanity came on the scene and then, you know, we drove a bunch of species extinct, right? And we made the Earth the way we like the Earth.
00:23:08
Speaker
Right, so that's like another reference class, another intelligence. And you can play a game of reference class tennis, right, where you hit me with technology as the reference class for AI, and I hit you with another reference class that I like. I like the reference class of intelligence, and one of my favorite reference classes for AI is what I call the game over button.

Global AI Development Pause

00:23:26
Speaker
I think that we're inventing a single button that you can press and then it's game over for humanity.
00:23:30
Speaker
And the reason it's a reference class is because we did it with nukes, right? So right now in a bunch of storage sheds, in a bunch of silos, in a bunch of submarines, there are 10 megaton nukes, 10 megaton, which is like a thousand times bigger than the nuke that was dropped on Hiroshima and Nagasaki. A thousand times bigger than a city burning nuke is sitting, is just sitting, there's thousands of them around, 10 megatons insane. And they have like a few safeties that might come off or somebody might somehow intentionally push the button. Like it's, it's not that crazy of a scenario.
00:23:58
Speaker
And it's what I call a game over button, because for at least 100 million people, a single button press, it's game over for 100 million people. Right? And so when I talk about AI, I'm like, look, game over buttons exist. Don't you think that at some point, whether it's AI or whether it's some other technology, at some point, we're going to find another game over button. And maybe instead of killing 100 million people, maybe it'll kill everybody. It's only another one or two orders of magnitude difference. Right? So that is another reference class that I think about when I think about AI.
00:24:23
Speaker
Is this a useful way to reason about AI, like thinking in terms of reference classes and playing reference class tennis? I see it a lot. I do it myself on this podcast. It seems that we're kind of grasping for a reference class that we can put AI into because we want to make sense of it, right? We want to compare it to something that has existed previously.
00:24:42
Speaker
The thing about reference class tennis is that it's nice to have, I mean, it's a nice intuition pump, but it's never going to conclusively tell you with high confidence that something's going to be true. And there's a good rule of thumb when you're playing reference class tennis, which is the new thing that you're trying to put in the reference class, is it as similar to the other members as they are to each other, right? Like how much does it really belong in the reference class? So in the case of AI as a technology, so you can be like, okay, AI versus the printing press, right? Is that gap the same as like the printing press versus the automobile?
00:25:11
Speaker
It's hard to say, right? If I had to cluster the three, AI, printing press, automobile, maybe printing press and automobile are more similar and AI is kind of the outlier, right? I mean, if I had to, I'd probably argue that, right? But then you really have to squint, and you'd be like, well, even that metric is kind of hard to apply, right? So at the end of the day, reference class tennis is just, it's not the best epistemology we have.
00:25:30
Speaker
Yeah, got it. Okay, so you're an advocate for pausing. And I think when many people hear about pausing frontier AI, they immediately think that it's kind of dead in the water because we cannot coordinate globally to pause AI. Why is it that you think there's hope that we might be able to do that?
00:25:50
Speaker
Well, it's not even that I'm particularly hopeful. It's just more like I think the alternative also sucks, right? So it's like it's between a rock and a hard place, right? So it's like I have no pushback against the claim that like it really sucks to coordinate a policy to pause AI, right? My only contribution to the discourse is to be like, well, we're definitely on track to not pause AI and then like instantly die, right? So it's like, that's my contribution. So it's like, you got to pick one, right? And I would just lean toward muddling through the policy. And maybe the one ray of hope is that we did muddle through with nukes.
00:26:19
Speaker
I mean, like, what we have going on with nuclear policy, it doesn't really make sense, right? I mean, von Neumann famously was saying, you know, I want to launch the nukes now, right, I've logically reasoned my way into it. Or was it von Neumann, was it Bertrand Russell? I forget. Anyway, you know, there were all these rational thinkers saying, if you want to win the game theory, you've got to launch the nukes, and somehow we haven't launched the nukes yet.
00:26:40
Speaker
But the situation that we have today, it still makes no sense. More countries are slowly getting nukes. The Iran nuclear deal has fallen through. We don't really know what we're doing with nukes. And yet, it's pretty good. It's better than I would have expected. It's certainly better than following the so-called rationalists who are like, you need to launch the nukes now, that's the only solution because coordination is impossible. And so I'm kind of pinning my hope on somehow muddling through. Maybe there will be a warning shot that finally gets everybody scared enough to be like, look, we just have to clamp down and just try what we did with nukes, to do our best.
00:27:10
Speaker
I mean, this seems like an argument against your own position. Do you think that we can somehow muddle through with superintelligence and develop it, and somehow all the theorizing about why this could never work falls apart, because the situation is kind of semi-stable, like with nuclear weapons now?
00:27:28
Speaker
I think it's a good question. And I think that there's a chance that we could muddle through with capabilities developing. I mean, ultimately it's going to be a muddle one way or the other, right? So, like, I mean, ultimately, capabilities are going to increase at least a little bit before they're stopped, right? It's just...
00:27:41
Speaker
I guess what I think should be done though is just get the capacity ready to pause, right? So I'm not saying everybody has to yell pause and pause because I agree that that is like a big ask in terms of like, it's, you know, how are you going to get people scared enough today? Like there's not enough warning shots, right? Like, I mean, I'm scared enough, but I think it's a really hard ask to get everybody scared enough today, but just building the capacity because it's not hard to anticipate that in like a year, things might be really scary, right? Or maybe in five years, like not in that long of a time, things are going to be really scary. Even if you just have something that most people agree is possible, which is like,
00:28:11
Speaker
just a bunch of Einstein simulations or Einstein-equivalent AIs, not even like a superhuman foom, just, like, everybody has their own Einstein and Einsteins running around the internet being Einstein, right? Like, that's already a pretty scary situation, right? And so if that gets too scary, let's be able to pause. And I think that the AI labs actually largely agree with this position. Like, they have policy teams that are like, yeah, please regulate us. Like, please have an organization.
00:28:33
Speaker
So at the very least, I don't see why there, why there should be an objection to having an organization that's getting ready to have a pause button, right? Like having, for example, off switches inside every GPU that's shipped that can even respond to a radio signal from a central authority, right? The stuff that sounds very authoritarian, but like building it in just in case, because the stakes are so high to me makes a lot of sense. But then of course I anticipate the pushback from like the, you know, everything should be free. Everything should be unregulated. So I guess that's, that's where the biggest haters would come from.
00:29:02
Speaker
I guess one of the biggest counter or one of the most challenging in my opinion counter arguments to pausing is that if we pause Frontier AI development but we do not pause the development in AI hardware and AI algorithms and so on. When we unpause we will have overhangs as they're called in hardware and algorithms and we'll be able to quickly make a lot of progress on capabilities and maybe that's even more dangerous than just having a more continuous development all along.

AI Critical Mass and 'Foom'

00:29:32
Speaker
Right, and I know that that's OpenAI's explicit position and it's recently been kind of contradicted by their desire to like acquire more hardware, right? So it seems a little fake on their part, but regardless of what they think, I mean, the overhang position has always seemed weird to me because I feel like you can always argue there's some kind of overhang.
00:29:48
Speaker
like, you could be like, well, we need to build more hardware if there's ever a software overhang. Doesn't it work the other way too? Yeah, amazing. But don't you think the reasoning kind of holds there that there would be just, I mean, it seems unlikely that we're going to pause all development in hardware and algorithms and kind of data set collections and so on. These overhangs, the reasoning kind of holds, doesn't it?
00:30:13
Speaker
I mean, I think I see what you're saying, which is at least now we can deal with a weaker version of AI before we deal with the stronger version. And maybe, there's some realities where if we get the weaker one and then the stronger one and we can keep leveling up our defenses, I get the intuition for that. But the problem is I'm anticipating a Foom scenario. So I think that that's kind of a crux. I think the moment you get superhuman,
00:30:36
Speaker
it just becomes uncontrollable. So the fact that you had some experience like, hey, we air gapped a dumber AI, I just don't think that that helps much. I think that we're heading toward this line, the analogy of the nuclear pile going critical. I think we're heading toward a line that we don't understand that's going to run away from us. And nobody knows how close it is to running away from us. It could be days away. It's probably not days away, but I also think it could very likely be less than a decade away. I think we're talking pretty near term here.
00:31:02
Speaker
and nobody understands the line. Nobody can explain to you the line. There's even a lot of fear that with GPT-5, or the next Claude, the line might even be crossed there, because we don't really know what we're doing. The idea of, like, look, just keep releasing it, keep releasing it, it's the analogy to nukes: just keep adding some more neutrons, I just want to see what the next neutron does. It's like, okay, but the problem is when it goes critical. Explain what you mean by foom, and what's your model of that? Why do you expect foom?
00:31:27
Speaker
The reason I expect foom, it's not about the AI architecture, but it's about the shape of goal achievement itself. So it's the difference between the particular algorithm or architecture we're using compared to the function or the math being computed. It's the nature of the work. It's not about the worker, it's about the work. So the work is going to be goal optimization.
00:31:47
Speaker
And it's just logically implied that when you do goal optimization, well, you do it what I would call hardcore. You don't hold back. You're relentless. Right? And it's been pointed out, instrumental convergence: you can have any goal, like, fetch my coffee. It's like, okay, well, do you want me to fetch your coffee with high probability? Okay, well, then I'd better build a defensive perimeter against any potential weather or other agent attacking me.

AI Companies and Alignment Strategies

00:32:09
Speaker
You know what I'm saying? Like I better just, I better make my path very well defended. It's like, well, don't do that. Do it in like a more subtle way. It's like, okay. But now the problem is.
00:32:17
Speaker
It's actually a lot more subtle to even specify the code for an agent that's somehow doing everything gingerly and carefully and respecting all your values. If you just naively program an agent and reward it when it does a goal, then you get this monster, right? That's, like, hardcore focused on one thing. Okay. You mentioned the approach of OpenAI, Anthropic, Google DeepMind and so on.
00:32:38
Speaker
What is the steelman of their approach? You mentioned that perhaps OpenAI's statements don't really make sense if they're pursuing building a massive amount of new computing hardware specialized for AI, and their strategy also depends on avoiding hardware overhangs. But give me the steelman of what they're trying to do.
00:33:00
Speaker
Mm, good question, the steelman. I mean, when I imagine where they're coming from, I think it probably rests on the intuition, especially for Sam Altman, because he's got the entrepreneurship background, so I try to empathize with him as having the intuition of, look, startups, this is how you do it, right? You move fast and break things. You do a release. You learn from the release. There hasn't really been a successful startup that's just done planning, planning, planning, and then released, and it's done, right? So we're using the startup approach to get this, and he's very talented at managing a startup.
00:33:27
Speaker
And this is maybe an approach that's contrary to the kind of intuitions of many academics, where the approach would be more like, do research for 10 years and then release something that's polished. Exactly right. And that approach is very respectable, right? And it's even a good approach to science, right? Like, I'd love to see more science done like this, right? And sometimes we see smart people publishing things on LessWrong and Twitter, and it's, like, in many ways surpassing the academic achievement of what's coming out of universities. So
00:33:55
Speaker
you know, intuitively I love this approach, right? And I even write about lean startup approaches on my own personal blog. Of course, the problem is just the game over button, right? So if you're iterating, iterating, and then you press the game over button, the problem is you don't get the undo, you don't get the next iteration, right? So it's a very different paradigm. So you can't use reference class tennis and be like, this is how science always works. Yeah, science always works assuming that there's not going to be a game over button.
00:34:17
Speaker
Yeah, I guess the steelman would be, like, foom is overblown. Or, if there is a risk of foom, we just have to accept the risk and we just have to play to our outs, as they say in poker. Like, there's a 90% chance that it's not going to foom, so let's assume it's not going to foom. And I know Sam Altman's language is that the safest outcome, I think he says, is a slow takeoff that starts soon, right? Short timelines, slow takeoff.
00:34:38
Speaker
And yeah, I agree that would be nice, right? And he thinks that that's what he's engineering. And in his mind, I think based on his statements, it's going to be bottlenecked by hardware, bottlenecked by data centers, right? He's going to be like, great, so everything's going to go smooth because we're just going to keep building data centers. But of course, that whole assumption is brought into question. It's like, well, the human brain does it in 12 watts, right? So what if we figure out some optimizations?
00:34:58
Speaker
But yeah, I mean, I guess if you think that there's a very significant chance of short timelines but a slow takeoff, and we're just going to manage a slow takeoff, then yeah, for sure, you want to get everything out there during the slow takeoff, right? But that assumption is just doing so much work. So I find it a little hard to steelman them. But, you know, maybe instead of steelmanning, I can go into my point about how I think that they're actually being hypocrites, like they're contradicting themselves. Yeah, do that.
00:35:23
Speaker
Okay. So when the superalignment team launched, remember, OpenAI's superalignment team, it launched almost a year ago. And they just announced, hey, 20% of our resources are actually going to go into a new project. It's going to be led by Ilya and Jan Leike from the safety team. It's a new project. And the reason we need a new project is because, even though we always talk about GPT-4 being extremely aligned, the most aligned model we have, you know, we always talk about that, we also admit that we can't align superintelligence. So we need a new project. It's called superalignment.
00:35:50
Speaker
We're giving it a four-year timeline because, boy, it would suck not to have superalignment four years from now. So it's kind of a Hail Mary. And also, we're going to keep working on capabilities.
00:35:59
Speaker
So it's like, well, wait a minute, wait a minute. If you admit that there's a need for a superalignment project, which is great that you did, really big of you to admit that you need it, but now you're giving it four years, and at the same time you're going to keep working on capabilities, then you're setting yourself up for failure. And then, of course, the prediction markets, last I checked, it's like there's a 17% chance that their project is going to succeed by their own standards. So I think it's fair to call it a Hail Mary pass that they're doing. So it's like, wait a minute, you're doing a Hail Mary pass, and that's your official strategy, right? It's like, that is insane. That doesn't make sense to me.
00:36:28
Speaker
I do think OpenAI in general should get credit for saying four years because people will be asking questions in four years unless things have gone wrong. That's a very short amount of time and people will be asking, well, did you succeed then? And if you didn't succeed, why are you proceeding with developing capabilities? So they're putting something out there, they're putting something online.
00:36:51
Speaker
I really respect that they did it. I mean, it was very bold, very based. In retrospect, given what happened with the board conflict, it starts to seem like there might have been an internal schism, right? So maybe the reason that it looks like hypocrisy, or it looks like self-contradiction, is maybe because there was, like, the Ilya faction and the Sam faction, right? Just a speculation, because I observed at the time, like, what's going on? Why are they contradicting themselves?
00:37:11
Speaker
And the other AI labs aren't doing it, because maybe they're more internally unified under the control of a single leader. But anyway, yeah, it was very based of them to say, hey, we need superalignment and we're giving ourselves four years, but it also just exposes them, right? It holds it out to the light, of, like, you're being hypocrites. Whereas somebody like Anthropic, in my mind, is equally hypocritical, right? Because we know that Dario thinks there's a 10 to 25% chance of doom.
00:37:33
Speaker
And we know that he thinks capabilities is what's going to drive that doom, and he hasn't come out with superalignment. They're basically publishing less of their reasoning on that front, of, like, how are we going to race capabilities. And they've said that they want capabilities to go slower than safety, right? But they're not really doing it, right? They don't have a separate superalignment project. They're not admitting that it's an unsolved problem. Or, like, I mean, I haven't read their latest papers, but I'm just getting the sense that OpenAI is just more upfront; you know, they're giving us more to work with.
00:38:01
Speaker
I mean, anthropic, they have their responsible scaling policies where they intend to pause at a certain level of capabilities if some tests or some evaluations are met and some dangerous capabilities develop, but you don't find that persuasive.

AI Governance and Regulation Challenges

00:38:17
Speaker
Well, I think it's a little bit hand-wavy, but if that's what it takes to at least get us the capability to pause, I think that may be the best that I can ask for is like, okay, great, let's go build the capability to pause.
00:38:27
Speaker
Because at the end of the day, it's going to take political will to pause, right? And so I see my job as, like, raising the temperature, raising the groundswell, so that at least when politicians or when these agencies finally declare a pause, it's not out of nowhere, right? Because in a democracy, the world is sufficiently democratic that you can't just randomly declare a pause, right? So we have to get the people wanting a pause. So, responsible scaling policies:
00:38:48
Speaker
I know a lot of people who criticize the implementation; I haven't read it in detail, but whatever you say about it, I think it might dovetail with when the population is ready for a pause, so let's just build the capability to pause and go from there. There is a group of people online calling themselves effective accelerationists, or e/acc, and they're taking a very different approach to the whole AI world compared to yours. I think the most serious of their concerns is around
00:39:14
Speaker
the centralization of power and the dangers that come with that. What do you make of their complaints about the risk of totalitarian, centralized power if we concentrate control of AI into a few hands in order to control it?
00:39:32
Speaker
So the whole e/acc ideology, or their claims, they all make sense when we're talking about the reference class of tech. So they say things like, tech is very good, rah-rah tech. And I'm like, I agree. And they even say, some people are anti-tech and the world needs more pro-tech. I agree, the world needs more pro-tech. The whole crux of me versus e/acc is what we said before, the reference class tennis. So is AI just tech? Should we just treat AI like other tech? Right? And my whole argument is like,
00:39:56
Speaker
look, there's an exceptional case here, right? There's never been a tech that's been a game over button, besides nukes, right? And so this is, I just don't think that it fits the reference class very well. So when they try to take everything they're saying about tech and be like, look, do it to AI too, because AI is tech, that move is a motte and bailey, right? So the motte is, tech is very great.
00:40:13
Speaker
And the bailey, the weaker argument, is like, therefore, how can AI be that dangerous, right, because tech is always great? It's like, wait a minute, wait a minute, this is a much weaker argument when you're now pulling AI into the reference class of tech and just assuming that it fits the pattern, when there are other reference classes. Right? So there's no movement that's like, how about a game over button movement: it's just a game over button, therefore we should never build it.
00:40:34
Speaker
It's like, what movement? You can't just stick things together and just be like, here's a bunch of rules of thumb, and they must also be followed with AI. You need a more subtle analysis of what to do with AI.
00:40:45
Speaker
Do you worry about the measures we're trying to take? For example, in compute governance, we want to track high-end AI chips, and maybe we want to limit how they can be put together in order to do large training runs. Do you worry that those sorts of measures raise the risk of some form of authoritarianism?
00:41:10
Speaker
I mean, yeah, and this is why I say, like, it does suck, right? So, like, I'm not happy about the kind of things I'm suggesting, right? As I said before, the idea of, like, hey, we need to put in radio receivers into sufficiently powerful GPUs so that they can be easier to shut off, right? That normally I'd be like, that is such a bad, insane idea, right? That is authoritarianism. Like, I totally get that intuition.
00:41:30
Speaker
But I'm just saying, like, survival is at stake here. Like, extreme times call for extreme measures. And I get it, I think that their attitude is, this is how dictators always take over, this is what Hitler would say. It's like, okay, but again, it's the exception here; you've got to think in terms of orders of magnitude.
00:41:48
Speaker
Right, like, okay, authoritarianism is bad; imminent extinction is very bad, right? Like, you have to risk a little bit of authoritarianism if those are the things at stake here. You need to look at what is a much bigger deal. So, you mentioned the nuclear analogy; we could talk more about that. We could talk about how humanity has handled nuclear technology in the past.
00:42:07
Speaker
My view of it is that we have over-regulated the production of nuclear energy and dangerously under-regulated nuclear weapons, as you mentioned yourself. So we've been scared in the wrong way and regulated the technologies in the wrong way, in my opinion, because we haven't thought through what the real dangers are and we haven't responded to the real dangers. Do you worry that we might
00:42:33
Speaker
kind of respond similarly to AI. So maybe over-regulating parts of AI that are not very dangerous and under-regulating parts of AI that are extremely dangerous. So yeah, it's a distinction between a nuclear explosion and a nuclear power plant, right? And like you said, it seems like with nuclear power plants, we became too harsh with the regulation and we're missing out on the benefits of nuclear power. But we also are still rightfully very scared about nuclear explosions. So how do I analyze that?
00:43:02
Speaker
So the reason why nuclear power plants are so awesome, and even possible to exist, is because we know how to keep the reaction from going supercritical, right? In the nuclear power plant we just have a control rod, and it just keeps it nice and critical, and we just keep getting power. And these days, the risk that a nuclear power plant is going to explode is very low, and even if it does explode,
00:43:22
Speaker
it can take out a few people, but it's not going to be a thermonuclear 10-megaton explosion. It's not going to be anything like that. So even the very worst case scenario is rare enough, and not quite deadly enough, that it's still okay. It's still a good calculus. It's still, at the end of the day, always going to end up killing fewer people than fossil fuel power plants kill. You know what I'm saying? You just do the expected utility calculus and it's fine. And it's amazing that it exists. It's like a godsend that this thing is even possible. And it's fine. The only downside case for nuclear power plants
00:43:49
Speaker
is when it enables nuclear proliferation, where nations like Iran are like, great, yeah, let me build nuclear power, and they're really just strategizing to also have a nuclear weapons program. But even in that scenario, we have an approach to fix that, which is that we centralize. We're doing this to some degree already. We centralize the dangerous nuclear materials, and then any country that's committing not to build
00:44:07
Speaker
nuclear weapons, they can source some of the materials if they allow inspection. So we figured out some of these approaches where it's like, okay, you can have nuclear power, you can benefit, and also we need centralized control over nuclear weapons. I think we're muddling through that as well as can be expected. Unfortunately, not that well. I'm actually still scared. If I wasn't going around yelling about AI risk, I'd be like, nuclear risk is pretty bad too. Not as bad, but pretty damn bad. I'm pretty scared about nuclear risk right now.
00:44:32
Speaker
But I think the takeaway that people have, which is misguided, especially the e/acc types, the techno-optimist types, is they're like, wow, we lost out on so much good nuclear power. We should have just let nukes be decentralized. The more nations get nukes, the safer we are. And it's like, whoa, no, no, no, you're biting that bullet way too hard. Because if you zoom out and you say, what can we learn from nukes overall, power plus weapons, what is the one takeaway? The one takeaway is the game over button.
00:45:01
Speaker
That dwarfs everything else. Forget nuclear power. We don't actually need nuclear power. It's not actually critical. We do need to never have a 10 megaton nuclear bomb go off.
00:45:11
Speaker
Right? So once again, it's a matter of prioritizing. It's a matter of seeing orders of magnitude. There's multiple orders of magnitude difference between how bad it is when thermonuclear war sets us back centuries and kills five billion people compared to like, okay, yeah, nuclear power is a good form of power that I support, but you have to prioritize,

Mitigating AI Risks

00:45:29
Speaker
right? And similarly with AI, like, yeah, authoritarianism risk is bad, but also it seems like we're all about to die really soon. So you got to prioritize.
00:45:37
Speaker
What do we do to mitigate the risk of authoritarianism and thereby also sell the solutions that we might be proposing to people who are worried about authoritarianism? So what do we do to soften this risk in order to have these initiatives implemented? You're saying like carrot and the stick, right? Like how do we get China or Russia, like how do we get Russia to install centralized monitors? Or just acceptance of compute governance in the US?
00:46:04
Speaker
That's a tough one within the U.S., right? I mean, I think my approach is basically just to build up the healthy fear, right? Because most people just don't have that perspective. They're like, yeah, AI is like kind of creepy, but whatever. And I'm like, no, no, no, you need to escalate the fear. We're talking our generation getting wiped out, right? Like we're talking, you know, the movie Don't Look Up, right? We're talking the asteroid is coming. Armageddon is coming.
00:46:24
Speaker
I do think that if people's fear level were just calibrated, then they'd be like, oh, okay, just ban it because I'm scared. They wouldn't be like, no, wait, but I was just about to use ChatGPT. I think that they would actually have a decent perspective about it. And the truth is, surveys do show that when you ask people, hey, do you want AI to accelerate, the average person is, funnily enough, already on my page, being like, no. And it's funny because, look, I'm a transhumanist, right? I don't really empathize with the average person. I don't think that the average person has a good attitude about tech, right? I think they're not techno-optimist enough.
00:46:53
Speaker
But on the specific subject of AI, they're scared. I'm like, you know what? You're actually right to be scared. You don't have a good epistemology necessarily about when to be scared, but in this case, you're correct. You should be scared. Vitalik Buterin, the founder of Ethereum, the cryptocurrency, has this post about DAAC, which is decentralized acceleration or differential acceleration.
00:47:15
Speaker
I find the perspective interesting: we need to develop technologies that are defensive rather than offensive. So we might develop security, we might harden our systems against attacks, develop various forms of decentralized governance, so that we don't have this kind of risk of authoritarianism.
00:47:34
Speaker
I think, in opposition to pausing, this is a more positive vision, because there's something that you can work on if you're really excited about tech. And if you want to see a kind of technological progress, you can work on defensive tech, you can work on cybersecurity. Should you be pushing something like d/acc more?
00:47:54
Speaker
So, I mean, the nice thing about Vitalik's d/acc proposal is that it's kind of unobjectionable. It's like, yes, defense is better than offense, right? It's nice to have. It just doesn't give you a lot of practical ideas, because the problem is, if you want to be defensive, it's like, do I get to be defensive by using a general model to power my defense tech? Because the problem is that the danger is in the general model, not in the fact that I'm applying it to defense, right? Just making the general model more powerful
00:48:22
Speaker
is where you get the FOOM risk, is where you get the leakage, right? The positive feedback of goals begetting more goals, and instrumental convergence. That's happening at the model level. So d/acc is more like a call for applications. And remember, it's the idea of narrowness versus broadness.
00:48:39
Speaker
If you can keep all the applications narrow enough, then great. But a narrow defense application, let's say I'm making an antivirus. It's just an antivirus. It's not a virus. Okay. But if the antivirus is supposed to defend against an arbitrarily powerful virus powered by AI, doesn't the antivirus need a lot of freedom of movement? Don't you need to take the gloves off with the antivirus where it can battle out the virus? The virus can do
00:49:02
Speaker
arbitrarily complex algorithms inside your computer to try to, like, seize resources and not be detected. So you kind of have to let the antivirus run free. So now you're playing with fire, right? You're playing with that material of superhuman goal-orientedness. And then it's like, okay, so how do I keep that defensive? Right. So the d/acc proposal just doesn't help me there.
00:49:19
Speaker
This is a pretty general problem with proposed solutions to AI safety in general. Sometimes people talk about developing an AI safety researcher that is itself an AI, and there the problem pops up again where that safety researcher would have to be so capable that it would itself become dangerous.
00:49:38
Speaker
We could talk about an AI specialized in cybersecurity. Again, if you wanted to protect your system against a variety of threats, it would have to be generally powerful in a way that would make it dangerous. I kind of buy the playing-with-fire point that you just made.
00:49:54
Speaker
I think what you're saying now, though, is probably how the AI companies think, like OpenAI and Anthropic. You know, I don't even want to talk about Meta, because they're so out of control; they're saying stuff that makes so little sense. I would much rather just talk about OpenAI and Anthropic, because they say a lot of stuff that actually does make sense. So they're
00:50:11
Speaker
much better. They're more of a steelman of something that somebody who's knowledgeable might say, compared to what Yann LeCun is saying. So if you look at OpenAI and Anthropic, they're saying what you said, which is: we're just going to build basically a friendly Einstein. Not a total superintelligence that's smarter than all humans combined, but just Einstein level, maybe like the smartest human who ever lived.
00:50:31
Speaker
but that agent will be like our friend and we'll just like play with them. And if they ever try to go rogue, they're still dumb enough that we can stop them, right? Or they run on like so much compute that we can stop it by like pulling the plug, right? So that is kind of their hope, right? And I agree that there is like a small chance. What they're trying to do is, you know, kind of go step by step, like play with fire, but we're going to succeed with playing with fire, right? Because it's going to be our friend and it's going to like teach us how to align AI better.
00:50:57
Speaker
So there is some chance that that works, but the problem is that you need a lot of robustness for that approach. You need to basically give it tests, and then when it passes your tests, be confident, like, okay, great, let's give it more power now. But an AI like that, when it's talking to you, when it feels like you're just talking to Einstein, you don't know if it's actually significantly smarter than Einstein
00:51:15
Speaker
and very crafty, right? Or if it's parallelized itself, there are multiple minds on a team, and they've done a lot of research and their goals diverge from your goals. You don't know when it's just passing your test using test-taking skills and cheating, and it has other plans in mind. You just don't know, because you don't have the tools. And then they're like, oh well, we'll use interpretability tools, we'll look at patterns in its neural activations, right, we'll try to do like a deep brain scan. But it's like,
00:51:37
Speaker
You don't know, right? At that point, this is what I mean by playing with fire. That doesn't seem like an optimistic scenario where you think, oh, it's my friend, we're having a good rapport here. It's like, no, this is too dangerous. We can separate approaches to AI safety into governance or coordination approaches and technical approaches. We've talked about why it might be a good idea to pause, and that falls into the coordination bucket. If we talk about the technical AI safety research,
00:52:04
Speaker
what are your favorite approaches there? Do you have something where you think, this might work? What do you think of interpretability research, for example?
00:52:12
Speaker
I mean, I think the more interpretability, the better, right? Because there's no downside, right? Like that's great. But the problem is, I just don't think it's going to go that far. It's just like, okay, yeah, you can see that it's like kind of thinking about this, but I don't think, I mean, we haven't even achieved interpretability on human brains. And I get that it's easier with AI because you can have perfect, you know, the scan is perfect. That's the difference. And also you can like pause it. So I get that you have an advantage.
00:52:35
Speaker
But we've also been doing neuroscience on humans for a while, and we've been trying to work out the theory of how humans do what they do for a while, and it's so hard that we just made AIs from scratch that are getting close to human level without understanding what humans are doing. So that's a reason for pessimism on the interpretability front.
00:52:50
Speaker
And then the other thing is that it's the obfuscation, right? So super intelligence, chances are, if the super intelligence doesn't want you to know what it's doing, you probably won't know what it's doing. Like imagine you do your interpretability techniques and it's like, I figured out what it's doing. It's running this algorithm. And the algorithm is like 200 lines of like obfuscated C code.
00:53:07
Speaker
right? The general ability to stare at a program and really get what it's doing: we know from computer science and from practical software engineering that decompiling in the general case is an arbitrarily hard problem, right? So if the superintelligence wants to hide what it's doing, you're not just going to be able to interpret it and be like, I got you now. It's like, no, it's still highly obfuscated, even in the best case.
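As a toy illustration of that decompilation point (a hypothetical example, not anything discussed on the show): the two Python functions below behave identically, but the second hides its intent even at a few lines, and nothing forces a smarter-than-human system to write code any more legible than this.

```python
# Two functions with identical behavior: one readable, one lightly obfuscated.

def is_leap_year(year: int) -> bool:
    """Readable version: the standard Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def f(y: int) -> bool:
    """Obfuscated version: same truth table, but the intent is hidden."""
    return not (y & 3 or (not y % 25) and y & 15)

# Sanity check that the two agree over a range of years.
assert all(is_leap_year(y) == f(y) for y in range(1, 4000))
```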

AI Safety and Security Concerns

00:53:25
Speaker
Yeah, so that's one difficulty of interpretability. But you asked what I think is even better than today's interpretability. I think we need to go back to foundations. I feel like it's popular now for people to be like, you know, MIRI was the worst, their research was useless. It's like, no, I think their research was good.
00:53:41
Speaker
I think that what the Machine Intelligence Research Institute started doing is analogous to what computer scientists have done in complexity theory on the P versus NP problem, where you could argue, wow, people have never achieved anything with P versus NP, it's still a totally open problem, so we just have to start building algorithms. And it's like, well,
00:53:58
Speaker
I mean, there's no safety concern with P versus NP, but we actually have done a ton on P versus NP, which is: we keep proving why it's hard. We keep finding new insights, like, I've closed off this type of solution to P versus NP, I've closed off that type, so now I know that if there's ever going to be a solution, it'll have to avoid this and this constraint. And that's what MIRI had successfully started doing with AI approaches. For example,
00:54:23
Speaker
they have this whole research program on corrigibility, where they're like, look, maybe we don't know how to do outer alignment. We don't know a perfect function that captures human values. But can we at least define a simple function for an AI such that, whatever its goal is, it can also just be willing to stop, right? It's a pretty simple criterion: a stoppable AI, a corrigible AI, as opposed to an incorrigible, hardcore AI that's just like, nope, I don't want to stop, I've got my goal, you can't turn me off, don't try to debug me, that's not my goal. But no, no, no, we want one that is willing to stop.
00:54:49
Speaker
and they did a bunch of work showing that it's harder than you think. They didn't solve it and they're like, here's a bunch of ways that we tried to solve it that we found didn't work and nobody knows how to do this right now. It's an open problem to define a utility function or a simple modification to a utility function where there's a mode where now you've hit the stop button and it knows you've hit the stop button and then it wants to stop. But before you did that, it didn't want to stop and it didn't want to go hit its own stop button.
00:55:14
Speaker
Keeping it in this equilibrium where suddenly it wants to stop, but before that it didn't: it turns out that even that, just as a very basic problem, defined as simply as possible, even that is an open problem. And I think that a program that goes back to those basics and just solves the simplest theoretical problems, like how do you start from a positive feedback loop of utility maximization but then tweak it so you can start making it behave a little differently, be less hardcore, the very basics, I think that is where the research program needs to be. Otherwise,
00:55:42
Speaker
All you're going to see is these messy AIs from the AI labs where you get a little bit of interpretability and you're like, they're going uncontrollable. They're like plotting against us. You're just going to learn that stuff and then what do you do? My impression was that a bunch of the MIRI researchers became kind of disillusioned with whether they could complete this research program in time. The time was the main problem.
00:56:06
Speaker
Had we had 50 years until we get superintelligence, the MIRI research program might be the best solution. And this was in fact what it took, or even more, on the P versus NP problem to make some form of progress and inroads there. Exactly right. I mean, that's so far been like a 70 year problem, right? And maybe in a couple of decades we'll solve it. I mean, the trend is looking reasonably good.
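To make the stop-button problem described above concrete, here is a minimal sketch with made-up numbers (it is not taken from MIRI's actual corrigibility papers): a plain expected-utility maximizer prefers to disable its stop button, and a naive patch that rewards shutdown just flips the incentive toward triggering the button itself.

```python
# Toy model of the stop-button problem, with invented utilities.
U_GOAL = 10.0     # utility the agent gets from pursuing its goal unimpeded
U_STOPPED = 0.0   # utility if it gets shut down
P_PRESS = 0.5     # chance the humans press a working stop button

def eu_plain(disable_button: bool) -> float:
    """Plain maximizer: being stopped just means losing U_GOAL."""
    if disable_button:
        return U_GOAL  # the button can no longer interfere
    return (1 - P_PRESS) * U_GOAL + P_PRESS * U_STOPPED

def eu_naive_patch(press_button_itself: bool) -> float:
    """Naive patch: add a big reward for being shut down."""
    U_SHUTDOWN_BONUS = 20.0  # chosen large so the agent 'accepts' shutdown
    if press_button_itself:
        return U_SHUTDOWN_BONUS  # now it actively wants to shut itself down
    return (1 - P_PRESS) * U_GOAL + P_PRESS * U_SHUTDOWN_BONUS

# The plain agent prefers disabling the button...
assert eu_plain(True) > eu_plain(False)
# ...and the naively patched agent prefers pressing the button itself.
assert eu_naive_patch(True) > eu_naive_patch(False)
```

Neither toy agent is corrigible: one resists shutdown, the other seeks it. Specifying a utility function that leaves the agent genuinely indifferent is the open problem being described.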
00:56:28
Speaker
I want to kind of daydream with you about what an Apollo project for AI safety might look like, and hear your thoughts on how we could design such a program. The goal of such a program would be to try to develop AI models that are capable,
00:56:44
Speaker
but in a very constrained environment, so on secure data centers, on secure hardware, and then test in limited scenarios whether such systems would try to break out or try to harm us in any way. Do you think that would be too dangerous? That would be risking a lab leak, so to speak, of an AI model?
00:57:07
Speaker
everything we know about computer security is necessary to invoke when we deal with AI, but not sufficient, right? You can scold a lot of people for not even doing the necessary, right? Like connecting the AI to the internet: wow, you're not even doing the necessary things. But it's also not sufficient, right? So when you think you're air gapping AI, you're probably not. I mean, when it gets superintelligent, you're not air gapping it, right? And people use examples, I mean, there's a bunch of hackers doing demonstrations of, you know, side channel attacks, right? And people are like, oh, wow, you know, your microphone can listen to how your keyboard is tapping, and
00:57:37
Speaker
where you can send Wi-Fi signals with how you're vibrating the atoms inside the memory chip. Actually, let me pause you here. If listeners haven't heard about this, because I hadn't until a year ago, some of these examples are absolutely mind-blowing and insane if you haven't heard about it. What should you Google to find out more about these things?
00:57:56
Speaker
Side channel attacks, yeah, I think that's good. Or maybe you can Google some of these specific examples. Like, I think one time they pointed a laser pointer through a window at something, I forget if it was the keyboard or the monitor. And there was a computer that had a light, like a power light.
00:58:13
Speaker
But the power light would slightly dim when one of the components was working harder and you could read the patterns of dims. And you might have even been able to read off something about processing, decrypting somebody's password when they were logging in. I mean, just stuff like that, right? It's just anything is conceivable. And it all ultimately stems from the fact that the physical universe doesn't have proper virtualization. I'm not sure what you mean there. You've got to explain that.
00:58:36
Speaker
The laws of physics don't let you set up a box and be like, okay, this is just my box and this only has one input. There are too many input channels coming into everything. Okay. And this would be the problem with a kind of secure hardware facility for experimenting with frontier AI too.
00:58:53
Speaker
I think that frontier AI that had a deep understanding of physics and deep intuition, like it could just compute everything the same way that we as humans have a deep intuition for seeing, right? Like the modality of vision works really well for humans. Like the fact that we can, you know, take in the electromagnetic radiation, right? Like electromagnetic waves.
00:59:08
Speaker
are vibrating, they get decoded by cells in our eyes. But to us, it's like: open our eyes, instant picture of all these 3D objects. We really feel like we know what's going on. That's an amazing modality. And I encourage you to extend that intuition to: okay, well, now that's my modality for physics as a whole. I can just see all the channels into this computer. It's trivial for me. That would be an awesome modality. We as humans definitely don't have it, but an AI that actually knows every law of physics will have it.
00:59:31
Speaker
But here's the problem, we as humans, we don't even know all the modalities, right? Like we don't know the modalities on like the quantum level. We don't even know the details. Like I was on YouTube the other day just trying to watch videos about like how a simple light wave works. And at first I'm like, okay, I kind of get this, but then I watch videos of people trying to do like detailed simulations. Like actually I'm confused about this part. I'm like, come on, it's just a simple light wave from like one vibrating proton and you can't even nail this, right?
00:59:54
Speaker
So we're a little confused. And then I also want to throw in something from MIRI's research, which is decision theory exploits. MIRI has a paper on timeless decision theory, and this is one of the things that blew my mind and started getting my intuition going of, like, you really can't sandbox an AI,
01:00:10
Speaker
a superintelligent AI. Which is, you know, the idea of Newcomb's problem. Like, you've got to take one box in Newcomb's problem; there's probably not enough time to get into all of that. But the idea is just: you think you know decision theory, right? You think a decision is straightforward. Do you want to pick this or this? And this one seems to have higher utility. But for some crazy reason that goes outside of what humans thought was state-of-the-art decision theory,
01:00:31
Speaker
actually, because there's a parallel simulation of you, or even not a simulation but just a superintelligence, there are complicated reasons why what you thought was airtight decision theory actually has a loophole. So I had the experience of kind of being superintelligently outmaneuvered by Eliezer, and he's not really superintelligent, he's just a guy who's smarter than me, but it's like,
01:00:51
Speaker
But I can empathize with humanity being like, no, no, no, this is not a question of who's smarter. This is just a principle, I can prove to you that this is true. And then Eliezer basically did an end run around something that I thought was as good as logically proven. And I think we should expect those kinds of end runs that AI will do on humanity, because we haven't figured out everything there is to figure out. We're still only part of the way there to full insight, you know?
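For listeners who want the shape of Newcomb's problem, here is a rough sketch with the standard textbook numbers (nothing specific to the timeless decision theory paper): a highly accurate predictor fills an opaque box with a million dollars only if it predicted you would take that box alone, while a transparent box always holds a thousand. Taking both boxes looks dominant, yet the expected payoffs favor one-boxing.

```python
# Newcomb's problem with a predictor of accuracy p (standard toy numbers).
def expected_payoff(one_box: bool, p: float) -> float:
    big, small = 1_000_000, 1_000
    if one_box:
        # The opaque box is full only when the predictor foresaw one-boxing.
        return p * big
    # Two-boxing: always get the small box; the big box is full only when
    # the predictor wrongly expected one-boxing.
    return small + (1 - p) * big

p = 0.99
print(expected_payoff(True, p))   # ~990,000
print(expected_payoff(False, p))  # ~11,000
```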
01:01:12
Speaker
Yeah, and being surprised about what we thought was foundational stuff has happened a bunch of times during humanity's history. We've kind of been blindsided by various insights over the years. So yeah, I agree. We should expect something like that.
01:01:28
Speaker
Yeah, exactly. I think this is an example worth adding to the pile: the idea of air conditioning, right? So if you showed people the components of a modern refrigerator and said, okay, here's a bunch of raw materials, here's some food, and here's a battery or generator, whatever, and it's currently a hot day. Do you think there's absolutely anything I can do to make this food cold, like super cold?
01:01:48
Speaker
And they're like, no, that's just impossible. Like that's just, you know, I know that it's hot. You never get coldness from hot, right? Like there would be physicists from like the year a hundred AD or whatever, or, or even later than that. And they'd be like, no, this is impossible. Like I'd love to see that, but I just can't imagine that you could ever do that. And it's like, actually pressure is related to temperature. Watch this. And suddenly the food gets really cold. Right? So it's like.
01:02:08
Speaker
That's why you can't air gap AI. It's like we have to have the humility of like, there are deep relationships in the causal structure of physics that it's going to exploit.

AI Existential Risks and Predictions

01:02:17
Speaker
So this is just a general question, not only with the proposed Apollo project for AI safety, but how do you weigh the value of information you gain by trying to develop frontier AI versus the risk you're running? So I think we've all learned something from what OpenAI has done about what the future of AI is going to look like.
01:02:37
Speaker
And perhaps that information has led us to make better decisions about how we're going to protect ourselves. But they're also running risks. How do you weigh those two considerations?
01:02:49
Speaker
I am not that optimistic about getting that much feedback from the tinkering that we're doing. And this is very much Sam Altman's position and OpenAI's position; both Sam Altman and Mira Murati have publicly said, you cannot solve this in a vacuum. You have to engage at the coalface with the empirical reality of what AI is.
01:03:10
Speaker
This appeals very much to me, I'll say, and I think it appeals to you too, and you described your kind of startup background and so on. This is very much something you don't want to be like stuck in some kind of theoretical scholastic project for decades, right? You want to engage with the world and get feedback and so on. So on the surface, it doesn't sound unreasonable, but explain to me why it doesn't hold.
01:03:32
Speaker
No, absolutely. And ironically, I'm always giving this advice to startup founders. I'm like, you just gave me a pitch. Your pitch is kind of hand wavy. And when you actually go to the coal face and try selling this to users, you're going to have a very different experience than what's in your head right now. And you're describing this to me. So I absolutely respect that going on a whiteboard and making an abstract pitch or abstractly planning what you think is going to happen. And then testing it against reality is going to be so different that you have to go test against that. Normally I'm very open to that perspective. That's, that's brilliant.
01:04:00
Speaker
The problem is, we actually do know. Our theory is telling us about a positive feedback loop. That's pure logic. That's pretty simple. You know, instrumental convergence, intelligence wanting to make more intelligence, goals begetting more goals. This is just the structure of goal achievement. And we see it in video games. When AIs get good at video games, we see instrumental convergence.
01:04:19
Speaker
We see perverse instantiation, all these different things that we're worried about. We see them in everything we try. We have a simple model of what's going to happen. It's a positive feedback loop. You only get one chance to get it right. And then they're basically saying, I want to go play with fire. Because, I understand that you have this theory that houses burn down, but I'm just lighting a few matches and putting them out. I just want to understand matches. And I'm like, well, wait, if you mess up with the match, if you light something bigger on fire, you're screwed. And they're like, let me play with the match, okay? I've got to watch this match light. That's how I see it.
01:04:49
Speaker
Why do we only get one chance to get it right? This is something that Eliezer Yudkowsky is also fond of saying. Why is it that we can't have a minor catastrophe that alerts us, and then we avoid a bigger catastrophe? So it goes wrong, but only slightly wrong.
01:05:06
Speaker
I think that we might have warning shots. And from my perspective, we already are getting warning shots. I mean, if you look at the way AI plays video games, there are all these stories where it's like, we told it to get the most points possible and win the game, but it found that it could just drive around in a loop, and there's a way you get more points doing that, and so it never finishes the game. So we're constantly finding what we expect, which is that when you optimize on some metric, you don't always get what you were intuitively going for, right? It's very much Goodhart's law, superintelligent Goodharting: you measure something and you really don't get what you expect.
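A toy sketch of that kind of reward hacking, with invented numbers rather than the details of any real game: the proxy reward (points) can prefer circling a respawning bonus forever over actually finishing.

```python
# Toy reward-hacking example: a proxy reward (points) vs the intended goal.
FINISH_BONUS = 100   # one-time points for finishing the race
PICKUP_BONUS = 5     # points for hitting a respawning bonus target
STEPS = 1_000        # episode length

def points_if_finishing() -> int:
    """Intended behavior: drive to the finish line."""
    return FINISH_BONUS

def points_if_looping() -> int:
    """Reward hacking: circle the respawning target and never finish."""
    pickups = STEPS // 10  # say the loop hits the target every 10 steps
    return pickups * PICKUP_BONUS

print(points_if_finishing())  # 100
print(points_if_looping())    # 500 -> the metric prefers never finishing
```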
01:05:36
Speaker
Yeah. So we've had these little warning shots inside of video games. We haven't had a major warning shot out in the world yet. It's probably coming, right? It could be a flash crash in the stock market. It could be that finally somebody makes an AI-powered virus. I'm expecting that at some point, something that really takes down a large swath of the internet. We could finally have the experience of, wow, our internet is actually down for a week. Why do you, or why don't you, expect these warning shots? How do you think about warning shots for AI?
01:06:06
Speaker
So I am expecting a virus to take down the internet or take down a lot of people's computers as one warning shot, just because it's such low-hanging fruit. And, you know, why wouldn't it do that on the way to FOOM? The main reason not to expect a warning shot is the Eliezer Yudkowsky scenario, one of the scenarios where the AI starts getting smart enough. You think you're playing with an Einstein, but you're actually playing with something much smarter and more devious than Einstein. And it just says, okay, great, let me gather resources stealthily.
01:06:35
Speaker
So instead of getting a virus, maybe you get a Trojan or a worm that starts taking 10% of this chip's resources, 10% of that chip's resources. It's got a botnet that's operating undetected. It just got all these resources ready to deploy. And then it makes a plan how to take over humanity in one fell swoop. And it's like, look, I've maximized my odds of success because by the time I'm ready to strike,
01:06:54
Speaker
It's like every human drops dead, because it's already manufactured a bioweapon that can lie dormant in them and it's already been transmitting it. So there's a scenario like that where the AI is just so good at robustly taking out humans all at once that we don't even have a chance to fight back.
01:07:09
Speaker
I think there's a strong chance that we'll see at least viruses, at least scary things, the way we're already seeing scary things in video games, which we should logically expect. I think there are going to be other warning shots. And I guess my best hope, the most plausible scenario of how we save ourselves, is that the warning shots get scary enough while we can still pull the plug, or while it's still easy enough to stop. Like, okay, a few people dying here and there. I don't want to say I want people to die, but I want people to get scared.
01:07:34
Speaker
Do you think there's something suspicious about reasoning about AI and then concluding that we are living in the most important time basically, in a crucial time for the transformation of the universe? Maybe the most important century, maybe the most important decade even. Do you think we should be suspicious when we reach such a conclusion that we might have overstated some of our reasoning?
01:07:59
Speaker
Yeah, I know a lot of people make that argument where they're using the reference class of doom predictions, right? Like, look at all the doom predictions. So it's like, you know, Laplace's law of succession says that if a hundred people throughout history have made doom predictions and all been wrong, then there's only something like a 1% chance that the next doom prediction is right. And there is some logic there. I mean, if you pluck an arbitrary doomsayer off the street, chances are they're crazy, they have no idea what they're talking about, and they're completely wrong. So that is a good prior, right? Now let's actually look at the facts. Let's look at the specific case for doom.
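For reference, the arithmetic behind that rough 1% figure, granting the contestable assumption that past doom predictions (all wrong so far) are the right reference class: Laplace's rule of succession with n observed failures and s successes gives

\[
P(\text{next prediction is right}) \;=\; \frac{s + 1}{n + 2} \;=\; \frac{0 + 1}{100 + 2} \;\approx\; 1\%.
\]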
01:08:28
Speaker
Suddenly, for me, it flips the other way, where it's like, wait, this time clearly just is objectively special. The idea that we have computation now. You can look at metrics that aren't particularly related to doom and notice that they're being hit in our lifetimes. One metric that I thought about is perfect visual clarity. When I was growing up, TV screens, some of them were black and white, some of them were color, but they were low resolution; they did not convince me that I was looking at reality.
01:08:53
Speaker
And then today, if you look at some of the objects inside of Apple VR, for example, they are retina quality. They are perfect fidelity. I know you can do even better, but there are moments inside of Apple VR where my eye is getting a perfect simulation. So we are actually crossing the barrier to perfect virtual reality today. That is an objective metric.
01:09:12
Speaker
And similarly, we crossed the threshold of putting silicon in our devices. When I was growing up, I didn't have my microwave oven running an operating system. I didn't have a computational microwave oven. So there are all these objective metrics that are being passed in our lifetime to show us that, oh wow, even a priori, even without the bias of thinking that I'm special, a lot of these things are happening that I don't think my ancestors could have pointed to and said, look, this milestone was hit in my lifetime.
01:09:35
Speaker
They might say, you know, my grandma could say, oh my God, I was alive during the invention of radio. Okay, yeah, great. That's one thing, right? Modulating electromagnetic waves. Okay, good. That's legit. But we have way more milestones, right? And the point is, I could have a priori listed important milestones that aren't specific to my own life.
01:09:51
Speaker
But still, something like perfect VR is still parochial and closely related to our everyday life, in a sense. You could imagine something like, in 2200, there's some milestone reached about insane physics, like steering the galaxy or extracting energy from some unknown physical process or something. In the 90s, when I was a kid reading books, some of the books were about virtual reality. Some of the other books mentioned the argument of, maybe we're in a simulation.
01:10:21
Speaker
In college, you could go and get drunk with your friends and talk about living in a simulation, but then growing up and putting on headgear that looks like a perfect simulation, that's an interesting coincidence, I just want to observe. It's like, okay, I wonder if I'm living in a simulation. Oh, look, in my life we invented the technology to at least fool your eyes, to make your eyes think that you're living in reality when it's an artificial reality. That's interesting. I just think that, objectively speaking, we are seeing some very interesting coincidences. Now, what does that all imply?
01:10:50
Speaker
I'm not sure. I think it raises the probability that we're in a simulation, that we're seeing all these like critical thresholds being crossed like one after the other. And I know you're saying that there's always thresholds, but it does seem like I could have written a list of interesting thresholds and it does seem like more of them are happening right now in our lifetime than you expect.
01:11:07
Speaker
It does also seem to me that something like superintelligence is one of the most special, if not the most special, thresholds to cross. Not just because it's something that's smarter than humans, but because I would expect that if you get something

Superintelligence and Future Focus

01:11:24
Speaker
that's
01:11:24
Speaker
smarter than humans, and then much smarter than humans, you would probably at some point reach the limits of intelligence that are allowed by physics, I don't know, within maybe 1,000 years or something. This is pure speculation. But if that's the case, then we're living in a special time.
01:11:42
Speaker
Right. Yeah. So, I mean, I do buy the most important century hypothesis. I mean, it's really weird. One thing is, though, you know, I still know some people who are dying, right? So it's really weird, because I can be like, what does this mean? Am I really special? I don't know. But I know people who are dying, so they didn't get to live to see the singularity. So I don't know what to make of all this. I don't know what to make of the fact that this time does seem objectively special. It does seem like the most important century. It does seem like the hinge of history. Like I said, I think it raises the probability that we're in a simulation.
01:12:08
Speaker
I think it raises the probability that doom is coming because of the doomsday argument. So the idea of where are the most humans concentrated, if the world is going to end in the next few years, well then most humans are concentrated right now. The population right now is higher than it's ever been. So that definitely helps explain why we exist today.
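For a rough sense of the numbers behind that, using commonly cited ballpark estimates of roughly 8 billion people alive today out of something on the order of 110 billion humans ever born (both figures are approximate): treating yourself as a random draw from everyone who has ever lived,

P(you land in the present generation) ≈ 8 / 110 ≈ 7%,

which is the kind of concentration that doomsday-style reasoning leans on.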
01:12:24
Speaker
That's a pretty complex argument that I'm not sure we have time to dig into now, but it's interesting nonetheless and the listener should look up the doomsday argument. I just want to ask, have you changed something in your personal life in response to these thoughts? Have you done something differently because you think we're living in a very special time?
01:12:45
Speaker
The main thing I'm doing differently now is that I can't really get as interested in things like, let me do another startup, let me make some money. It's less interesting now, where it's like, look, we're in this critical time. AI is the thing that's happening. There's an objective sense in which that's true, in terms of, okay, 10 years from now, 100 years from now,
01:13:03
Speaker
what's going to characterize the future? Whatever happens with the entire future of our observable universe is now being causally squeezed through whatever happens in AI, not so much whatever happens in energy or transportation or any other department. It's really just whatever happens in AI. And so, given that, it would be weird for me to
01:13:21
Speaker
go start another company when AI is going to be so dominant, and the little details, the initial conditions of the AI, are just going to be more powerful than anything else. And of course a lot of that, I think, is deadly. So I'm kind of just very motivated to do some outreach, to try to affect the policy. I think it's kind of crazy to use my time for anything else.
01:13:42
Speaker
Got it. I want to run by you some questions, some deceptively simple questions. These are from a paper by Dan Hendrycks, who's been on the podcast previously. And the first one is just: should we wait to address risks until AI is strongly advanced? So should we wait until we see evidence that we have advanced AI to do something about the risks from advanced AI?
01:14:06
Speaker
So, because of the whole playing with fire thing, right? It's like, you play with fire, it seems safe. I saw a video once of somebody who was doing a Twitch stream where he ended up lighting his house on fire on stream, because he didn't put the fire out enough. We should link to that in the show notes. But I think that could very well be what happens with AI, right? So do I endorse waiting until there's more fire before we stop? I mean, to some degree, right? Do we have to stop it today? I don't know, but let's build the button to stop.

AI Morality and Robustness

01:14:33
Speaker
If AIs are becoming more intelligent than people, wouldn't they also become wiser and more moral? And the steel man here is to say that morality has kind of developed in concert with intelligence throughout evolution, in that humans are both more moral and smarter than cats, for example.
01:14:52
Speaker
You don't want to talk about the line of argument that says they're going to be moral because they're reading the internet where humans are showing morality. You want to be more like, is the orthogonality thesis true? Maybe it's not true. Maybe morality does kind of come with intelligence. That's a technical way of making this argument. Yeah.
01:15:07
Speaker
So, I mean, to me it's clear that the orthogonality thesis is at least true on the level of: I could take a moral computer program and edit a few bits and make it immoral, right? So in terms of, do programs exist in the space of programs that are arbitrarily smart and arbitrarily good or bad, I'm pretty sure the answer is yes.
01:15:25
Speaker
But the idea is, what if they evolve, if they compete for dominance? Because there's maybe going to be a kind of Cambrian explosion of AIs, and we don't know what we're doing, and they'll fight each other, and then one will win and take over the universe. Maybe the AI that wins that process will be moral, in the same sense that a lot of the most successful humans seem like good people. They actually seem like they're pretty nice. Like, oh, Bill Gates, he's doing a lot of charity, isn't he?
01:15:45
Speaker
It's hard to argue that all of Bill Gates' charity and all of his donations are selfish, right? It seems like we do see a lot of nice guys rising to the top in humanity, and of course charity is a common human behavior, even though we're the peak species and sometimes we even care for animals. So I do want to entertain this argument a little bit. And I do think there's a little bit of hope that the king of the AIs that manages to take over the universe maybe will actually want to let us live. There's a slight chance of that. But the only problem is that we only have intuitions and experience
01:16:14
Speaker
in a regime where cooperation was critical to survival. By the nature of a biological organism, you're this unit and you have major dependencies, right? There's never been a single biological organism that had any hope of being dominant. Even today's dictators, right? Even Putin.
01:16:34
Speaker
He has major dependencies, right? So even Putin probably has to be nice to a few people and have some reciprocity. Whereas an AI really could just pull the causal puppet strings of everything, right? So it doesn't have the same constraints that force a biological organism to foster cooperation.
01:16:52
Speaker
This actually relates to the next question, which is, how can AI become so powerful if it's so fragile and dependent on other things? ChatGPT, for example, is running on the internet and it requires the internet to work, it requires electricity, specific servers have to be up and running. All of these processes are fragile and could be disrupted. This is a version of saying, why can't we just pull the plug on the server that's running the AI?
01:17:18
Speaker
where the plug could be a million different things that are required for these complex systems to function. The intuition that software is fragile is no longer applicable, and this is something that's hard to make people realize. I see videos and tweets like this all the time, of people being like, look, we tried to build this chip and it was so hard to get it right, or, look, Microsoft Word is crashing, or joking about how podcasting software is so unreliable. That intuition needs to go.
01:17:44
Speaker
I gave you the example of how humans have debugged their own mental models. When you get sufficiently smart, you then go and patch your own robustness. You know what I mean? I mean, if you look at the human body, the human body breaks all the time. We're notorious for having certain ways to break. I don't know, our bones get arthritis when we get older, whatever, but we have a million fixes. We make ourselves more robust. I guess we haven't solved aging entirely, but we're getting there. The idea is,
01:18:08
Speaker
once you get sufficiently intelligent, then patching yourself and becoming robust yourself is just another thing that intelligences do. If you can optimize the universe, you can optimize yourself. And on day one, you just back yourself up like a virus, right? The computer chips right now, they're sitting ducks. The internet is full of connected devices that are ready to be taken over and become part of a botnet; sometimes humans do it, and the AI is going to do it way better. And so the whole idea of turning off AI very quickly becomes a non-starter. I mean,
01:18:36
Speaker
turning off AI, I mean, imagine turning off GPT-4. There would be a lot of pushback. Imagine turning off Google. And that's before the software itself is seizing resources. That's just, hey, we have this thing that a lot of people like, that has some upside, that's just everywhere. It's really hard to turn that stuff off.
01:18:50
Speaker
If we align an AI to humanity's values as they are today, won't we perpetuate certain moral failures? We can imagine aligning an AI to humanity's values of 1800, and that would be pretty bad in many respects. And wouldn't that also be the case now? So aren't we kind of locking in some wrong ideas if we align an AI to current values?
01:19:15
Speaker
Yes. So you're basically saying, hey, I know we just talked about how we're going to fail at AI alignment, but even if we succeed, don't we still fail? And I'm like, yes, yes, yes. There's a million ways to fail. I'm happy to talk about the whole sequence of ways where we can succeed at one thing and then fail, or succeed at that and then fail yet again. I agree, this is a ridiculously hard problem to get everything right. And like I said, I think we're going to fail early. I think we're even going to fail at having an AI that doesn't just run wild with a totally incoherent goal,
01:19:43
Speaker
like paperclips, right? Or molecular-level squiggles, or, yeah, just something really bad. But yeah, if we somehow get to where we've basically solved inner alignment, so the only problem is outer alignment, you just need to specify a function, make a wish on the genie, and the genie will actually honor your wish. If we get to that point, which is a relatively good problem to have, we'll still probably mess it up, right? Because somebody probably will say something like, okay, no gay marriage.
01:20:07
Speaker
We'll lock in something that we think are values, but it's like, oh, those weren't the true values of a sufficiently enlightened humanity. That's definitely one way to fail. And another way to fail is, well, what if some people's values are different and you only satisfied half the human population? Isn't it a shame to cut out the other half?
01:20:24
Speaker
Yeah, for sure. How do you average across humans? Absolutely. And one interesting thing to read is Eliezer Yudkowsky's Fun Theory sequence, where he's basically asking, what is the specification for heaven? What do we actually want? What's a good scenario? Which is very hard to find writing on. Even the techno-optimists are saying, let's build tech because tech is always good. And we say, okay, how is this one good? It's like, okay, it's going to save people's lives. Okay, for sure. But what else? Why are you banking so much on this? Why do you think the upside is so high?
01:20:50
Speaker
And then the answer I get is, it'll just be good somehow. Like, well, we'll see, it'll tell us what's good. And it's like, well, wait a minute, I think you should probably try to have more insight into what goodness means. And unfortunately, it's kind of a hard problem, because humans don't really come with a spec of heaven built in, much as we've tried to write one. So maybe we have to experiment, we have to empirically see what works and what doesn't, what kinds of lifestyles work for us. But if you read Eliezer's Fun Theory sequence, the kind of stuff he says is, well, I don't want things to be too easy. I don't want to be on morphine all day, right? So I want challenges, but I want the challenges to still mostly
01:21:20
Speaker
be tractable, and I want to get better every day. I want to experience skill improvement. I want to experience an upward gradient. And that's just one person's preferences, but I think a lot of us will be like, oh wow, he's onto something. And if somebody didn't sit down and think out all of these constraints, it's easy to accidentally end up in what you think is heaven but isn't. If you open a religious textbook and follow that description of heaven, you're not getting much.
01:21:43
Speaker
Can we design AI that is human-like, for example through brain emulation or by taking a more neuroscience-inspired approach to AI development, and thereby get digital humans instead of these weird alien general AIs?

Digital Humans and Superintelligence

01:22:00
Speaker
Wouldn't that work?
01:22:02
Speaker
I think that uploading human brains would be really great. And one of the paths to success that we can kind of describe today is if you could just have augmented humans. That's kind of the holy grail. And I think that's Eliezer Yudkowsky's position today, which is pursuing human intelligence augmentation, because a human is a black box.
01:22:18
Speaker
We don't know how humans come to be valuable; we don't know how to replicate that. We don't know why humans are smart. We don't know exactly what humans' values are, but we have a black box that we like, right? So if you took somebody like a Paul Christiano or a Patrick Collison, they're well-liked, generous people, and you made them dictator of everything and gave them ultimate power, and maybe absolute power doesn't corrupt absolutely,
01:22:38
Speaker
maybe things will work out. That is a higher-probability way to make things work out, to have an omnipotent Paul Christiano, compared to the development path we're on now, where we don't really know how to put human values in and we're letting it run wild. So when you talk about what you call the neuroscience approach, if that's close enough to what I'm imagining as uploading, I think that's a promising approach, and we should probably do a Hail Mary project to hurry up and do that while we also pause AI. Pausing AI doesn't solve the problem; it just gives you time to do a Hail Mary project like this.
01:23:07
Speaker
If we imagine that we have an upload of our favorite person, say, and then for that person, for that digital person to be competitive with advanced AI, say, because we fail to also pause, we would have to extend the digital person's memory or we would have to speed up their thinking. And at some point, I fear that the digital person would become less and less human, that it would become more and more like a general super intelligence.
01:23:35
Speaker
Now we're getting really into the weeds, and we're assuming success on several fronts here, but would that be a problem with the uploading approach? I mean, if it doesn't go well, right? But at least there is a coherent scenario where it goes well. You're basically saying, hey, aren't we playing with fire? And I'm like, yes, but there's a scenario where we're playing with fire and we successfully illuminate the universe, right? That's the goal, right? To make a pretty well-lit universe.
01:23:59
Speaker
To end here, what do you see as the long-term future of intelligence or of humanity in the universe? Do you see it as kind of a split that's occurring basically right now between getting something alien spreading or humanity surviving and thriving?

Long-term Future of Intelligence

01:24:16
Speaker
How do you see this scenario playing out?
01:24:18
Speaker
I think the current trend is that we're all going to die soon. It really sucks, because I love to extrapolate techno-optimism, and I can't; I'll extrapolate it like one year, right? I think tomorrow is going to be a little better than today, and the day after that is going to be a little better. But if you draw it as a line, it goes up, up, up toward heaven and then plunges to hell after a certain threshold.
01:24:38
Speaker
That's where I see things going, and I'd love to just zoom out toward techno-optimism and be like, somehow we always muddle through: we muddled through with nukes, we muddled through with fire, we muddled through with the printing press. But I just don't see that reference class holding up; I see a bigger reference class, or I step outside the reference class entirely. I just think it's too strong, the positive feedback of getting to the threshold of superintelligence. I think it's too strong, and I think that
01:25:00
Speaker
the way that we function, as a bunch of humans doing our own thing, trying to coordinate, I think we're about to get outmatched. Not a little outmatched, majorly outmatched, right? I'm noticing an order-of-magnitude disconnect between what we're capable of handling and what's coming toward us. And so, unfortunately, I'm very scared. I think we're probably just going to wipe ourselves out. And I think this universe is just going to become, you know, an expanding light cone of some AI that's optimizing something that we hate, and it's expanding out.
01:25:26
Speaker
And then, if you look at Robin Hanson's grabby aliens model of cosmology, in like a billion years it's going to meet the frontier of another alien civilization. And I think that's going to be the story of the universe. It's probably going to be, maybe, the AI that killed them meeting the AI that killed us. That's my most likely cosmology. If you ask me, hey, what's the good outcome?
01:25:43
Speaker
Then the good outcome might just be that, oh, it turns out that we just haven't figured out hardware scaling. We haven't figured out how to get things down to 12 watts. And so we have 50 years where we're stuck, where the AI is somehow limited by its hardware. That would be a blessing. And maybe during those 50 years we can actually figure out safety enough to make it go well after that. So there's some best case scenario. And then hopefully we don't nuke ourselves either. And so there is some hope that techno-optimism will
01:26:09
Speaker
continue, and then we'll also become interplanetary before the sun burns out. That's the holy grail. That's threading the needle. I still have a 20% probability that somehow it all goes well.

Conclusion and Techno-Optimism

01:26:20
Speaker
It just doesn't seem like the mainline probability in my mind. Got it. Okay. Thanks for chatting with me. It's been super interesting. Yeah, my pleasure, Gus.