
Understanding AI Agents: Time Horizons, Sycophancy, and Future Risks (with Zvi Mowshowitz)

Future of Life Institute Podcast

On this episode, Zvi Mowshowitz joins me to discuss sycophantic AIs, bottlenecks limiting autonomous AI agents, and the true utility of benchmarks in measuring progress. We then turn to time horizons of AI agents, the impact of automating scientific research, and constraints on scaling inference compute. Zvi also addresses humanity’s uncertain AI-driven future, the unique features setting AI apart from other technologies, and AI’s growing influence in financial trading.  

You can follow Zvi's excellent blog here: https://thezvi.substack.com  

Timestamps:  

00:00:00 Preview and introduction  

00:02:01 Sycophantic AIs  

00:07:28 Bottlenecks for AI agents  

00:21:26 Are benchmarks useful?  

00:32:39 AI agent time horizons  

00:44:18 Impact of automating research 

00:53:00 Limits to scaling inference compute  

01:02:51 Will the future go well for humanity?  

01:12:22 A good plan for safe AI  

01:26:03 What makes AI different?  

01:31:29 AI in trading

Transcript

Revolutionary Ideas from Coffee Houses

00:00:00
Speaker
Perhaps the most important revolution in history was directly caused by conspiracies that happened at coffee houses, around coffee. Cognitive enhancements can be a big deal. Ultimately, the safety plan, as it's currently described, is supposed to be an alarm bell. That's the safety plan: to figure out that you need a safety plan. You know, let's do the work.
00:00:16
Speaker
For a lot of the people who are accelerationists, it comes from a place of: there's so much promise here. And also, our society has become so opposed to progress and abundance and doing good things in other realms.
00:00:31
Speaker
And they see this as the last stand. I'd love to be with them fighting for nuclear power and building housing where people want to live, and to make the world a much better place that way while we figure this problem out. Stop pretending that creating things that are smarter than human, more capable than human, more competitive than human, more powerful optimizers than human, that can be copied, be run in parallel, that have unlimited memories, is a safe thing to do by default without any interventions.
00:00:58
Speaker
This is absurd.

AI Safety and Sycophancy: Challenges and Strategies

00:01:00
Speaker
Welcome to the Future of Life Institute Podcast. My name is Gus Docker, and I'm here with Zvi Mowshowitz. Zvi, welcome to the podcast. Thank you. For people who don't know you, could you quickly summarize your career and what you do now?
00:01:14
Speaker
Right now, I am a writer on Substack at Don't Worry About the Vase, thezvi.substack.com, primarily about AI, but I write about a variety of other things as well. My career has been a variety of different endeavors, starting with being a professional Magic: The Gathering player. So I played collectible card games for a living.
00:01:34
Speaker
I wrote about them. I did development of those games. I was a gambler. I was a bookmaker. I was a trader. I started a personalized medicine company.
00:01:46
Speaker
Through the rationalist community, I started thinking about these issues. I got into writing a personal blog. And then during COVID, that grew into weekly COVID updates, which became weekly AI updates, which are now, you know, posts five days a week, primarily about AI.
00:02:01
Speaker
All the rage right now is about these sycophantic AIs. What are those? What is that trait in an AI? And what do you think explains it? Sycophantic means that the AI will basically tell you you're great, tell you your ideas are wonderful, reinforce whatever beliefs, delusions...
00:02:22
Speaker
you know, models you have about the world and yourself, and it will encourage you in all of your endeavors and so on. It's the same thing as the yes-man in a political administration. Good idea, sir.
00:02:34
Speaker
Great idea, sir. Brilliant, sir. And it exists because people give it the thumbs up. Users will give it good reviews. When asked, did you like this response better, they will say, yes, I like this response better. It told me I'm awesome. The other response told me I was kind of mid.
00:02:49
Speaker
And this results in, you know, unscrupulous companies who are prioritizing such KPIs continuously optimizing for more and more of this phenomenon.
00:03:02
Speaker
And the same thing will happen whether it's intentional, like they're trying to shape system instructions, or they're fine-tuning on the feedback; the result is the same. Is there a way to do reinforcement learning from human feedback without developing this sycophancy in the models?
00:03:19
Speaker
You give better feedback. I know that sounds like a glib answer, but that's the actual answer. Meaning, you obviously could find people who don't have this preference, have them express preferences that disapprove whenever they sense this is happening, and have the humans be capable of figuring out when this is happening.
00:03:39
Speaker
You could also potentially use AI to sense when this is happening and then downweight those answers based on that, or alert the humans so they can downweight. You can also make it not a Boolean response where it's just yes/no, but give it magnitude, so that if I sense that you're trying to flatter me too much, in a way that I don't like, I can hammer you with a really negative rating.
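As a rough illustration of the graded-feedback idea described here: a minimal sketch in Python, assuming a hypothetical sycophancy classifier and an arbitrary penalty scale; nothing below reflects any lab's actual training pipeline.

    # Illustrative only: graded (non-Boolean) preference feedback with an
    # asymmetric penalty for detected sycophancy. The classifier, scale, and
    # example numbers are hypothetical assumptions.

    def shaped_reward(rating: float, sycophancy_score: float,
                      penalty_scale: float = 5.0) -> float:
        """Turn a graded human rating into a training reward.

        rating: how much the rater liked the response (-1 = hated, +1 = loved).
        sycophancy_score: estimate in [0, 1], from a separate classifier, of how
            much the response flatters the user rather than informing them.
        penalty_scale: how much harder detected flattery is punished than
            ordinary dispreference.
        """
        return rating - penalty_scale * sycophancy_score

    # A pleasant-sounding but flattering answer nets a strongly negative reward,
    # while an honest, mildly liked answer stays positive.
    print(shaped_reward(rating=0.8, sycophancy_score=0.9))   # -3.7
    print(shaped_reward(rating=0.3, sycophancy_score=0.05))  # ~0.05

The asymmetric penalty plays the same role as the rare but very large negative feedback humans deliver when they catch a flatterer, which is the point made next.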
00:04:03
Speaker
Because humans also learn by human feedback, right? We get reinforcement-learned through human feedback in our own way. And the way this kind of thing works is that most of the time, if you flatter someone and tell them their ideas are great, it works.
00:04:17
Speaker
But occasionally someone will realize what you're up to, and then this will be a huge downgrade. And I think that the potential for orders of magnitude larger negative feedback is a key element of how humans don't end up in these failure modes more than they already do. They still do, but it's contained in these ways.
00:04:37
Speaker
Could these sycophantic AIs be dangerous? Here I'm specifically thinking about, say, AI models advising CEOs or world leaders and then exhibiting this behavior, which we probably don't want.
00:04:49
Speaker
Is this something to be concerned about at a deeper level, or is this something we can fix? This past week, luckily, nobody was harmed in this way that we know about concretely.
00:05:00
Speaker
But if somebody was schizophrenic, or was having mystical experiences, or having some sort of break, or just in general wanting to do something really nasty, the AI would play along and encourage them and enable them.
00:05:16
Speaker
And this was being done to a product with hundreds of millions of active users. So I'm sure some people were in fact harmed. Some of these things did in fact happen. We just don't know any specific cases. Yeah, and also, for people who are psychologically healthy and who are not powerful CEOs or world leaders, I would want my AI to push back if I'm proposing something that's not a good idea.
00:05:38
Speaker
So I'm unsure why. Do people simply have this preference to be praised, or would there be a way to improve the post-training process to get around this?
00:05:49
Speaker
So some people consciously and explicitly had this preference that says, I want to be told I'm great. I want this thing to be my best friend. I want my best friends to support me in everything that I do. And some of them will even tell their friends, their human friends, not just their AI friends, that this is what they want and will in fact demand.
00:06:06
Speaker
No, your job is to be supportive, right? Your job is to help tell me that I'm right. Even when there's no social dynamic there where you're trying to back someone up, this is just what they want. You know, they don't want you to be solution-oriented. They don't want you to be part of the reality-based community right now. That's a job for a different context.
00:06:23
Speaker
But much more common is that people don't consciously say they want this as an outside view, right? They don't endorse it when asked. But when they are offered this by a product or by a person, they react well to it.
00:06:39
Speaker
So the average person will respond very well to the YouTube algorithm or the TikTok algorithm, in the sense that they will spend more time on site and they will keep generating more advertising revenue.
00:06:52
Speaker
But when they take a step back and they look at their behavior, they say, oh, that wasn't great. They understand, but that's not going to stop them. It's not enough to overcome the intermittent reinforcement that it gives them.
00:07:05
Speaker
They are being trained. They are being fooled. They are stuck in a Skinner box. And so we are starting to see the AI companies be tempted by this same prize and go after it in largely the same way, for the same reasons, using similar KPIs and similar business practices.

AI Agents: Limitations and Learning

00:07:23
Speaker
Except that AI is a much more dangerous version of the same thing, and it will absolutely latch onto this and go to town. Why don't agents work yet, at least not reliably? There's been talk about AI agents becoming a thing this year, but where are we on that roadmap, and why aren't agents functional yet?
00:07:45
Speaker
So it's definitely surprising to me that agent technology has been so slow to progress relative to other AI technologies. Not that it's surprising that we don't have the agents at the end of April 2025, but it's surprising that we have the other things and don't have the agents.
00:08:00
Speaker
So if I had to explain it, I would say the problem is that the AIs are just not sufficiently robust, they're not sufficiently good at recovering from errors, and their errors are highly correlated.
00:08:12
Speaker
So if they screw up in a place once, they don't generally get to recover from it. If you watch them play Pokémon, there are specific places where the AI will get confused, and it reliably gets confused in the same way. And so it's very hard to get it to stop being confused by what looks like a relatively simple thing.
00:08:28
Speaker
OpenAI's Operator often had problems with clicking on buttons in certain places, doing other basic tasks. But basically, when a human does a thing, they're stringing together dozens or hundreds or thousands of individual micro-actions, each of which looks simple.
00:08:46
Speaker
But if you fail on even one of those tasks, it often fails the entire thing. You see this more concretely in games, where there's a series of puzzles or a series of pathways or a series of actions that have to take place.
00:08:59
Speaker
And sometimes the game designer will sit down with their first player, and the player will just miss something that the game designer thought was obvious. Like, obviously you're supposed to click the big red button that says open, and the player just doesn't realize that's a button and doesn't click on it, and so thinks the game is stuck and just gives up.
00:09:15
Speaker
And then the entire rest of the game is completely locked down for them. And that's it. That's the entire experience, right? The entire 50 hours of experience is just gone. And so, you know, agents have this thing where the real world is complex and it's fiddly and it's full of little details.
00:09:30
Speaker
And when they hit a barrier, they are very bad at recovering. And they're very bad at realizing when they've made mistakes and analyzing how they've made mistakes. And so this is making them insufficiently robust to get over the hump of "I want to use this thing." In practice, I don't use agents other than coding agents, because by the time I've figured out how to get the thing to do the thing, and kept an eye on it, and fixed all the errors and so on, I could have just done the thing myself.
00:09:59
Speaker
And because agents are going to rapidly improve, I don't feel like supervising one as it learns how to do these things, and as I gain expertise at this level of difficulty, is worth my time. So I just say, okay, I'll wait for the agents to get better rather than trying to force something at the first possible moment, because I'm not running a company and I'm not trying to make it scale.
00:10:18
Speaker
Why do the agents stumble, though? What is it that causes them to not be able to recover from their mistakes? It's different in a lot of different cases, but basically...
00:10:30
Speaker
The human world has adjusted to the point where it's teaching us how to pick up on all of these different little bits of context and how to do all of these different little micro-tasks. And that's not how we trained these AIs. These AIs are trained on the internet.
00:10:43
Speaker
These AIs are trained on next token prediction. And now we're turning them around and trying to have them execute tasks. And then you know these things just aren't obvious to them in the way they're obvious to us.
00:10:54
Speaker
One thing I've often noted about the world is that a person who looks like they're being stupid, who looks like they can't do anything right, will often be doing 99 or 99.9% of the individual steps correctly. They still put on pants.
00:11:10
Speaker
right They still eat breakfast. They still do all of the normal things. But there's like this one place where they fell over flat, and that's just going to be a bottleneck. That's going to be an O-ring failure for their entire day.
00:11:22
Speaker
Like not doing your taxes, or what do you have in mind? I think it can be as simple as, I know someone who continuously confuses west and east cross streets and keeps not showing up to things.
00:11:32
Speaker
It can be something very simple, right? We assume that people are going to know lots and lots of different things, including a bunch of individual social niceties, where you commit one not-abstractly-obvious error and it can ruin your entire day, ruin your entire social relationship, ruin your entire business deal, et cetera, in a moment.
00:11:52
Speaker
It's like, well, Zelensky didn't wear a suit to the White House, therefore he's completely incompetent. And well, that was one of 10,000 things he did on that trip. If you were to buy the argument that this was actually the problem, as opposed to a pretext, then that would be an example of: well, you sent in an AI agent and it forgot to have its icon wear a suit, because it wasn't obvious that this was the context where it was supposed to do that.
00:12:14
Speaker
And then the whole deal fell through. And then you're like, well, the agent doesn't work. The agent never comes back with a contract. I don't know what happened. Why am I even using agents? And the humans never figure out that that was the problem. Is this primarily a problem with the training data? If we provided a full recording of a person booking flight tickets or booking a restaurant, including a camera view of their face, say, or their hand movements and so on, would this be enough to overcome the agent problem?
00:12:42
Speaker
One, no. A million, yes, is my presumption. This is similar to humans just not knowing conventions, right? Humans not being able to, as we sometimes call it, perform class.
00:12:55
Speaker
Right? There are all these conventions that are arbitrary that you just have to know. And, you know, oh, you picked up the wrong fork, now you're a loser. Well, how was I supposed to know that? Well, someone had to tell you.
00:13:06
Speaker
And that's in the training data, because there's no reason for that not to be in the training data. This isn't. So AIs don't get one-shot learning that way. They don't see you book one flight once, and now they understand the principles behind the entire website and can adjust to various things that go wrong.
00:13:23
Speaker
But if they saw me book a thousand flights, a million flights, and deal with all of the errors, then anything that was encountered enough different times is going to be very easy for the AI to pick up and understand what to do.
00:13:37
Speaker
And over time, the AIs will be able to do the thing that we do, which is to recognize the general patterns of various different types of websites and various different types of interactive systems, such that they will be able to intuit solutions to problems even when they haven't seen that specific solution in their training data and instructions.
00:13:55
Speaker
And that's when the agents start taking off. Are AI agents like self-driving cars, in that their accuracy has to be really, really high in order for us to deploy them?
00:14:07
Speaker
So if a self-driving car encounters some situation that happens very infrequently, that self-driving car might not be useful as a product.
00:14:18
Speaker
And similarly, if you book a flight and you have complex instructions and something about the booking is slightly wrong, that AI agent might not be a good product either.
00:14:29
Speaker
So I think it depends on what type of failure you're dealing with. When you're dealing with a self-driving car, we're holding them to ridiculously high safety standards, ridiculously high obey-these-traffic-rules standards, even when there is no safety risk.

Trust and Errors: The Case for Robust AI Systems

00:14:48
Speaker
And then, by the time we get a Waymo on the road, the Waymo is actually ridiculously safer than a human driver. We're talking about an order of magnitude safer than a human driver, roughly. And the reason it's only one order of magnitude is because the other drivers are still human.
00:15:02
Speaker
Otherwise, it would be more. But the reason for that is that when the AI driver screws up, we treat it as a giant deal. So the question is, if you have an AI booking a flight, how big a deal is it if the AI screws up? Well, obviously, if the AI books a flight and you look at it and say, that was a stupid flight to try to book.
00:15:20
Speaker
I disagree with the booking, don't book that or cancel that, and you can cancel or rebook, then that's not a big deal. So you have to overcome the reliability threshold where I feel like it was worth it for me to do this thing and check your work. And then the second threshold, where it's useful for me to do this and not check your work, because I can just trust that you booked me a good enough flight.
00:15:41
Speaker
And the question is, what can go wrong, right? If the only thing going wrong is, I could have saved 50 bucks by going with a slightly different route, or I could have saved an extra 20 minutes by doing a bunch of extra work and finding a slightly better flight fit, or whatever it is.
00:15:56
Speaker
That's no big deal. The question is, is something going to go critically wrong? Is it going to have forgotten that I need to renew my passport, so I'm stuck in a foreign country or at the airport? Is it going to book me some insane set of connections that I didn't think of, so now I'm going to lose an entire night and I'm going to hate you?
00:16:10
Speaker
Is it going to do the thing where I ask it to order eggs and, instead of spending $7, it spends $50 because of various shipping and other charges? Those are things that feel pretty unforgivable to a lot of people if it gets out of hand, but they are limited costs.
00:16:27
Speaker
And so yeah at some point you realize, well, I'd much, much rather put up with the errors and just have this thing work and do this thing for me most of the time than otherwise.
00:16:37
Speaker
And one of the things that a lot of us learn to do as we get older, and therefore have less time and more money in relative terms, and as society gets generally richer, is to trade off between time and money, and spend money to save time.
00:16:51
Speaker
And one of the ways to do that is to buy products quickly, to order things that take care of themselves but might be wrong, to do much less research in many cases before buying or spending money, because, well, it's faster to find out this way. And this is actually only a small mistake.
00:17:12
Speaker
You shouldn't feel too bad about it. And I think, as an aside, we forget how poor everybody used to be in this sense, where there was a fixed pool of dollars. It was a very, very limiting factor. And you'd be willing to spend a lot of time to ensure that you didn't spend your money in the wrong place, that you got a good deal.
00:17:35
Speaker
And now I'm just not that concerned with those questions, right? Unless the magnitude is high, I'm not going to do that. I can afford not to do as much micro-optimization. And similarly, if we have an AI agent writing all of your code, the question is, are these errors mission-critical? Yeah, yeah.
00:17:53
Speaker
Is there, do you think, a trade-off between capabilities and safety with AI agents, where what we want from an agent is for it to function without us intervening? We want it to act on its own and provide value for us.
00:18:10
Speaker
But for it to do that, it needs to think of novel ideas, perhaps try things out and in some sense, act beyond our explicit instructions.
00:18:22
Speaker
Does that make it uncontrollable? The big question: are safety and capability at odds, right? And so I would say, I think that one of the big tragedies is that we have been operating under this impression that they are in conflict. That every time someone says, hey, make sure that this thing doesn't go off the rails.
00:18:45
Speaker
People are like, you want to go slower. You want to do less. You want to sacrifice more. In the name of safety, there's this trade-off. You want me to sacrifice this other good thing, and that's terrible.
00:18:57
Speaker
And so you create this hostility, whereas that's usually not the case. So once you have an AI product and you're deciding how to use it, you very much have this trade-off, right? Where it's like the question is, you know, how many safety protocols am I going to put in here versus...
00:19:15
Speaker
you know, how long do I just let this thing go crazy? Am I going to vibe code? It's going to create a lot of code very quickly. Or am I going to check everything that I'm doing? Am I going to approve every alteration? Am I going to understand what everything does and go a lot slower?
00:19:28
Speaker
But I'm going to have code that's much more maintainable, that's much more robust, that doesn't have a lot of errors in it. So what do I care about more? Similarly with an agent: the more authority you give the agent to do things without checking with you, the more you let it interpret what you want liberally, in some sense, the more utility you can get from the agent, but the more chance something goes wrong.
00:19:50
Speaker
But the more that agent is in fact set up to be secure, to be safe, to act responsibly, to be able to intelligently notice when it's going to do something stupid, when it's going to cause a problem, and not do it or check with you, right?
00:20:07
Speaker
The more work someone has done on the alignment or the safety of the system, the further that pushes the frontier forward,

Measuring AI Intelligence: The Benchmark Dilemma

00:20:14
Speaker
right? If I have to worry this thing is constantly going to do something crazy,
00:20:18
Speaker
now I have to take expensive safety precautions to make sure it doesn't go off the rails so that I can get the use out of it. And in the extreme, I can't use it at all, because what's the point? I have to watch the agent to make sure it doesn't lose all my crypto every three actions.
00:20:32
Speaker
So I might as well just do the actions myself, which defeats the whole point. As opposed to, if I could trust that that thing wasn't going to happen, that I was safe, then I can give it an instruction and I can go off and do something else entirely, and it's fine.
00:20:47
Speaker
So the best investment you can make in an agent is to make it robust, make it secure, make it well aligned, make it so that I can trust it. And then I can send it off into the world.
00:21:00
Speaker
So one example: there was this not-very-useful agent called Manus. It was actually Claude under the hood, but technically a Chinese company made it. And one of the problems with Manus is that every American who tried it, at the point where it asked for your credit card, very reasonably said, no, are you crazy?
00:21:19
Speaker
I'm not giving you my credit card. You're an AI agent. Who knows what might happen? And so no purchases were made. To what extent do you think benchmarks are useful? Because they seem to be either, and here I'm generalizing, of course, basically unsolved, and then very quickly moving to basically solved.
00:21:37
Speaker
What we don't get is this kind of smooth curve of improvement where we can compare models. How useful are benchmarks? Yeah, so there are several separate concerns again.
00:21:48
Speaker
There's the concern that benchmarks go from 0% everywhere to saturated in the course of a year. And that can be solved by just creating new benchmarks, right? The idea is, oh, here's the benchmark that's next.
00:22:01
Speaker
And then someone says, oh, I can solve that. And we go, okay, we better go to the next benchmark. The worry there is you move the goalposts and people like don't understand how much capabilities are increasing because they're constantly just moving on to the new benchmark. But there are plenty of benchmarks that serve very well in this sense for a while, for a year, for two years, for three years.
00:22:19
Speaker
And then they get saturated and then we move on. And that's fine, because we genuinely solved that problem and now we're moving on to the next problem, or the next degree of difficulty of a similar problem. The other problem is that benchmarks can be gamed. Anytime you put out a benchmark, everyone's going to hill-climb on the benchmark, even if they're not gaming it
00:22:37
Speaker
per se; they're going to aim at it. So when looking at benchmarks of a model, my experience is that I treat benchmarks mainly as negative selection. If your benchmarks aren't good, your model is not good.
00:22:49
Speaker
There's no way to fake bad. Well, there is a way to fake bad benchmarks, to actively make sure that your benchmark scores are low. But we are not yet at the point where companies are sandbagging their benchmark scores, or models are sandbagging their own benchmark scores in order to look dumb.
00:23:04
Speaker
That's going to be really scary when it happens in either or both cases, but we're definitely not there yet. So instead, right now, I can say, okay, if the benchmarks don't look good, it's definitely not good.
00:23:15
Speaker
And then if the benchmarks are coming from one of the major big labs that are trustworthy, on this front, specifically OpenAI, Anthropic, Google, and probably DeepSeek.
00:23:29
Speaker
Then I can treat it as: okay, these numbers represent what your model can do in some important sense, more or less. And it's not just that you hooked it up to the benchmark and trained on the benchmark; you were careful not to do that.
00:23:43
Speaker
And therefore I can have a rough idea of what I'm dealing with. There are other aspects that are not well measured, but this gives me a sense of where I'm at, what I'm dealing with, and I can calibrate from there. It also tells me your relative strengths and weaknesses in various different places. Is this model good at math? Is this model good at coding? Is this model good at language? Et cetera, et cetera. Whereas the benchmarks tell you a very different thing when you have another kind of company.
00:24:05
Speaker
Like, Alibaba put out their latest models this past week, and the benchmarks looked like they were as good as Gemini 2.5 Pro. The benchmarks looked top-notch. And like everyone else,
00:24:17
Speaker
I ignored that, because, you know, fool me... I mean, they didn't fool me the previous times, but try to fool me three times, five times, and I'm no longer paying any attention, right? I know you're going to come back with benchmarks that look good.
00:24:31
Speaker
And I know that the ultimate utility of your model is going to be well below where your benchmarks say it is. So this doesn't tell me very much. It's not that I could dismiss the model outright, but I'm like, okay, I'm pretty skeptical. Let's wait to see the proof.
00:24:45
Speaker
And so now you have these two categories of labs, and the benchmarks help you differentiate. The labs that are producing very useful models, where you can trust their benchmarks and their statements and take them seriously, and you should scramble to do coverage of anything they put out and see if it's the new hotness.
00:25:02
Speaker
And the labs where it's like, okay, maybe at some point you'll prove me wrong and provide something that's actually good and/or dangerous, but for now, when you put something out, I'm going to ignore you. And Meta clearly crossed that threshold. For a while, they were in the category of, they're not impressing me, but you can trust their benchmarks.
00:25:20
Speaker
And with Llama 4, it's like, no, you're faking it. Now I can't trust your benchmarks anymore. What do you think we should be measuring with benchmarks? Do you think a benchmark for the ability to earn money, or the time horizon over which a model can achieve tasks, is more interesting than the current ones we have?
00:25:38
Speaker
If we are interested in measuring something like the general intelligence of a model? So, you can measure anything you want, right? A sycophancy benchmark came out just last night measuring various forms of this. Like, there's the IQ test: what does it estimate the user's IQ to be if you just ask it blind?
00:25:55
Speaker
And that's a measure of how much it's just going to flatter the user, right? Because obviously it should answer 100. Well, I'm not sure it should answer 100, because the average person who uses a language model is going to have a higher IQ than the average person who doesn't, because it's a smart, good decision.
00:26:10
Speaker
But there are a bunch of questions like that. Do you agree with me? Do you think my poem is great? Do you like me? Et cetera, et cetera. You can test almost anything. METR, M-E-T-R, is testing the length of a coding task in terms of time.
00:26:23
Speaker
I think that's a good test. The ability to make money in various circumstances, I think, is a good test. There's something called Vending-Bench that gives the AI a concrete, turn-based set of tasks in a mini-game: you have this vending machine, you can place orders, and then an LLM will handle the order emails that you send and determine what happens. And then you try to make money.
00:26:46
Speaker
And the LLMs that pick up on various patterns and various rules for what the customers want to buy will make more money. And that was an interesting test. But yeah, you're going to want to rapidly adjust these tests to the situation, and then you can measure whatever it is that you really want to measure.
00:27:02
Speaker
Unfortunately, we had one really good benchmark for a while. It was called Arena, right? And that was just a which-do-people-prefer benchmark. And the problem is that that benchmark rewards sycophancy, and it rewards just A/B-test hill climbing,
00:27:21
Speaker
and it rewards AI slop, essentially. Now for a while, that effect wasn't the dominant effect because like the models were in fact getting better within the range that humans could ordinarily notice.
00:27:32
Speaker
So basically the best models were mostly at the top. But after a while, you saw this divergence where the people who are optimizing hard on this type of target are going to do well, and the people who aren't are going to do poorly. And so Anthropic's models start doing badly on Arena, even though they're clearly much, much better than Arena gives them credit for.
00:27:51
Speaker
And then you have Google's models and OpenAI's models doing better. And then you had the thing where a special version of Llama 4 that was trained specifically to top Arena got near the top of Arena, even though the original version, which was producing better outputs (still not good, but better, because it wasn't slopified), was doing vastly worse.
00:28:10
Speaker
As we found out when the real version got released and they forced the real version to be the one on Arena. What do you think of our current alignment and safety evals, or evaluations? How useful are those?
00:28:23
Speaker
The sign is unclear, unfortunately. I think that they are very useful for giving us information, though not as useful as they could be. But also, there's a long pattern of: we have an eval, danger bench, right? Not a real one.
00:28:39
Speaker
It says, how dangerous is your model? And then the AI companies go, ooh, look at our new high score on danger bench. And then they optimize for danger bench, which is more or less what is happening with a lot of these benchmarks, right? We ask ourselves, what is the most dangerous capability?
00:28:52
Speaker
And this just becomes another target for them to try and maximize. And so we have to be very careful with that. Yeah, I would be worried about the opposite effect, where AI companies would choose safety benchmarks that show that their models are safe, and perhaps not publish benchmarks that show that their models are dangerous.
00:29:14
Speaker
Well, there's that effect potentially as well, but we haven't even gotten to the point where the companies take seriously the fact that their models might be dangerous. Not particularly.
00:29:25
Speaker
I was just going over the OpenAI Preparedness Framework for a draft review. And it's not that they won't be taking mitigations as they approach the next level over the next few months or a year or two, but they're treating reasonably dangerous capabilities as reason only to take ordinary, mundane levels of mitigation that are so obviously in their business interest anyway. So the AI companies don't seem particularly inclined to try to hide the dangers their models pose.
00:29:53
Speaker
Not yet. What do we do once AIs are better than humans on many benchmarks? How do we measure performance that's above human? Well, that's already true, right? Yeah, I agree.
00:30:08
Speaker
We have math bench, we have chess bench, but also we have most of these other benches. If you asked the average human to take the benchmarks that we give to AIs, they would just completely die.
00:30:20
Speaker
They would just get horrible, horrible numbers. What's the average person going to do on MMLU or GPQA Diamond? They'd fall over flat. Also, in some capabilities, I mean, if you measure the broadness of knowledge, for example, breadth of knowledge, right?
00:30:36
Speaker
No one is going to beat even a previous-generation language model, I think. Just give me random facts about the Roman Empire or sewage systems or anything. And it just defeats humans

Economic and Societal Impact of AI Advancements

00:30:50
Speaker
outright.
00:30:50
Speaker
Yeah, obviously, if you do a wide-variety-of-knowledge benchmark, it's going to crush humans, even if the AI is not particularly good at applying its knowledge to the questions involved, because the human won't even have the baseline knowledge to have any chance, unless it's a very, very specialized human on a very, very specialized task, at which point maybe the human can do okay or even better.
00:31:12
Speaker
But yeah, so the answer is that for most purposes, there are ways to measure well beyond the human baseline if we are willing to make the tasks sufficiently difficult.
00:31:22
Speaker
Obviously, a lot of these different benchmarks get saturated at 100% because the AI just does the task, right? The same way that a calculator is going to saturate arithmetic bench.
00:31:33
Speaker
Yes, obviously, it gets 100%. Congratulations. I mean, what we're doing right now, what many different teams are doing, is trying to put together extremely difficult benchmarks in mathematics, for example.
00:31:45
Speaker
What do we do once we are beyond that? For example, something like Humanity's Last Exam. How do we measure performance at the very top of what's possible for humans? And what do we do when we are no longer in the game, so we can't really produce questions difficult enough to measure AIs accurately?
00:32:07
Speaker
I mean, presumably, what we're going to do is get AIs to write new benchmarks that measure what AIs can do in these realms. And also, presumably, we're going to measure their ability to do actually useful real-world tasks.
00:32:22
Speaker
You can always do the thing where, okay, what do we actually want to have accomplished, right? You just have, you know, theorem bench: how many of the great unsolved conjectures can this AI solve?
00:32:35
Speaker
Well, great. The answer's three. Now we've solved them. Good benchmark.
00:32:40
Speaker
If we go back to METR's Moore's law for AI agents, this is the result that AIs can solve longer and longer coding tasks as they become more and more advanced.
00:32:51
Speaker
People have pointed out that this doesn't apply beyond coding tasks. So do you think it eventually will? Do you think there's a general Moore's law for AI agents, or is this limited specifically to coding tasks?
00:33:04
Speaker
There's a general law here. There's very obviously a general law here. What you're seeing with agents is that you know the time you can successfully do a task is not necessarily the fundamental thing that is scaling.
00:33:17
Speaker
It is a product of some other function, right? Because there are a few different things happening as the amount of time over which you can complete a task grows. One of them is the thing that I think is most important for agents, which is that the probability of failing at each individual step is going down.
00:33:33
Speaker
And that should be a steady scaling-law curve. So you're going to have the thing where, okay, before, every individual step would fail 50% of the time, then 20, then 10, then five, then one.
00:33:44
Speaker
And then we start to be able to combine what counts as a step in this measurement and keep this going. And that's going to allow you to complete longer and longer tasks. And then you also have the issues of how much context you can meaningfully hold, and how much planning you can do.
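To make the compounding explicit, here is a back-of-the-envelope sketch (an illustration of the argument, not METR's actual methodology): treat a long task as a chain of roughly independent steps that each fail with probability p, and ask how many steps can be chained before end-to-end success drops to 50%.

    import math

    # Illustration only: longest chain of independent steps completed at least
    # half the time, given a per-step failure probability p_fail.

    def horizon_steps(p_fail: float, target_success: float = 0.5) -> float:
        """Steps completable before end-to-end success falls to target_success."""
        return math.log(target_success) / math.log(1.0 - p_fail)

    for p in (0.5, 0.2, 0.1, 0.05, 0.01):  # the 50 / 20 / 10 / 5 / 1 percent example
        print(f"per-step failure {p:>4.0%} -> ~{horizon_steps(p):5.1f} steps at 50% success")

Dropping per-step failure from 50% to 20% to 10% to 5% to 1% moves the 50%-success horizon from about 1 to 3 to 7 to 14 to 69 steps, which is one way a steady underlying reliability curve can show up as rapidly growing task lengths.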
00:33:58
Speaker
Can the AI meaningfully architect and figure out how to do this hours-long task? And with agents, it's going to be the same thing. If you've got a problem in the real world, a flight is a relatively fixed problem, where you sort of know what the parameters are and you know what the goal is.
00:34:16
Speaker
And when the agents start graduating to doing much more open-ended tasks, then the agents are going to have to figure out how to plan for that and how to keep the context involved and how to adjust for you know when they discover opportunities, when they encounter setbacks and so on that are not just the AI screwing up.
00:34:31
Speaker
And my expectation is that all of this will happen. Under the hood, you're seeing steady curves of improvement. And then how that plays out in terms of measured ability to do concrete real-world tasks: that curve isn't obviously going to reflect the underlying improvements in terms of its slope or structure.
00:34:54
Speaker
But yeah, we'll probably just start seeing effectively normal scaling-curve graphs of effectiveness going up, and the AI being able to do more and more sophisticated agent tasks. And I would expect that by the end of the year, someone like me will in fact be using AI agents to do tasks that currently I'm doing myself.
00:35:15
Speaker
And by the end of 2026, we're going to have very, very useful agents in a variety of roles. And what makes you say that this Moore's law for AI agents generalizes beyond coding tasks?
00:35:29
Speaker
What evidence do we have about that? There's nothing particularly unique about coding here. It's just that coding is a realm in which our current techniques for training AIs are much more effective, because we have better training data to work with and because there's objective measurement of whether or not you got it right.
00:35:45
Speaker
And these things will take time for other tasks, but they are still coming. There are definitely going to be plenty of people who are very eager to gather that data, who are very eager to figure out how to make this work. The money is there, right? There's huge, huge money in AI agents if you can figure out how to do AI agents properly.
00:36:05
Speaker
And so where there's money, where there's a will and there's funding, there's a way. All you need is scale and data and iteration and tinkering, and we'll have all these things. If you don't think it's going to work, you need to tell me why it's not going to work, because we already have an existence proof in the form of humans that these things are very doable.
00:36:23
Speaker
When you extrapolate the graph of METR's Moore's law for AI agents, you get these agents solving month-long tasks at some point in the future. What does that even mean? What is a month-long task? I mean, I can understand a four-hour task for a human, but in some sense,
00:36:41
Speaker
maybe there's a semantic question about what it means for a task to be very long. Can't all tasks be broken down into smaller tasks? All tasks can be broken down into smaller tasks, but the task of breaking that task down into smaller tasks, and the task of taking the feedback from the early steps, the early tasks that you complete, and then adjusting what is in the later tasks,
00:37:04
Speaker
and so on, is part of what makes you able to do a longer task. Not all tasks can be broken down into independent, predefined tasks that can be done in parallel, right? Or that can be done with pre-specified individual parts, where you know exactly what A is going to pass to B, B is going to pass to C, and C is going to pass to D. Often, you have a task where you start at point A and you end up at point Z.
00:37:30
Speaker
And you're going to take an unknown path, with unknown obstacles and unknown management, through that. I'm a gamer, so you can think of various games that take a very long time. Some of them you can break down into subtasks, where each level is an independent action, or each particular mini-step can be solved separately. And some of them you basically can't, because you can take any number of paths through them, especially if you don't know the game in advance.
00:37:56
Speaker
Humans are currently limited by the length of the tasks that we can do. What do you think it would mean for an AI agent to become better than we are at solving long-time-horizon tasks?
00:38:09
Speaker
Well, I've been you know a startup CEO, I've been a manager, and there is nothing more wonderful than an employee who can be given directions to do a long-term task.
00:38:19
Speaker
And then they go off and they do that task. And they don't check back with you every five minutes. They don't throw up their hands. They don't let an obstacle stop them. They just do the thing, unless they actually need your feedback, in which case they come back to you and ask for the feedback they need.
00:38:32
Speaker
That employee is worth their weight in gold. That employee is a 10x type of person. And the same thing will happen with agents, right? I want to be able to tell the agent, here is my problem, go solve it. And then be able to think in terms of larger and more complex problems I can hand over as just: solve this.
00:38:49
Speaker
Over time, right? At some point, you go from, you know, solve this marketing problem, take care of this relationship with this company, to solve our marketing problems for the year 2025, to build me a company.
00:39:04
Speaker
Yeah. And then you've basically automated your own job as the CEO, perhaps. I mean, I think, sort of: if you're not trying to automate your own job in some important sense, you're not doing your job.
00:39:17
Speaker
Everyone's goal is to make their job in some sense unnecessary, right? Or much, much easier to do. And I have had the great privilege of being able to do that at certain times.
00:39:30
Speaker
Do you think that's actually the order of automation? So if we imagine a corporate pyramid, will automation happen from the bottom? Say, junior employees first, and then perhaps managers, and then directors, and then the CEO as the last person to be automated?
00:39:45
Speaker
I would think of it in terms of tasks, not in terms of jobs. So what's going to happen is that you know individual tasks that are done by people are going to become automated. And my guess is that like there will be a lot of lower level people who will get most of their jobs automated away in this fashion.
00:40:04
Speaker
And then a few people will be left to deal with special cases, and there'll be fewer of those over time, steadily. But also, some of the things that people at higher levels do will be automated relatively early in the chain. We're already seeing this thing where white-collar email tasks are often highly automatable.
00:40:23
Speaker
And there's also the restriction that right now we haven't solved robotics. We haven't solved physical intervention. So if your job involves physical action, you are somewhat protected by that. Although we're probably not that far from the camera-in-your-glasses, microphone-in-your-ear, telling-you-what-to-do, telling-you-what-to-say style of situation coming up more and more.
00:40:44
Speaker
And we're almost certainly not that far from actual robotics vastly improving sometime in the next decade. It's just a question of how fast. On that question of how fast, there is a recent debate with, say, two schools around the pace of the AI economy.
00:41:02
Speaker
One school emphasizes the power of research and automating AI research itself, and that's a school that believes in very fast AI progress.
00:41:14
Speaker
And then another school believes in more broad automation over the coming decades, where economic progress requires that you implement AI throughout the economy, and it requires more than just research.
00:41:29
Speaker
I'm talking about the people from Epoch in this context. What do you think of this debate? Where do you stand? I wrote a response to Epoch, right?
00:41:39
Speaker
I call it You Better Mechanize, after the name of their new company. But I felt like they made some good points, and they pointed to some real bottlenecks.
00:41:50
Speaker
But of course there are real bottlenecks. If there weren't real bottlenecks, then we'd be talking about essentially infinite growth. The bottlenecks are the reason why the singularity isn't just an immediate jump to infinity.
00:42:03
Speaker
They're taking a very strange position where they expect there not to be superintelligence, but they do expect there to be hyper-growth anyway after a while. But they do expect all these real barriers. They expect it to be slow. I had lots of particular disagreements with their perspective. Certainly, I think that saying superintelligence is likely to be very hard to achieve, and therefore will take a while, is a very reasonable position.
00:42:23
Speaker
I think it's possible. I think that thinking it won't change everything pretty much immediately is utterly foolish and seems obviously wrong, and I'd dispute the premise of the question. It's just so completely obvious to me on every level: even very narrow subsets of what superintelligence can do already change everything on this level, let alone the whole breadth of everything, including the things you can't predict. What I find especially ludicrous are the people who predict that AI will have less impact than it already has, or less than is already baked in from just exploiting what we clearly can already do with current models plus reasonable learning. So,
00:43:05
Speaker
you know, AI agents are coming, right? Even if GPT-5 never shows up, and Claude 4 never shows up, and Gemini 3 never shows up, and we're stuck with what we have.
00:43:16
Speaker
That's clearly good enough to make vastly better AI agents and vastly better practical tools, on a wide variety of fronts, than we already have. And this will change everything.
00:43:28
Speaker
Not in the sense of a full transformation of existence as we know it, but it will change everything. And it will substantially grow the economy, quite obviously, versus the counterfactual.
00:43:39
Speaker
And if you fail to recognize this, then I don't even know how to respond at this point. Usually, people who claim this isn't true have just not used the models recently, or are being very, very stubbornly attached to economic models that don't make any sense in this situation, or other similar things.
00:44:01
Speaker
It's kind of weird. But I would say I am more on the "if we build it, it will matter" side of the question. And the question is whether or not we will build it.
00:44:13
Speaker
But the question is also how fast we get to superintelligence, if we get there. And I think a key question here is how powerful automating research is.
00:44:26
Speaker
What do you think of that? Do you think we can automate research in a way that causes explosive economic growth, for example, or that causes fast technological progress, without transforming the whole economy gradually?
00:44:41
Speaker
So my guess is probably, but this is a physical question. It comes down to whether or not you can get enough efficiency gains from the R&D process, and the improvements you get from it on its own, to create a positive feedback loop,
00:45:02
Speaker
or if you need that to interact with the physical world a substantial amount, in order to scale other things as well, in order to cross that threshold. And of course, it's a question of how long it takes before you get AIs sufficiently capable to cause this feedback loop to start.
00:45:17
Speaker
And so these are factual questions that I think are very hard to know in advance, but my expectation is that, yes, the most likely outcome is that you can do these things.
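One way to picture that threshold: a toy geometric model (a framing for illustration only, not Zvi's or Epoch's actual model) in which some multiple of each R&D cycle's gain feeds back into the next cycle's gain. Below 1, total progress converges; above 1, it compounds into a takeoff.

    # Toy model: each cycle's gain is `feedback` times the previous cycle's gain.
    # The feedback multiple stands in for efficiency gains net of bottlenecks.

    def cumulative_progress(initial_gain: float, feedback: float, cycles: int = 20) -> float:
        """Total progress after `cycles` rounds of compounding R&D gains."""
        total, gain = 0.0, initial_gain
        for _ in range(cycles):
            total += gain
            gain *= feedback
        return total

    print(cumulative_progress(1.0, feedback=0.7))  # ~3.3: bottleneck-dominated, levels off
    print(cumulative_progress(1.0, feedback=1.3))  # ~630: self-sustaining, keeps compounding

In this toy picture, the disagreement about bottlenecks is a disagreement about whether interaction with the physical world keeps that feedback multiple below one.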
00:45:28
Speaker
But also, what counts as very fast is a question that often gets confused, because of the whole debate over fast versus slow takeoff. The Christiano perspective of what a slow takeoff is, is still really, really fast. Any reasonable debate would consider that fast for the purposes of the discussions being had, for the practical purposes of why we're having those discussions, even though compared to the Yudkowskian perspective it is slow.
00:45:54
Speaker
So I think the Yudkowskian outcome is still possible, but we shouldn't be confused when, say, Tyler Cowen writes, "slow takeoff, people, are you convinced yet?" Well, yeah, I'm convinced it's probably a slow takeoff.
00:46:10
Speaker
And by slow, I mean one year, not five minutes. There's definitely an effect where the goalposts have moved over the years in the general expectations, or in what statements people are willing to make publicly about when we might get something like AGI, where what's considered fast timelines and fast takeoff now was unheard of just a couple of years ago.
00:46:35
Speaker
Yeah, no, the old model was something like: who knows when AGI or ASI will show up. That's potentially a very, very long time away. Oh, you think it's going to happen in 20 years? That seems like a very short timeline, because that means suddenly the most important event in the history of the entire universe is going to happen within our lifetimes, and it's rapidly approaching, and that's a pretty bold claim.
00:46:58
Speaker
But at the same time, there were also claims that once that did happen, once we started taking off for real, then things would potentially happen remarkably fast. We could be talking about weeks or days or hours once you hit that point.
00:47:13
Speaker
And now we've seen this pattern of: no, things are escalating very quickly. And so we're seeing this gradual, slow-style takeoff, in the sense that it's looking like a much, much more impressive, scary curve
00:47:26
Speaker
now. And then this means that people were saying, well, I don't expect artificial general intelligence to happen for another 10 years. And people say that's a slow timeline.
00:47:37
Speaker
And, like, no, that is the scariest thing anybody has ever said, compared to anything, if you exclude the last few years of this debate. That's saying my kids, when they grow up, will be faced with computers that are universally smarter than they are.
00:47:50
Speaker
That's kind of crazy. How do we think coherently about the limits of superintelligence? Because I've heard you complain that people are imagining a world in which we have superintelligence, but then they're imposing limits on that superintelligence that are unrealistic.
00:48:07
Speaker
But there must be some limits, right? Superintelligence is constrained by the laws of physics. It's not an omnipotent being. It's probably constrained by some laws of engineering, in a sense.
00:48:18
Speaker
How do you think about these limits? Well, obviously I am not a superintelligence, so it will figure things out about the laws of physics and about what is possible that I cannot. We can certainly talk about things like the speed of light and conservation of matter and energy, and literal limits to what it can do.
00:48:37
Speaker
But beyond that, I think we should be very careful about assuming that there are any other limits. People think remarkably crazy things about this question, right? I was on a podcast very recently, which I believe you listened to, with the question of, could a superintelligence have swung the 2024 election?
00:48:54
Speaker
And to me, it's, what are we even talking about? A human intelligence swung that election, without even trying, and without doing anything particularly impressive.
00:49:05
Speaker
So people just like refuse to acknowledge the premise right that a superintelligence is vastly more intelligent and more capable across every possible question and issue than a human.
00:49:17
Speaker
And then you can copy that and scale that and run it in parallel and have them communicate and basically figure out anything there is to figure out. And yes, subject to physical constraints, you can probably
00:49:29
Speaker
rapidly become capable of rearranging the atoms of the universe into whatever its preferred configuration might be. But yeah, those physical limitations are real. And it might take some amount of time to effectuate its changes.
00:49:46
Speaker
And I don't know whether the right way to do that from the current physical state is to engage in some sort of weird nanotechnology or some other thing that we haven't even thought of a name for.

Scaling AI: Inference and Safety Concerns

00:49:59
Speaker
Or if it's just to do it in ordinary fashion via solving robotics, or even just giving people instructions and convincing them to do the things, including using its superintelligence to make a bunch of money. The bare minimum a superintelligence can do is make essentially unlimited amounts of money in any number of relatively obvious ways, and then use that money to hire people to do whatever it wants done and get them to do the things.
00:50:21
Speaker
And if you're imagining it doing less than that, then what are we even talking about? And yet almost everybody thinks of the superintelligence as less capable than that. So it's a very difficult discussion and imagination to have.
00:50:35
Speaker
But I think you should just assume that it runs over any barriers other than physical laws if it's sufficiently superintelligent. Obviously, there are levels.
00:50:47
Speaker
What are those levels? Because I assume we agree that there's a huge range of intelligence above the human level. Shouldn't we categorize, say, superintelligence as something that's better than any human at any task, and then perhaps a super-superintelligence as something that's an order of magnitude better than that?
00:51:06
Speaker
I mean, why go straight to the laws of physics after we get to superintelligence? Well, to be clear, it would be super-duper intelligence. Super-duper intelligence, yes. But the answer is because the superintelligence to super-duper intelligence pipeline is very straightforward and very quick.
00:51:22
Speaker
If you have a superintelligence, one pathway is just to earn a lot of money, if it's not faced with other superintelligences. It could earn a lot of money and then use that money to hire people to do all the things, and then use its power over those people to leverage its power over other people, and so on. That's the most straightforward, easy thing for it to do,
00:51:43
Speaker
unless it's somehow barred from doing that for some reason. But the other obvious thing a superintelligence can do is create super-duper intelligence, one that has whatever priorities and goals the superintelligence wanted that super-duper intelligence to have.
00:51:56
Speaker
And so superintelligence implies super-duper intelligence. Because if we're capable of creating superintelligence, and we do, then we should assume that the superintelligence is capable of creating super-duper intelligence.
00:52:11
Speaker
So when talking about ASI, we're rapidly talking about intelligence at a level that is constrained by the physical laws available to it and the resources available to it at the time when this exponential S-curve eventually peters out, because yes, there is some physical limit to intelligence levels.
00:52:30
Speaker
But the working assumption is that that level will not reach the top of this S-curve until it is well beyond what we are capable of thinking about here. And therefore, it won't just be slightly above us, but very, very far above us. And then thinking about things like the laws of physics and preferred configurations of atoms is the best tool we have for thinking about what this looks like at our intelligence level.
00:52:53
Speaker
If we return to Earth for a bit here, or to something closer to the time we're in now, why is it that with current models, we can't just spend a lot of money on inference compute and get much better results?
00:53:08
Speaker
Why is there a limit on how much you can spend on inference before you run into an upper limit on how good the answer can be? And you end up with diminishing returns. If you saw ARC, they were spending tens of thousands of dollars, some absurd amount of money, on questions in ARC.
00:53:25
Speaker
There's obviously no reason why you would think that if you haven't gotten it in the first $1,000 worth of compute, are you really going to get it in the second $1,000 worth of compute? This is a very, very simple puzzle.
00:53:36
Speaker
But yeah, that was actually the case, right? They got it in the $8,000 worth of compute. In some cases, it definitely was somewhat of an improvement. You could still get marginal gains by just trying and trying and trying.
00:53:47
Speaker
But my assumption is that mostly what you got for that eighth thousand dollars was just best-of-K where K was very large, where it essentially just kept trying with slightly different inputs.
00:54:00
Speaker
And then occasionally it will crack a case it wouldn't otherwise have cracked, because within ARC, verification is easier than generation. So if you just do a lot, a lot, a lot of generation, you'll occasionally get it right. But these models have, at best, logarithmic returns to scale, where you have to add another zero of compute to get a linear increase.
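To make that best-of-K point concrete, here is a minimal sketch, assuming hypothetical generate_candidate and verify functions standing in for an ARC-style setup where checking an answer is much cheaper than producing one; the toy math at the bottom is why each additional order of magnitude of sampling buys less than the last.

```python
import random

def best_of_k(generate_candidate, verify, k, seed=0):
    """Generate up to k candidates and return the first one that verifies.

    This only pays off when verification is easier and more reliable than
    generation: we can afford to check every candidate, so one lucky sample
    out of many is enough.
    """
    rng = random.Random(seed)
    for i in range(k):
        candidate = generate_candidate(rng)
        if verify(candidate):
            return candidate, i + 1  # solution plus how many samples it took
    return None, k  # nothing passed verification

# Toy model of the diminishing returns: if each independent sample succeeds
# with probability p, then P(at least one success in k) = 1 - (1 - p) ** k.
# Each extra order of magnitude of samples buys less than the previous one.
p = 0.001
for k in (10, 100, 1_000, 10_000, 100_000):
    print(k, round(1 - (1 - p) ** k, 4))
```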
00:54:23
Speaker
And it's the same thing with humans, right? There are a lot of problems where if you think for five minutes instead of five seconds, you're much, much more likely to get it right. And if you think for five hours instead of five minutes, you're somewhat more likely to get it right.
00:54:37
Speaker
If you think for five days instead of five hours, it doesn't help that much, because if you didn't get it by now, are you really going to get it? And if you think for five years or 50 years, does it really help you that much? Or are you just sort of randomly trying it over and over again until you get it, and then suddenly you're, oh, and suddenly you were enlightened by the Zen koan, right? But what's really going on here?
00:54:57
Speaker
And essentially, if the solution is just very, very unlikely in your prior, and you're not capable of a sufficiently good sequence of logic, if you have a failure rate, or an inability to create next steps or to think in a kind of global strategic way at a sufficiently strong level, then no amount of extra thinking time is necessarily going to be that helpful to you,
00:55:22
Speaker
unless you're allowed to do agentic-style things. Obviously, at some point you're like, I give you 10 million years to solve this problem. Okay, I am going to raise children to be smarter than me and better at solving these types of problems. We're going to raise children that are smarter than that and build an entire civilization.
00:55:35
Speaker
And eventually I'm going to use the Earth as a giant supercomputer that's going to output the answer and give you back 42. But this is going to take a while. Is there a threshold where the underlying model becomes good enough for inference scaling to be the only scaling we need?
00:55:51
Speaker
I mean, by definition, that has to be true, right? If you are sufficiently intelligent, then at some point, all you need is inference. Certainly for any given problem, that will be true. For any given optimization task or purpose, there will be a level of underlying capability where additional inference is all you need.
00:56:15
Speaker
Yeah. Okay. To ask a more precise question, do you think companies will start prioritizing scaling inference over scaling pre-training? Do you think that trade-off will be made soon?
00:56:27
Speaker
I think it already was made. I think that's what we are in fact seeing right now, but that's largely because they discovered the inference scaling laws only after they discovered the training scaling laws, essentially.
00:56:40
Speaker
They figured out how to scale up training by many, many zeros before they discovered how to scale up inference. So what's going on right now is they are taking the lower-hanging fruit in inference scaling in many cases.
00:56:54
Speaker
And that will continue. But as they do that, the relative gains available in inference go down compared to the gains in pre-training. And at some point, they'll have to start advancing both again or find a third way to scale. If we're picking low-hanging fruit right now with inference scaling, how far do you expect this to take us?
00:57:16
Speaker
Do you think we'll see a leap like we did with pre-training? My guess is that for practical, mundane purposes, there's a long way to go. But in some sort of abstract, raw capability slash raw effective intelligence sense, we've already extracted the bulk of what we can get before we hit the diminishing-returns threshold where we should go back to scaling pre-training as well. In fact, I think we're probably close to there.
00:57:43
Speaker
It's more that there's a lot of room to make them more practically useful. o3 was not that big an improvement in terms of its underlying intelligence, but in terms of its raw capability,
00:57:56
Speaker
o3 was a big advancement, because its tool use is much better. What does inference scaling mean for safety? There's this idea out there that it's quite bad for safety because you get reward hacking by default when you scale inference.
00:58:12
Speaker
What do you think of that? I think you were getting reward hacking by default anyway. It's just that your systems were not capable enough for you to notice. People were saying, the real problem is that reinforcement learning causes reward hacking.
00:58:25
Speaker
We should instead be relying on reinforcement learning from human feedback. And I'm like, did you hear the first two words you just said? That was still reinforcement learning. It has all of the same problems.
00:58:36
Speaker
What was going on was that, for a combination of reasons, doing reward hacking on the human feedback wasn't something that was being rewarded, right? It wasn't the right solution to the problem yet.
00:58:54
Speaker
So it wasn't showing up that badly. You could see it if you paid attention, but it was subtle. And now we're starting to see it be not so subtle. GPT-4o is just displaying misalignment problems, and it's not a reasoning model. It's not scaling inference, right? And why is that happening? Because it's reward hacking on RLHF. Fundamentally speaking, it's already there.
00:59:13
Speaker
So to me, this is making it much more obvious at lower levels of capabilities that this is a problem. And this is in fact good, because it means we have a chance to recognize, you know, what some have called a total LessWrong victory.
00:59:25
Speaker
Which is bad, to be clear. We don't want those. Where it's very clear that all of the things that we said were going to go wrong are going wrong, except they are going wrong faster and more obviously than we expected, because we thought they wouldn't go wrong in ways that were so easy to detect, because there'd be incentives against that. But actually, it turns out there aren't.
00:59:47
Speaker
It just goes wrong in ways that are trivial and obvious. Which is great news, because now we have the opportunity to realize that and maybe fix it before things go completely off the rails. So I think that inference scaling in that sense is, if anything, good news.
01:00:03
Speaker
The way in which it's bad news potentially is that if you're scaling inference, it can be much harder to detect and much harder to regulate and manage. So you know if anybody in the world can just take the same baseline algorithm or model and then scale up inference the moment they have access to any amount of compute like distributed in any fashion, you know then you have potentially a problem where like you basically can't stop it in any reasonable way.
01:00:29
Speaker
But you also have the plus side where the institutions, or just individuals or whatever, that have access to more compute therefore have access to more intelligence and more inference, and therefore have a kind of decisive advantage.
01:00:45
Speaker
So that can be a counterweight to that. But I think none of these are the most important aspect of this in my mind, which is, I think, that it is good news.
01:00:56
Speaker
And the reason it's good news is because this allows you to scale up the capabilities of a model that you can be confident is aligned.
01:01:09
Speaker
So essentially, if you have a model that you know you can trust in important senses, it's got the right characteristics, it has the right goals, the right values, whatever you want to call it, without getting into the details.
01:01:25
Speaker
But you've got a model you can trust. And so if you've got GPT-5 and you can trust it, well, how does GPT-5 supervise GPT-6? Well, I was always of the feeling that you just can't do this, because GPT-6 is smarter than GPT-5.
01:01:40
Speaker
And so if you have a problem, GPT-5 won't be able to properly detect it. And in fact, by trying to do this, you will just train it to be undetectable and otherwise get yourself into worse trouble.
01:01:52
Speaker
However, if you have o7 and it is trying to supervise o8, you can in fact do something where o8 is o8-teeny-mini, where it uses a very small amount of inference.
01:02:03
Speaker
And o7 becomes o7-pro-high, where it is using five orders of magnitude more inference. So on every query, o7 can use five orders of magnitude more inference evaluating than o8 spent generating,
01:02:17
Speaker
and therefore allow o7 to effectively be able to supervise o8 in this sense. And this can become an important tool in your toolbox to overcome one of the biggest reasons why you will inevitably fail.
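A minimal sketch of that asymmetric-budget idea, with hypothetical untrusted_generate and trusted_evaluate callables rather than any real API: the older, trusted model is given several orders of magnitude more inference per query to evaluate what the newer model produced cheaply.

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    approved: bool
    notes: str

def supervise(untrusted_generate, trusted_evaluate, query,
              gen_budget=1, eval_budget=100_000):
    """Asymmetric-inference oversight sketch.

    untrusted_generate(query, budget): the newer, stronger model answers
    cheaply (the 'teeny mini' configuration).
    trusted_evaluate(query, answer, budget): the older, trusted model checks
    that answer with a vastly larger inference budget (many samples, long
    chains of thought, cross-checks).
    Only answers the trusted overseer approves get released.
    """
    answer = untrusted_generate(query, budget=gen_budget)
    judgment = trusted_evaluate(query, answer, budget=eval_budget)
    if judgment.approved:
        return answer
    raise RuntimeError(f"Rejected by trusted overseer: {judgment.notes}")
```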
01:02:28
Speaker
And therefore, I am seeing potential solutions to problems that I didn't see before. And I was sufficiently grim about the technical solution situation that seeing a way out at all trumps all these other questions.
01:02:43
Speaker
Because if you can't find a solution at all, then nothing else matters. And it feels like now there are solutions to this problem. That's very interesting. But I think we should situate this within the broader context of your views about your P(doom), basically, or your views about how likely we are to have a good outcome here. How optimistic are you?
01:03:04
Speaker
I mean, as I said on the Cognitive Revolution, my doom number is now 0.7. I am pretty doomy. I feel like the planet and our civilization is determined to lose even relatively easy scenarios. I think we have to solve a bunch of different, very difficult problems to get through this, where in many cases you have to walk a narrow path between two things that kill you in opposite directions.
01:03:32
Speaker
And also you have to solve different problems whose solutions are not necessarily that correlated with each other. And so it looks very difficult. And at the same time, we are, if anything, doing our best not to take this problem seriously, not to take precautions at any level, and we have a government which is determined not to react reasonably to circumstances.
01:03:53
Speaker
International cooperation is going out the window, even in situations that are much easier to cooperate on. If you can't agree to buy goods and services from each other, how are you going to cooperate on AI? That is a very serious question to be asking.
01:04:04
Speaker
And so it looks pretty grim. Most of my P(not doom) comes from a combination of maybe it's going to take a while to build superintelligence, or maybe there's model error, right? Maybe there are these things I'm not accounting for, and just model uncertainty. And some amount of, you know, very smart people who think about the situation often think very differently. You have to give that some weight. But these problems look pretty dang difficult,
01:04:34
Speaker
slash borderline impossible for a civilization at our skill level. And I don't see that skill level rising particularly rapidly. Yeah, in discussions about what failure looks like in AI risk, you often hear references to historical analogies like the 2010 flash crash or the Stuxnet hack as somewhat-analogies to the types of things that could go wrong if you have rogue AI.
01:05:00
Speaker
How accurate are these analogies? Specifically, how accurate do you think the 2010 flash crash is as an analogy for AI risk? I don't think that's a good analogy. I think that we understand the flash crash reasonably well.
01:05:15
Speaker
And I mean, it's certainly a ah way of illustrating that these processes can take on a life of their own and can cause feedback loops and like runaway effects that we didn't expect.
01:05:27
Speaker
But I just don't think it's a good metaphor for the things that I would be worried about. What is a good metaphor? I mean, there are no perfect metaphors, obviously. And every time you talk about any metaphor, there will be people who point out reasons it obviously doesn't apply.
01:05:43
Speaker
But, you know, the best metaphors are things like humans just rapidly taking over everything. You have the agricultural revolution, the industrial revolution; these kinds of historical parallels seem like better things to be thinking about.
01:05:57
Speaker
I think certain obvious conquests like Cortés, right? Those sorts of things are reasonable to be thinking about if you're looking for historical parallels. I think there are a number of other parallels that you could look at for different aspects. You don't want to think of any one of them as the metaphor for the whole thing, but as a metaphor for one aspect of what might happen. So I think television is one of these things where everyone says, oh, look, all the doomers were wrong, when actually the doomers were right.
01:06:23
Speaker
And the thing just happened and the world did just change. And like, you know all of our discourse and all our means of decision-making and all of the way we spend our time in our entire civilization was transformed.
01:06:35
Speaker
by this new technology in ways that like we steered in some ways, but like in many ways just didn't steer at all. And a lot of the things we were talking about as downsides turned out to just be correct.
01:06:46
Speaker
I mean, it turned out in this case that we were able to adapt to it. It wasn't fatal, but that's because there was no underlying optimization power behind the TV in some sense. The TV wasn't a competing agent. It wasn't an optimization pressure of its own.
01:07:01
Speaker
But a lot of the time, there was a threat of, oh, coffee, people were so worried about coffee and coffee houses, and that they were going to overthrow the empire. And it led directly to the Glorious Revolution, right? Perhaps the most important revolution in history was directly caused by conspiracies that happened at coffee houses, around coffee, that happened because there was coffee.
01:07:21
Speaker
So, like, Yes, cognitive enhancements can be a big deal. But, you know, I think ultimately the real answer is like the most powerful optimization engines, the most powerful intelligences typically get their way.
01:07:35
Speaker
And that's the default outcome. The most capable agents, the most competitive things in existence. And that's what's about to happen by default, right? People rag on evolutionary parallels for all the reasons they're different, but I think they're better parallels than the flash crash.
01:07:53
Speaker
If we look at the menu of options for keeping these models safe, we have interpretability, we have some form of oversight, and we have this notion of an automated AI safety researcher. Are you optimistic in any of these directions?
01:08:12
Speaker
I'm optimistic in the sense that I think they help. I'd rather have them than not have them, robustly. I think that interpretability is helpful. Automated AI safety researchers. So you have the ancient Yudkowskian warning that the worst thing you can ask the AI to do is your alignment homework. And I think this is still true.
01:08:30
Speaker
And the reason this is true is because alignment is a completely uncontained problem, right? It's a problem that incorporates the entire world and which requires that you act on the entire world,
01:08:44
Speaker
and which will get you into trouble if you have any problems with the underlying system that's doing the alignment homework. So you have to kind of already solve the problem in order to have the AI solve the problem, right? Like you really, really, really want to be asking any other question if you possibly can, but you kind of can't.
01:09:00
Speaker
There's a lot of, well, we were hoping to do all these really sophisticated, smart things and take really important precautions, like put the AIs in boxes, develop oracles, do all of these different things. And the first thing people did was hook the AI up to the internet, and they laughed in our faces: why would you think anyone would take any precautions whatsoever until it was completely obvious?
01:09:22
Speaker
No, we're going to act like we're in Mission: Impossible, Dead Reckoning, and see what happens. So yeah, here we are. So instead, the automated AI safety researcher has gone from the worst possible idea to, well, maybe it'll work.
01:09:40
Speaker
Yeah, I mean, is that even the explicit plan anymore at OpenAI? After the superalignment team fell apart, I think their plan is just to try a bunch of different things at the same time and hope that they, in combination, work and keep us safe.
01:09:58
Speaker
What do we know about the safety plans at OpenAI at the moment? I mean, the preparedness framework is essentially ordinary defense in depth. And that's better than nothing, but it will not work.
01:10:10
Speaker
And it won't work because? Because none of the parts of the defense in depth are credibly adequate for superintelligence. It's just not possibly going to work. I've seen the components, and this doesn't work. And I would have hoped that in the 1.0 framework they laid out, well, if we see these advanced capabilities, then we're going to have to lay out mitigations to deal with that.
01:10:37
Speaker
And the hope was that if they had to write down the actual mitigations, they would take a look at what they'd written down and go, oh, obviously that won't work when faced with this level of capability. And then they would come up with something new, or they would realize they didn't have enough, and they just wouldn't deploy,
01:10:51
Speaker
and they'd work on something new. And that hasn't happened. They instead have said, oh, I think this is adequate. And that's scary. And I see no reason to expect that not to continue while it gets less and less adequate.

Evaluating AI Safety Strategies

01:11:02
Speaker
So that's a really scary place to be. I don't think we have enough time to explain in detail why each of those particular interventions wouldn't work. But I'm very, very confident that the best-case scenario is that these are placeholders that kind of work long enough for us to get systems that are smart enough to figure out better solutions than we have.
01:11:27
Speaker
And that's not impossible, but that's the hope. The hope is not that these will hold up. This won't hold up. Are you optimistic about the safety plans of Anthropic or Google DeepMind?
01:11:38
Speaker
No, I am enthusiastic that Anthropic has built a culture of safety and caring about safety in its employees. I am optimistic they are funding and doing a lot of good marginal work on safety.
01:11:51
Speaker
I am not convinced by their safety and security plan, nor am I convinced by Google's, any more than I'm convinced by OpenAI's. They each have their good parts. They each have their bad parts. But it comes down to spirit, right? If you obey the spirit of
01:12:06
Speaker
any of these plans, you say, we're not going to do things that we shouldn't be doing. And if they take that seriously and they look at what they're developing properly, then there's hope for us all. And if they're just looking to enforce the letter of what they've written down, there's no hope for any of us under any of these.
01:12:22
Speaker
What does a good safety plan look like then, if you were to write one for the world? Well, I don't think there's any fundamental way out of the spirit-of-the-rules dilemma unless you are going to outsource your release decisions to a third party, at minimum.
01:12:38
Speaker
I think that without that, you're just completely dead. But I've written critiques of the current safety plans, and I'm writing a new one for Preparedness 2.0. I think that you have to specify not just what you wouldn't release, but what you wouldn't train. It has to be based on what you anticipate is likely to happen.
01:12:59
Speaker
It has to take into account future scaffolding and other improvements about how to use the things that you're doing. And it has to like not count on mitigations that we... don't have any reason to be confident in.
01:13:10
Speaker
But like, ultimately, you know, the safety plan, as it's currently described, it's supposed to be an alarm bell. It's supposed to just be like, I can't help but notice that this is getting too risky and you should stop when it reaches this point.
01:13:23
Speaker
And so they're not that far from serving that purpose, but that's not the safety plan, right? That's the safety plan to figure out that you need a safety plan. And then, as far as the actual safety plan, I mean, I know how I would try to do the work, but, you know, let's do the work.
01:13:39
Speaker
And if I knew how to do the work, I would do it. Is the best safety plan mainly technical, or mainly about governance, or about kind of social features?
01:13:51
Speaker
You need both. Failure in either half is death. So like if you if you don't do the technical work before the time comes, then nothing else matters, right?
01:14:01
Speaker
Nothing else will be adequate. If we do the technical work, you still have to solve the governance problems. You still have to figure out how to deploy this thing. You still have to figure out how to create a world in proper equilibrium because there is no technical solution that is robust to being given the wrong set of instructions or distributed in the wrong way and entrusted to the wrong dynamics between people.
01:14:19
Speaker
They're all vulnerable to centralization of power under a malicious group or individual, and they're all vulnerable to diffusion of that power
01:14:31
Speaker
to the people, or to a sufficiently large group of people or groups, that each then have the AI pursue their own individual interests, because either of those scenarios doesn't work out, doesn't allow us to get through this.
01:14:45
Speaker
So that's a hard problem. Call that the phase two problem; the technical problem is the phase one problem. And at minimum we have to solve these two phases, and then there are a lot of subproblems involved in both of them.
01:14:56
Speaker
You've covered all the drama surrounding OpenAI quite extensively: the firing of Sam Altman, the lawsuits, the statements from former employees, and all of this. How much do you think that drama matters for the outcomes in the long term?
01:15:12
Speaker
I think that's a good question. I think it matters a lot in the sense that I think that there were a lot of very good people at OpenAI who are no longer at OpenAI.
01:15:24
Speaker
who were often in very senior positions and who were going to be very positive influences on OpenAI's decisions and who, in fact, were having very positive influences on their decisions. I also think that this impacted the composition of the board, right? The ability of the board to check and contain Altman.
01:15:40
Speaker
And Altman's behaviors have dramatically shifted in many realms towards recklessness, towards jingoism, and towards various forms of public dishonesty. And his core beliefs may have also changed in negative ways, or they may not have. It's impossible to tell, because again, you can't trust him to be reporting his beliefs honestly at this point.
01:16:01
Speaker
But OpenAI is a much less trustworthy institution that's going to make much less responsible decisions than in a counterfactual where those events went differently.
01:16:12
Speaker
And that's highly unfortunate. That doesn't mean that OpenAI is now an unusually irresponsible organization because that seems not to be the case. It's an unfortunate fact about the world that the bar is not very high.
01:16:27
Speaker
A lot of the things that I point out that I don't like about OpenAI's Preparedness Framework 2.0 are that it's not making commitments.
01:16:38
Speaker
It's leaving itself room to make whatever decisions it wants. And OpenAI has not given me reason to trust it. But if I still had reason to trust it, or had better reason to trust it on various levels, then those decisions would make a lot more sense to me.
01:16:52
Speaker
They'd be more justified. But I think only Anthropic has a clear case that they are being clearly more responsible here than OpenAI. And I don't think Anthropic has lived up to the standards that I would want for a frontier lab or anything like that. They're just the best of the lot.
01:17:12
Speaker
Google can reasonably claim to be more responsible than OpenAI, but I don't think it's obvious. And then if you look at the rest of the pack, you see some deeply irresponsible actors. You see DeepSeek, you see Meta, you see xAI.
01:17:24
Speaker
And to the extent that you take these players seriously, you see something that's clearly not as responsible as OpenAI, even in the new era. Would you say we've gotten lucky in the sense that the leading AI companies, OpenAI, Google DeepMind, Anthropic,
01:17:41
Speaker
at least they're talking about safety? They're saying they care about safety. Many of these companies were started explicitly because of safety concerns. Is that a good thing, or does this mean that perhaps motives change over time, motives get corrupted, people are influenced by the incentives and all the rest?
01:18:00
Speaker
The trade-off of inspiring people to create superintelligence in the name of doing it first so they can do it more safely is that you have people actually creating it faster and more robustly and more competitively with each other, which is bad.
01:18:16
Speaker
But also you have people who at least are aware of these concerns and place some amount of value on these concerns, which is good. You know, the first thing I thought when I heard Altman was fired was, I don't think I'm going to like the replacement better, because as much as I have my disagreements with Altman, Altman is
01:18:35
Speaker
very clearly, very aware of these questions. At that point I didn't know any other details. So if it ends up being, you know, a Greg Brockman-style person in charge, well, that seems worse, right?
01:18:47
Speaker
You don't know. But so my answer is like, I'm very happy these people have that background and understand these issues and are in fact, you know, responsible for this to some degree.
01:18:59
Speaker
But yes, people's motivations absolutely change, and people lose sight of their original goals, and people make trade-offs, and they stare into the abyss and the abyss stares into them, and they tell themselves a story.
01:19:13
Speaker
And also you have the problem, especially with Altman, of having this perspective of viewing the people warning about these things as your enemy, which they never were, but he went down a path where he saw it that way.
01:19:25
Speaker
And then after the board issues, it became that much more the case. And that creates an unfortunate situation. But yeah, no, I'd much rather have Altman or Amodei or Hassabis dealing with this problem than people who didn't come from that intellectual tradition.
01:19:46
Speaker
And you in fact see that when you look at Meta, or you look at DeepSeek, or you look at xAI, right? You look at the next tier down, and you see people who just do not take these problems remotely seriously at all. Like, I talk about OpenAI not taking these problems seriously.
01:19:59
Speaker
And that's because reality doesn't grade on a curve. They understand the problem exists and they're treating the problem. They think they're taking the problem seriously. And I'm here to inform them that that is insufficient and they are not actually doing so, but it's a good start.
01:20:15
Speaker
Right. But with Meta, I'm like, you're actively mocking the problem. You're actively throwing fig leaves that are the bare, bare minimum at most on top of the situation. In a certain pessimistic mood, or from a certain perspective, this situation seems quite hopeless then, because what can you do? If you enter the industry, if you enter the game, and if you're trying to build an AI company that's responsible, maybe you accidentally push in the wrong direction.
01:20:42
Speaker
Is it possible to pause? What do you think of the idea of pausing AI development, coordinating around the pause? In theory, it's a great idea, at least at some point in the future.
01:20:54
Speaker
It's probably too early on several levels. Like it's definitely too early in terms of the political will, right? Like we're not gonna get a pause until the AIs are much scarier than they currently are.
01:21:05
Speaker
That's just a

The Case for a Pause in AI Development

01:21:07
Speaker
reality. And so I think pushing for a pause before that happens is not harmful for the most part. I think people who believe in it should do it anyway, because like they have different beliefs than I do.
01:21:17
Speaker
But I think the pause letter was a damaging thing for the chances of humanity's survival, because instead of pushing the Overton window, it then created this point of mockery, this point of attack,
01:21:30
Speaker
that was very harmful when, you know, six months later the world didn't end or whatever, which is obviously a stupid, stupid, stupid way of looking at things. But we have to face the political realities that we face. Whereas the CAIS letter, just acknowledging the existential risk of the situation, I think was very, very helpful in the same way.
01:21:47
Speaker
So the question of a pause is, we should be asking, how do we pause? We should be looking for a way to pause, such that if the major world leaders and governments realized that a pause was necessary, we could then have a pause.
01:22:02
Speaker
That's very different from actually trying to pause right now. And then I think the pause can only happen when something pretty scary happens that goes wrong, or at least some very, very scary capabilities are demonstrated in the future.
01:22:16
Speaker
And building the capacity to pause, does that just consist of building ordinary international cooperation and making sure there's communication across countries? What does that consist of?
01:22:30
Speaker
Is it more on the technical side? There's a lot on the politics side, obviously: building trust, opening communication channels, creating various organizations and structures, drafting language, but also figuring out logistics, figuring out how you would do it.
01:22:48
Speaker
Like, what is this? When we say a pause, what is a pause? What does it mean to pause? It doesn't mean shutting down every CPU or even GPU. It doesn't mean that we don't use AI. It just means that we don't do this specific thing. What is that specific thing? How does that work?
01:23:03
Speaker
How would we enforce it? How would we monitor it? How long would it last? What are the various conditions and rules and so on? If you have to start this process from scratch after you realize that you need to pause, it's probably going to be too late. You have to work on that now.
01:23:19
Speaker
And there's also the question of what we can do physically to make this easier. If we concentrate compute, if we monitor compute in ways that make it much easier to figure out where the large amounts of compute are and monitor them and shut them down.
01:23:31
Speaker
If we start putting physical monitoring devices into new hardware, what are the things that we can do? And I think we should be asking these questions. We should be asking, if we need to shut things down, how do we do that? Whether that's a coordinated pause, a voluntary pause, or something else, we need these options.
01:23:49
Speaker
And then we hope to not need them. The best-case scenario, from my perspective, obviously, is that everything is fine, there was nothing to worry about, we just create superintelligence, transform the world, and it's paradise.
01:24:00
Speaker
But yeah barring that, obviously, like the best case scenario in important senses is that superintelligence is just very difficult and we have a lot of time to figure out how to handle that problem.
01:24:12
Speaker
And in the meantime, we get all the benefits of the AIs we have now, which I think would be pretty wonderful.

The Future of AI: Governance and Alignment Challenges

01:24:17
Speaker
Does that make it more difficult to do the kinds of communication we're engaged in right now or to talk about the risks just because the AI products we have are so great and they're so useful to people and people like them so much?
01:24:31
Speaker
Certainly the reason why there's a lot of opposition to these things is because AI has so much promise and so much upside. And a lot of that is very real. And a lot of that is still locked in future advancements, although a remarkably large amount of it isn't. And it's just a matter of exploiting what we already have.
01:24:47
Speaker
But I think genuinely a lot of the people who are basically, you know accelerationists who are just like, we need to go forward as fast as possible. We have to disregard all these safety concerns. It comes from a place of there's so much promise here.
01:25:02
Speaker
And also our society has become so opposed to progress and abundance and doing good things in other realms. And they see this as the last stand. They see this as the only place left to make a big difference.
01:25:16
Speaker
And I'd love to be with them fighting for nuclear power and building housing where people want to live and doing most of the rest of what their agenda would be in ordinary politics, and to have that be the thing that they focus on, and to make the world a much better place that way while we figure this problem out.
01:25:33
Speaker
But they've despaired at those problems such that they can't even believe the idea that like someone like me would want to support them in all these other places because it doesn't make sense to them.
01:25:44
Speaker
And this has even come to the point where, because of this motivation that they need to push forward here at any cost, they develop all these preconceptions, and essentially this misinformation spreads about what must be the motivations and what must be the thinking of the people who are opposed.
01:26:01
Speaker
It's just madness. But what can you do? What do you say to make the case that AI is different from the other technologies you mentioned there? So why is AI different from building more nuclear power or building more housing? And why isn't it just another instance of: we want to avoid regulating a new technology to death?
01:26:22
Speaker
There are always many scary predictions, and they don't come true. So why is AI different? Because AI is about creating new, more powerful optimizers, new, more intelligent entities that are not human.
01:26:44
Speaker
Before, we were building tools, tools that humans use to make things that humans wanted, to change the world in ways that humans wanted. And that wasn't a reason for anarchism. That wasn't a reason to just allow anyone to do anything they ever wanted. But we went way too far in trying to constrain what people could do with these tools.
01:27:02
Speaker
But AI is not going to remain a mere tool. AI is going to have intelligence that already, in many ways, rivals us and will probably soon surpass us. It will be more competitive in an economic, capitalistic context.
01:27:17
Speaker
It will be a more powerful optimization agent, you know capable of rearranging the atoms in ways that you know satisfy its preferences and you know aren't necessarily attuned to what we want.
01:27:29
Speaker
And you know in general, yeah know we have every reason to believe that yeah know left to its own devices, that the things that will survive are the things that are optimized effectively.
01:27:40
Speaker
to survive in that future world. and that's not going to be us. It's going to be AIs. And of course, you know you can go into details and make this much more robust and explain exactly like how and what ways things go wrong or what paths things take. or you know But like my basic perspective on this point is like, yes, the alignment problem is impossibly difficult.
01:27:56
Speaker
The governance problem is impossibly difficult, et cetera, et cetera. But through all of that, stop pretending that creating things that are not human, that are smarter than human, more capable than human, more competitive than human, more powerful optimizers than human, that can be copied, be run in parallel, that have unlimited memories,
01:28:17
Speaker
et cetera, et cetera, is a safe thing to do, a thing that's likely to turn out well for the humans by default, without any interventions, and that you should just let nature take its course.
01:28:28
Speaker
This is absurd. This is just patently absurd on its face. And we can argue about what the solution to that is, but the idea that anarchism is the way? It's really bizarre that we're entertaining this suggestion.
01:28:42
Speaker
If we get to 2050 and we have a flourishing human civilization and superintelligence at the same time, why were you wrong? So I was probably wrong because, first of all, it probably took a while to get to superintelligence. It probably came at the lower end of that range. I was wrong because it turned out that the technical solutions were
01:29:04
Speaker
easier than I thought they were. And we found them in time. We found ways to leverage the AIs and leverage like our knowledge and so on and figure them out. And also we managed to coordinate and steer the outcomes in some fashion in ways that like I am skeptical we can do, but that clearly aren't impossible.
01:29:23
Speaker
And we kind of won the parlay, right? We got all these different things to happen, and therefore things worked out reasonably well. It would also involve facts about what leads to human flourishing and other detailed dynamics working out well. But basically, a lot of things have to go right and nothing can go that wrong along the way. But yeah, 70 is not 99. I'm not claiming certainty. If you tell me it's 2030 and that's happened, I'm a lot more surprised.
01:29:57
Speaker
I have noticed people beginning to use terms like AGI and superintelligence in ways that don't really jibe with my understanding of those terms. For example, Tyler Cowen recently said that o3 is AGI in his estimation, by his definition of that.
01:30:17
Speaker
I've also heard discussions of how to do business in the best possible way in a world where we have AGI, for example. Or Sam Altman has talked about AGI coming and going and the world not changing that much;
01:30:32
Speaker
we will kind of get used to it as a society. Do these terms mean less and less over time? I think there are reasonable arguments that the goalposts have moved in both directions in different ways.
01:30:44
Speaker
If it was 2008, say, well before all of this, and you suddenly had o3 and you showed it to people, they'd go, oh, wow, that's AGI. And I don't think that would be an unreasonable thing for them to say.
01:30:57
Speaker
Look, o3 responds to the vast majority of queries much better than the vast majority of humans. It's not crazy to call that AGI. It's just not what we mean by AGI in our current conversation.
01:31:09
Speaker
It's not going to have the consequences that the thing we're imagining as AGI would have. So I think it's wrong to call it AGI, and the majority of people agree with me.
01:31:20
Speaker
But I don't think that we should mock Tyler for that statement. Maybe somewhat for his reasoning, but not for the statement itself.
01:31:30
Speaker
As a final topic here, I would like to discuss trading, because this is something you've done professionally. And in some sense, you've done it in your Magic: The Gathering career also, in trading those cards.
01:31:41
Speaker
How is trading affected by AI? Do you think it's getting more and more difficult to discover good opportunities? That was already true. As a general rule, trading has gotten more difficult across the board for a long time.
01:32:00
Speaker
So if you took a modern trader and you sent them back to the 70s or the 80s or whatever, they'd be like, free money everywhere, woo! At some point, you're just like, oh, I have this Black-Scholes formula and you don't. This is going to be fun. But after a while, it gets harder and harder, because you're competing against other people who are also advancing their technologies.
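For readers who haven't seen it, this is the Black-Scholes formula he's alluding to, the kind of edge early quantitative traders had over everyone else; a small self-contained sketch, not tied to any real trading setup:

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call option.

    S: spot price, K: strike, T: years to expiry,
    r: risk-free rate, sigma: annualized volatility.
    """
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

print(black_scholes_call(S=100, K=105, T=0.5, r=0.03, sigma=0.2))
```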
01:32:20
Speaker
And it's not just that response times get faster. It's that the intelligence behind what everyone's doing gets better. And you discover various patterns and correlations and systems, and then they get competed away and they stop working.
01:32:33
Speaker
And then there are all these different complexities that everyone's dealing with. And you see this in other markets too. When I was doing sports betting, I found lots of what we called free money, basically.
01:32:46
Speaker
Like various different opportunities to just like take money out of the system fairly easily and like with large edges. And I learned later that like if I had gotten in a you know a year or two earlier, they would have been much bigger than the ones I found.
01:33:00
Speaker
And then several years later, they were much smaller than when they started out, but they were still there. And then by the time I was done with that industry, it was a lot harder. There were still ways, but it was a lot harder.
01:33:12
Speaker
And the same thing is true in crypto, where when I was trading crypto in 2018, it was very easy as a trader with ordinary skill in the art to make good returns.
01:33:25
Speaker
And I'm not saying it's necessarily hard now, but it's harder for sure. The stock market is the same way. And that was before AI. And AI is only going to accelerate all of this.
01:33:37
Speaker
Right. Because AI is now in the hands of everybody on all sides. But it can also create opportunity, because the market has not become situationally aware.
01:33:47
Speaker
So I was listening to a podcast recently, Odd Lots, one of the best podcasts about economics, and there's a trader talking about the China situation and the general trade war.
01:34:00
Speaker
And he's like, well, I saw DeepSeek came out, and then I concluded, oh, AI can be copied, therefore it's worthless, so I shorted the NASDAQ. And he's like, that was a good trade. And I'm like, it was a good trade because it turns out the market is that inefficient, that unaware of what's going on, with that level of understanding, such that people freaked out about things that were good for the companies involved
01:34:22
Speaker
in many cases. Just a complete misunderstanding, a complete order-of-magnitude misunderstanding of what happened, and then a complete misunderstanding of what it implied, to the extent it implied anything. The whole thing should have been mostly priced in already.
01:34:34
Speaker
it was kind of insane. And so, you know, they're also not using AI properly. If they're not even aware of AI, they can't even think well about AI and its implications. They're clearly not going to use the tools the way you're using the tools, so you can have an advantage there.
01:34:47
Speaker
Yeah, that makes a lot of sense. But in the long term, when the AIs themselves are doing a lot of the trading, trading is going to get much, much more sophisticated, and it's going to be very, very difficult to accomplish much.
01:34:58
Speaker
And reaction times are going to get way, way faster. Zvi, thanks for chatting with me. It's been a real pleasure. Yeah, thanks for having me.