
How Close Are We to AGI? Inside Epoch’s GATE Model (with Ege Erdil)

Future of Life Institute Podcast

On this episode, Ege Erdil from Epoch AI joins me to discuss their new GATE model of AI development, what evolution and brain efficiency tell us about AGI requirements, how AI might impact wages and labor markets, and what it takes to train models with long-term planning. Toward the end, we dig into Moravec’s Paradox, which jobs are most at risk of automation, and what could change Ege's current AI timelines.  

You can learn more about Ege's work at https://epoch.ai  

Timestamps:

00:00:00 – Preview and introduction

00:02:59 – Compute scaling and automation - GATE model 

00:13:12 – Evolution, Brain Efficiency, and AGI Compute Requirements 

00:29:49 – Broad Automation vs. R&D-Focused AI Deployment 

00:47:19 – AI, Wages, and Labor Market Transitions 

00:59:54 – Training Agentic Models and Long-Term Planning Capabilities 

01:06:56 – Moravec’s Paradox and Automation of Human Skills 

01:13:59 – Which Jobs Are Most Vulnerable to AI? 

01:33:00 – Timeline Extremes: What Could Change AI Forecasts?

Transcript

Automation and Economic Impact

00:00:00
Speaker
If you take this compute scaling view seriously, maybe each order of magnitude of effective compute scale-up gets you some additional percentage of tasks that are automated in the economy, until you've automated everything. Even if you do automate AI R&D, it's not clear how much that gets you. If you think the scaling of compute has been the key driver of the rapid software progress over the past 10 to 15 years,
00:00:22
Speaker
then you wouldn't necessarily be resolving that bottleneck by just scaling the researcher population much faster. Extrapolation just doesn't work very well. In 2015, 2016, self-driving cars were the most exciting application.
00:00:35
Speaker
Well, 10 years later, the story is LLMs and reasoning models. OpenAI forecasts $100 billion per year of revenue by 2029, but that's not anywhere near enough for the economic feedback loops to become intense enough. If you want this sort of feedback loop that goes from increased economic output to more AI development, more scaling, then you need the economic output to actually be scaling quite rapidly.
00:01:04
Speaker
What matters is: what are the broader incentives and forces? What are those forces pushing AI to be like? Welcome to

Introduction to Epoch AI and Research Focus

00:01:12
Speaker
the Future of Life Institute Podcast. My name is Gus Docker and I'm here with Ege Erdil.
00:01:17
Speaker
Ege, welcome to the podcast. Thanks for having me. Do you want to introduce yourself to our audience? Sure. I'm a senior researcher at Epoch AI. I've been there for a couple of years at this point.
00:01:32
Speaker
I work on a bunch of different subjects, from the economics of AI to the feasibility of distributed training to the limits on the energy efficiency of hardware.
00:01:44
Speaker
It's kind of hard to list all those topics off the top of my head. I would say, for anything interesting in AI, Epoch is answering the deep questions about it. It's superb stuff.
00:01:56
Speaker
Yeah, I guess what I would

GATE Model and Economic Predictions

00:01:57
Speaker
say is that we have a couple of different divisions by this point. We have a team that just works on collecting data, and we have all these data products where you can go to our website and see all the large-scale models that have been trained since, I don't know, 2015.
00:02:13
Speaker
You can see how much compute they used. We have these big databases where you can see who published them, when they published the model, how much compute we think they used, and what the size of the training dataset was.
00:02:24
Speaker
And we are also now developing a benchmarking hub where we do our own evaluations on benchmarks. Right now it's still kind of limited in scope; we evaluate mostly math and reasoning benchmarks.
00:02:38
Speaker
We're hoping to expand that in the future. And then we have a research arm where we publish reports and blog posts. We also now have a newsletter, which I'm mostly responsible for. I'm involved in some of the other things as well, but I mostly work on the research side of the organization.
00:03:00
Speaker
Yeah. One of your newest outputs is this GATE paper, or interactive model: GATE, modeling the trajectory of AI and automation. What is GATE?
00:03:12
Speaker
So there was this previous line of work done by Tom Davidson at Open Philanthropy, where he basically translated the idea of compute scaling into the setting of an economic model. He says, okay, there's some spectrum of tasks in the economy.
00:03:29
Speaker
If you take this compute scaling view seriously, maybe each order of magnitude of effective compute scale-up, and that's allowing for software progress as well as just physical compute scaling, gets you some additional percentage of tasks
00:03:44
Speaker
that are automated in the economy, until you automate everything. So there's a certain slope: you start automating tasks at a certain point, and then with each order of magnitude you automate maybe an additional 10% or 20% of tasks in the economy.
00:03:56
Speaker
Then you can plug that assumption into a relatively standard economic model, which turns it into a non-standard model because most of these models don't have AI in them.
00:04:08
Speaker
But the rest of the model can be fairly standard: the production function, and how you assume that investment, consumption, and other decisions in the economy are made, all of that can be fairly standard. And you can see what happens.
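As a rough illustration of the assumption described here, a minimal Python sketch; the threshold and slope are hypothetical placeholders, not GATE's actual calibration.

```python
# Minimal sketch of the compute-scaling assumption described above; the
# threshold and slope are hypothetical placeholders, not GATE's calibration.
def automated_fraction(log10_effective_flop: float,
                       start_log10_flop: float = 30.0,       # where automation begins (assumed)
                       tasks_per_oom: float = 0.10) -> float:  # share automated per OOM (assumed)
    """Fraction of economic tasks automated at a given effective-compute level."""
    frac = (log10_effective_flop - start_log10_flop) * tasks_per_oom
    return min(max(frac, 0.0), 1.0)

for oom in (30, 33, 36, 40):
    print(f"10^{oom} effective FLOP -> {automated_fraction(oom):.0%} of tasks automated")
```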
00:04:19
Speaker
So Tom Davidson's original model was also developed with Epoch's assistance, especially on the software side. And then we decided to develop GATE because that model had some shortcomings that we wanted to address.
00:04:32
Speaker
Maybe the biggest one was that in that model, economic decisions were hard-coded. They were not made in the standard way this would typically be done in the economics literature, which would be that you have some kind of utility or value function.
00:04:47
Speaker
And then you would be optimizing that to get the best outcome, maybe under certain constraints. Instead of

AI Progress and Economic Impact

00:04:53
Speaker
doing that, it was just a bunch of hard-coded rules, with particular parameters that I think the Open Philanthropy folks just thought made sense.
00:05:01
Speaker
But that meant that if you used extreme parameter values or something, the decisions would look very unreasonable. So that was the main motivation to turn it into that kind of model. This would also allow it to answer questions that the original model could not.
00:05:15
Speaker
Because you could say, okay, suppose that people assign such and such credence to AGI being capable of full automation, let's say 10%. They are not sure, maybe they're uncertain. Then what should they do? How much should they invest?
00:05:29
Speaker
There are these interesting possibilities where people might invest a bunch early if that helps them resolve the uncertainty. Maybe they observe the slope of the automation curve to see, okay, does this thing look like it's going to get to full automation? And maybe it does, maybe it doesn't.
00:05:46
Speaker
So they might update on that, and that might inform their investment decisions, which we can observe in GATE, and which you couldn't observe in this older framework because all the decisions had been hard-coded. So that's the basic motivation.
00:05:57
Speaker
It's important not to oversell the model, in the sense that it's not really meant to be a predictive model, just as I think Tom Davidson's original model was not really meant to be predictive. In fact, a lot of economic models are not really meant to be predictive.
00:06:12
Speaker
They are more meant to show you what happens in a simulated world where certain mechanisms are built in and certain ones might be left out.
00:06:24
Speaker
But I think a lot of people's thinking about AI can be incoherent, in the sense that they might believe things that are actually incompatible with each other. And this kind of model helps clarify that.
00:06:36
Speaker
What would be examples of that? What would be examples of some incoherent beliefs that fall apart in this model? Sure. One example would be that you might have certain beliefs about how fast software progress is, how fast hardware progress is, how much we're going to be able to scale up investments.
00:06:51
Speaker
Or, for example, you might have beliefs about how fast we can double our stock of compute, or our number of fabs or whatever, without running into severe constraints, like serial time constraints.
00:07:02
Speaker
And you might not notice how aggressive the combination of those beliefs should actually make you about timelines. When you put all of that into a model, you might be surprised by how fast the automation actually happens.
00:07:17
Speaker
And the main benefit of the model that I see is that instead of you having to reason about this in your head, where you think, okay, let me take this into account and this other thing and this other thing, and then let me just think about what would happen,
00:07:30
Speaker
you might have a bias in either direction when you're doing that. When you have the model, it just clarifies your thinking, because either you have to disagree with the model's parameters,
00:07:42
Speaker
and then you can delve into that: okay, I disagree with this parameter value; then what evidence do we have about that particular parameter value? For example, one parameter that's important in the model is: if we double our instantaneous research effort, if you have twice the number of researchers working on software R&D or hardware R&D, then how much faster would we be making progress, at least at that moment?
00:08:06
Speaker
And that is not clear. It's not necessarily the case that if you double your researcher stock, you make progress twice as fast, because some things could just be difficult to parallelize. So there's this penalty. And there's another question about what happens if you spend twice as much instead of literally having twice as many researchers, in which case you also have to think about the price impact.
00:08:27
Speaker
Or like maybe

Parameters Influencing AI Predictions

00:08:28
Speaker
as you try spending more, the supply is kind of limited, so you're running up against constraints on the supply of competent researchers, in which case you might get even less of an impact than you thought.
00:08:39
Speaker
And then you might notice that, oh, well, these parameters actually look quite important, because if we could just increase our spending by 10x and that got us 10x faster software progress, maybe we would be doing that.
00:08:54
Speaker
And there's a separate effect, this fishing out, where as you make progress in any domain of R&D, making further progress might become harder. So then you might ask, okay, what is the size of that effect?
00:09:05
Speaker
So there are lots of things like this that might become relevant when you have a model like this and you do a sensitivity analysis on it. And you notice that some parameters are just very important for explaining the model's predictions about timelines, while others maybe don't matter that much.
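To make the two R&D parameters discussed here concrete, a sketch in the style of a semi-endogenous growth law of motion; the exponent values are made up for illustration, not Epoch's estimates.

```python
# Illustrative sketch of the two R&D parameters discussed above. The exponents
# are made-up values for illustration, not Epoch's estimates.
def progress_rate(researchers: float, tech_level: float,
                  parallel_penalty: float = 0.5,     # < 1: doubling researchers < doubles progress (assumed)
                  fishing_out: float = 0.8) -> float:  # > 0: progress gets harder as tech rises (assumed)
    """Instantaneous rate of technological progress, dA/dt."""
    return (researchers ** parallel_penalty) * (tech_level ** -fishing_out)

base = progress_rate(researchers=1_000, tech_level=10.0)
doubled = progress_rate(researchers=2_000, tech_level=10.0)
print(f"Doubling researchers multiplies the progress rate by {doubled / base:.2f}x, not 2x")
```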
00:09:20
Speaker
And then you can focus on the ones that do matter. Which parameters are most important in this model? Yeah, so I would say the assumptions about R&D matter quite a bit, the ones that I just mentioned. They are quite important.
00:09:33
Speaker
Other than that, the degree of complementarity in the economy at large matters. What this means is: suppose that AI automates 50% of tasks in the economy instead of 100%.
00:09:43
Speaker
How much of a boost to output does that give you? If it gives you a lot, then that actually speeds up the process, because you have this expanded pool of output that you can reinvest into more chips and more R&D and things like that.
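A hypothetical sketch of this complementarity point, using a CES aggregate over tasks; all numbers are illustrative, not GATE's calibration. With strongly complementary tasks, automating half of them gives only a modest output boost because the remaining human-performed tasks bottleneck production.

```python
# Toy CES illustration of complementarity (illustrative numbers only).
def ces_output(task_outputs, rho=-0.5):
    """CES aggregate of per-task outputs; more negative rho = stronger complements."""
    n = len(task_outputs)
    return (sum(x ** rho for x in task_outputs) / n) ** (1.0 / rho)

human_output, ai_output = 1.0, 100.0                 # per-task output, human vs AI (assumed)
baseline = ces_output([human_output] * 10)
half_automated = ces_output([ai_output] * 5 + [human_output] * 5)
print(f"Boost from automating 50% of tasks: {half_automated / baseline:.1f}x "
      f"even though AI is 100x more productive per task")
```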
00:09:58
Speaker
And that gets the entire feedback loop to happen faster, and it can end up shortening timelines. Other than that, I would say what we would call adjustment costs or timescales matter: the serial time constraints in building a new fab, or also just ordinary kinds of physical capital investment, like building power infrastructure and things like that.
00:10:22
Speaker
We don't know to what extent there are just serial time constraints in that process, such that even if you try to do it much faster, at our current level of technology or maybe slightly above it, we might not be able to. There's a question about the scaling curve: imagine you spend twice as much to get something done faster, say to get a power plant built faster. Well, maybe that doesn't actually get you to the finish line as much faster as you might expect. Maybe there are just some limits there.
00:10:53
Speaker
And if there are limits, then those things can just become bottlenecks and determine the overall pace at which you can continue scaling. That might become even more of an issue in the future, when we will actually need to scale all of these things.
00:11:06
Speaker
Right now, I think we're only really having to construct data centers, and maybe we're having to worry about power. But in the future, especially

Challenges in AI Development and Predictions

00:11:13
Speaker
if you want this sort of feedback loop that goes from increased economic output to more AI development, more scaling, then you need the economic output to actually be scaling quite rapidly if you want that to be effective.
00:11:29
Speaker
And that's just going to be bottlenecked by these things if we can't do them. So it's quite important that we can scale those things reasonably fast. Those parameters are also important. If we look at the default parameters of the model, what are some predictions that come out of that? Here I'm talking about AI investment, economic growth, scale of compute, and so on.
00:11:51
Speaker
Yeah, I think in the default parameter settings, we get to... Well, another important parameter, of course, that I didn't mention is how much effective compute is required to get to full automation.
00:12:03
Speaker
That's just always an important parameter, because it tells you how much distance there is that you have to cross. And if that number is very high, then you might not see much economic impact from AI at all.
00:12:16
Speaker
But we have these arguments. There's a whole literature about trying to estimate this number that came out even before Tom Davidson's model. There's the Bio Anchors report from Ajeya Cotra, which tries to estimate this number in a bunch of different ways.
00:12:27
Speaker
I think it's not super plausible that this number is much larger than what we allow for in the model. I think maybe it's 10 to the 40 FLOP or something, in effective FLOP. But I don't think it's much larger than that, because that ends up being in tension with the evidence we have from, say, evolution.
00:12:48
Speaker
Human brains are effectively AGI, you could say. And there's a certain amount of computation that went into producing the human brain as an artifact.
00:13:02
Speaker
And that is not enormously more than that number, according to back-of-the-envelope estimates. So it's just not plausible, I think, that the number is going to be much higher than that. Yeah, and evolution is probably less efficient than the machine learning process that we're engaged in.
00:13:20
Speaker
That's right. I mean, there are some ways in which it might have advantages, because it has a lot of serial time, which we don't have. It has hundreds of millions of years, while we usually need to do things in a much more compressed timeframe.
00:13:33
Speaker
And it also has the ability to collect real-world data, which in a bunch of ways we might be bottlenecked by in the future. So I don't want to totally overstate the case, but I think something like this is probably true.
00:13:48
Speaker
And for the model, I think in the mainline, maybe 10 to the 36 FLOP is the default parameter setting. If you work with that and make the default assumptions about everything else, maybe you get timelines of 10 or 15 years.
00:14:01
Speaker
Timelines to what? Timelines to full automation. And we begin seeing quite a lot of economic impact even three or four years out.
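A back-of-the-envelope sketch of why the training-requirement parameter drives timelines so strongly; the starting point and growth rates below are placeholder assumptions, not GATE's defaults.

```python
# Timeline arithmetic under constant assumed scaling rates (placeholders only).
def years_to_requirement(requirement_log10_flop: float,
                         current_log10_flop: float = 26.5,       # rough frontier scale today (assumed)
                         physical_ooms_per_year: float = 0.6,    # ~4x/yr hardware scaling (assumed)
                         software_ooms_per_year: float = 0.5):   # ~3x/yr algorithmic gains (assumed)
    """Years until effective training compute reaches the requirement, at constant rates."""
    gap = requirement_log10_flop - current_log10_flop
    return gap / (physical_ooms_per_year + software_ooms_per_year)

for req in (32, 36, 40):
    print(f"Requirement 10^{req} effective FLOP -> ~{years_to_requirement(req):.0f} years at assumed rates")
```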
00:14:14
Speaker
So full automation would be automation of all tasks as of 2025, or automation of all tasks that humans are capable of, period? I would say automation of all tasks humans are capable of. But basically any task such that, if not automated, it would be a bottleneck
00:14:35
Speaker
on GDP. You mentioned that this model is not really about predictions, but one thing that's interesting here is that if you play around with the parameters and change them slightly, it's difficult to avoid a picture in which we get more and more automation, and perhaps faster than is generally acknowledged. How far can we bend the parameters and still see this picture of increased automation?
00:15:03
Speaker
Well, the easiest parameter to mess with is the training requirement. If you make that high, then yeah, you can avoid whatever you want. But the question is, is that plausible?
00:15:13
Speaker
And you know, that's the question. Another question is software R&D. I think this is a tricky domain where we're using this abstraction of effective FLOP.
00:15:25
Speaker
Basically, we're assuming that software progress just cuts the compute cost of building the system that is capable of automation, or in fact of building a system at any level of capability; you get that factor of cost saving.
00:15:39
Speaker
I think that's not necessarily a realistic model in a bunch of different ways. One of them is that software progress is not uniform across scales like that. Maybe it works more strongly for models that are trained on more compute compared to less compute.
00:15:52
Speaker
But we don't know; I don't think there's decisive evidence about that. I would be more worried that our software R&D estimates, the returns to software R&D, are calibrated on domains like computer vision and language modeling at around the current level of capabilities.
00:16:11
Speaker
Maybe that means we are making progress at the same rate towards this more general level of capability as well, the thing you might call AGI. Maybe we're even going faster. But maybe we're going slower.
00:16:25
Speaker
It's possible that we're making some kind of narrower progress. For example, if you looked at chess engines in the 1990s, we were making some software progress there.
00:16:35
Speaker
But you wouldn't have said that that software progress was actually getting us closer to AGI in a meaningful way, because that task just seems too narrow and too limited. And maybe that's the case with AI even today.
00:16:49
Speaker
That would be a big deal if true, because if you take away the software R&D channel, that costs you quite a lot of orders of magnitude of effective FLOP.
00:17:01
Speaker
So it has the same effect as if you increased the training requirements for AGI, because you lose the support from software R&D. The progress we're seeing with language models seems different to me than the progress we saw with chess.
00:17:15
Speaker
Language models, we

The Future of Compute Scaling and AI

00:17:16
Speaker
are making progress in a sense on generality too. Do you think this indicates that we will see this kind of software feedback loop? Well, it's more plausible than it is for chess.
00:17:29
Speaker
I'm not sure how much to read into that... I'm not sure if the rates of progress are right, for example. I'm not sure if the rate of progress towards an AGI system is actually in line with the rates we estimate for language modeling, which are quite fast.
00:17:44
Speaker
Usually they're on the order of 3x per year or something like that. So is that right? I don't know. That rate couldn't have continued for a very long time, because if you extrapolate it backwards, you begin to reach implausible conclusions.
00:17:59
Speaker
If you extrapolate that 30 or 40 years backwards, then you get the conclusion that back in 1980 or 1990 or whatever, we would have needed more compute to come up with the algorithm of the human brain than evolution used over all of evolution.
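The arithmetic behind this backwards extrapolation, using the ~3x/year rate mentioned above; the rest is simple extrapolation, not a claim about the true historical rate.

```python
# A constant ~3x/year software gain, run back 30-40 years, implies the same
# capability would have cost astronomically more compute then.
import math

rate_per_year = 3.0
for years_back in (30, 40):
    extra_ooms = years_back * math.log10(rate_per_year)
    print(f"{years_back} years of 3x/year = ~10^{extra_ooms:.0f}x more compute needed back then")
```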
00:18:15
Speaker
And that just doesn't seem plausible. So you can get around that by saying that we've only had this fast rate of progress since 2012 or 2015 or something, basically only in the deep learning era.
00:18:27
Speaker
And that era is short enough that you don't get this implausible conclusion. But it does suggest something else that's interesting, because if this rate of progress has only lasted for a short time, and that short time period coincides with the period in which we've been rapidly scaling compute, it suggests a different thing that might become the obstacle, which is that maybe software progress is bottlenecked on using compute for experimentation.
00:18:52
Speaker
And that's why we are seeing this rapid progress in software just as we're scaling compute very aggressively, and why we didn't see it in other domains where we didn't do the same thing, where we just relied on Moore's law, effectively. Our budgets remained fairly fixed, and then we relied on Moore's law.
00:19:08
Speaker
Yeah, we should expect progress to stall because we can't keep scaling compute at the rate we're currently scaling it. Unless AGI arrives fairly soon, and then these other economic feedback loops allow us to continue it. But assuming that doesn't happen, we're probably going to run out of runway to do that by around 2030 at current rates.
00:19:29
Speaker
And then compute scaling will have to slow down just because the spending on clusters and training runs will become so exorbitant that we will not be able to scale it any further with our current pool of resources and willingness to invest.
00:19:43
Speaker
So I think it is an important question in general, aside from anything to do with GATE, whether we reach this point before 2030 or not.
00:19:54
Speaker
I think there's a naive sense in which you should expect the arrival rate of AGI per year, or the rate of emergence of novel capabilities, to be higher if we are going through more orders of magnitude of compute scaling.
00:20:09
Speaker
And so once scaling slows down, we might see advances in AI slow down. So it's an important question what we get before that point. What's the critical threshold

AI R&D and Economic Sector Scaling

00:20:20
Speaker
we'll have to cross for progress to continue? Is it that AI will have to be good enough at doing AI research by that point?
00:20:30
Speaker
Well, this is not a factor that's incorporated in GATE, because GATE just assumes that the production function for ideas is the same as the production function for the rest of the economy. We could have used different ones, but then you have different assumptions to make, like whether one also gets automated at a different rate than the other.
00:20:47
Speaker
And then it's unclear. But answering your question, divorced from that particular model, I'm not sure if you can actually automate the process of AI R&D without also automating much of the rest of the economy, at least the remote-work parts of the economy.
00:21:03
Speaker
Even if you do automate AI R&D, it's not clear how much that gets you if you can't automate the scaling of compute. If you think the scaling of compute has been the key driver of the rapid software progress over the past 10 to 15 years, then you wouldn't necessarily be resolving that bottleneck by just scaling the researcher population much faster.
00:21:26
Speaker
There's a question about to what extent research effort from humans or possibly AIs, compute for experiments, and maybe to some extent real-world data and so forth are complements.
00:21:39
Speaker
I would assume that these are complements, and if you just scale one of them by orders of magnitude while the other one remains flat, then you're just going to get bottlenecked and it's not going to have as big of an impact as you might have thought.
00:21:51
Speaker
So I think it is really important for us to be able to scale all of these things at once. And for that, we need AI to start producing a lot of economic value by the end of the decade if we don't want to run into the slowdown.
00:22:06
Speaker
Now, I think that's probably not going to happen, in the sense that we are going to see a lot of economic value by the standards of a normal technology, but not this. OpenAI forecasts $100 billion per year of revenue by 2029.
00:22:18
Speaker
That's not a small number for one AI lab, but it's not anywhere near enough for the economic feedback loops to become intense enough.
00:22:32
Speaker
So I think what's probably going to happen is we're going to get to the end of the decade and we're going to see a slowdown in progress. I think that's the most plausible outcome. And then we're going to get this later. Maybe we're going to get this 10 years later, or maybe later than that. I think people disagree about that, but the general view among people at Epoch is that this is not going to happen in the next five years.
00:22:53
Speaker
This is different from the leaders of the AI companies. You write about this view that's been shared by Dario Amodei and Demis Hassabis and Sam Altman, about how the value of AI will come through AI automating AI research, and that this will happen fairly soon.
00:23:13
Speaker
No, they don't quite say that. They probably do put a lot of emphasis on that channel, but they also put emphasis on R&D in general, which is a different thing.
00:23:24
Speaker
So they're saying that AI will mainly produce this value by automating the process of R&D in general, and then it's going to make a lot of scientific contributions. A paradigmatic example here might be AlphaFold.
00:23:35
Speaker
Or you might look at things like AlphaProof or AlphaGeometry, which are not really doing science, but they tend to be very optimistic. Definitely Dario, he is saying that, oh, it's going to happen in two years.
00:23:47
Speaker
And so I'm clearly saying something else. You might ask, well, why is he so optimistic? I think Sam has sometimes expressed views that are more mixed. He has said, oh, we will have AGI and it will not be that big of a deal.
00:24:02
Speaker
The world will sort of continue. So that's a very different view. And I think Demis is much more excited about applications of AI for science, as revealed in DeepMind's research priorities.
00:24:16
Speaker
AlphaFold was a DeepMind effort. AlphaProof and AlphaGeometry, those are DeepMind efforts. AlphaGo was a DeepMind effort. AlphaGo is obviously not science, but you can see that it was informed by this view that we need to do this reinforcement learning thing.
00:24:34
Speaker
We need AI that can do complex reasoning and solve complex problems, instead of just the usual pre-training and post-training pipeline that's common today.
00:24:46
Speaker
So I think there's a sense in which they're less comfortable with that. But regardless, I think there's a general sense that AI for R&D is going to be a big deal. It's not just AI automating its own R&D, which is also a big deal, I think, according to a lot of these people.
00:25:02
Speaker
But just generally: how will AI actually impact the economy? They tend to think in terms of, oh, it will just speed up R&D. I think Demis definitely has this view, and Dario seems to have this view.
00:25:14
Speaker
I'm less sure about Sam, because I think he has said things that you can interpret in different ways. But I think there's a sense in which we disagree with this. In fact, GATE is a model that shows that: GATE doesn't have general-purpose R&D in it.
00:25:30
Speaker
It has software R&D, it has hardware R&D, the kinds of R&D that are relevant to AI development, but it is not an endogenous growth model in the sense in which economists use the term, which would mean that productivity in the general economy increases as a result of research effort or investment. That's just assumed to be either flat or maybe growing at some exogenous rate that's fairly slow.
00:25:54
Speaker
But it still produces massive economic impact. So you might ask, well, how does that happen? Very simple. What happens is that you have all of these AI workers, and that effectively gives you a massive scale-up in your effective labor force by many orders of magnitude.
00:26:12
Speaker
And right now, if you look at the key bottlenecks in the world economy, the key inputs that drive the world economy, the most important one is labor. Labor is paid something like 60 or 65% of all income.
00:26:24
Speaker
If you just look at all of gross domestic product, or gross world product, and you look at how much of that goes to paying labor as wages, it's like 60 to 65%. So it's the dominant contribution.
00:26:38
Speaker
And the remaining 30% or so is capital. And capital you can also scale: you can reinvest your output into producing more factories and more machine tools and whatever. So that's also something you can scale.
00:26:50
Speaker
Right now, we can't scale labor. We can't reinvest our output to produce more workers; we just have to rely on human population growth. But if we were able to do this for AI workers, which would enable us to dramatically scale them in a way we can't do for humans, then you would end up in a world where you have maybe a thousand AI workers for each human, let's say.
00:27:15
Speaker
And it's very easy to see how that would produce a dramatic change in standards of living and productivity and everything. So I think that's the key mechanism in GATE.
00:27:27
Speaker
And so in that respect, GATE is actually a conservative model, because it's not modeling the fact that you would also get this general feedback loop from all this increased capacity and thinking ability back into productivity.
00:27:42
Speaker
And then that would also increase productivity a bunch, which is not an effect that's in GATE. But the point is that most of the effect is actually coming from the labor force, because economic output is much more elastic to labor force scaling than it is to R&D.
00:28:00
Speaker
And so that's just the dominant thing. The R&D thing matters for long-run growth. If you don't do any R&D, eventually you will run into the limits of your current technology.
00:28:11
Speaker
Or if you don't do any learning by doing or make discoveries, whatever; if productivity remains flat, eventually you run into the limits of your technology. But actually, most of the economic impact is not coming from the increase in productivity in the sense of total factor productivity, the way an economist would use the term.
00:28:31
Speaker
It's just coming from this massive scale-up. One thing you can simply imagine is: imagine everyone has a household robot in their house, right? It's a very simple thing.
00:28:42
Speaker
Much simpler than the scale of things you might expect to happen. But already you can see how convenient and how valuable that kind of thing could be, and how much time it could save. A lot of people still spend a substantial amount of time on things like housework.
00:28:59
Speaker
And this might even allow things like: maybe if you have these robots, you don't even need as much space. Maybe you just have this robot that rearranges your house depending on whatever you want to do, because you can now afford that.
00:29:12
Speaker
It's a cheap thing. In which case you might actually have a small amount of space in your house that's just enough, because whenever you need to do something else, the robot just rearranges everything.
00:29:24
Speaker
There's a space where you do that; you don't need a dedicated space for each thing. And you can see how that might already be a massive impact just by itself. And this is only a tiny piece of what you might expect from this scale-up.
00:29:35
Speaker
So most of the economic growth and most of the value produced by AI will be of that sort, a kind of broad automation of tasks throughout the economy, not primarily driven by automating R&D.
00:29:49
Speaker
Yeah. What I would say is, it's a bit more subtle than that. For long-run growth, you need to scale all factors of production at once. You need to scale capital, you need to scale labor, you need to scale productivity, right?
00:30:04
Speaker
And out of these, the factor that is most important is labor, and productivity and capital each matter about equally, maybe half as much each as labor or something. And jointly, they matter about as much as labor.
00:30:17
Speaker
I think that's what comes out of the parameter estimates that are typical in the literature on these questions. But you do need to scale all of them at once if you want very sustained long-run growth.
00:30:28
Speaker
And that's what I think will happen. We had a newsletter issue about this that I think people just misinterpreted; they didn't understand what we were saying. They thought that we were arguing that R&D is not essential, or that new...
00:30:44
Speaker
innovation that increases productivity is not essential for long-run growth, which is not what we're saying. It is essential.

AI's Economic Impact and Deployment

00:30:49
Speaker
But you're saying that other things are also essential, like scaling your labor force or scaling your amount of capital. Those are also essential for long-run growth.
00:30:57
Speaker
And if any one of those is missing, then you're in trouble. And you're saying that R&D is relatively less important if you compare it to things like capital scaling and labor scaling.
00:31:10
Speaker
Because R&D only accounts for a fraction of all productivity gains. Some productivity gains come from things like learning by doing, which is not explicit R&D. You could call that R&D,
00:31:21
Speaker
but in our economy it's not done by people whose job title is researcher or academic or whatever. It's just done by normal people. And so it's hard to estimate to what extent we are spending resources on that.
00:31:35
Speaker
It might come from better management. It might come from figuring out better which inputs to use in your production process; you realize that if you get materials from here instead of there, that's cheaper.
00:31:48
Speaker
You might benefit from economies of scale at the world level, and that would also increase productivity. So all these effects are packed into productivity. And R&D only accounts for a fraction of productivity growth, which is itself a fraction
00:32:02
Speaker
of overall economic growth, because a lot of it just comes from labor and capital scaling. Why is R&D not as economically valuable as people assume?
00:32:14
Speaker
For long-run economic growth, you need to be scaling all factors of production at once. And total factor productivity is just one factor of production in the way we think about it in growth economics.
00:32:26
Speaker
Labor and capital are also important factors of production. Labor is probably the most important. And then capital and total factor productivity, if you look at the actual returns that you get when you're scaling them, seem to be about equally important.
00:32:39
Speaker
If you do growth accounting, what you find is that total factor productivity growth is only a fraction of long-run growth. And then there's the part of that that's driven by explicit R&D, rather than things like learning by doing or economies of scale, where you're just more efficient at producing things because you operate at a larger scale,
00:33:00
Speaker
or better management, or better coordination, or less misallocation of resources. All of those things go through total factor productivity. So the part of it that's driven by explicit R&D is actually a fraction. Basically, the part of economic growth that's driven by R&D directly is a fraction of a fraction of all economic growth.
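A toy version of this "fraction of a fraction" growth accounting; the shares used are illustrative assumptions, not measured values from the literature.

```python
# Toy "fraction of a fraction" growth accounting (illustrative shares only).
long_run_growth = 0.03        # total annual growth (assumed)
tfp_share = 0.3               # share of growth attributable to TFP (assumed)
rnd_share_of_tfp = 0.5        # share of TFP growth driven by explicit R&D (assumed)

rnd_driven = long_run_growth * tfp_share * rnd_share_of_tfp
print(f"Growth directly attributable to explicit R&D: {rnd_driven:.2%} per year "
      f"out of {long_run_growth:.0%} total")
```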
00:33:22
Speaker
So it just ends up being fairly small compared to what people think, which is that almost all of it is driven by R&D, which is not true. But I think there's a way in which people can misunderstand this claim.
00:33:35
Speaker
They would say, well, if you didn't have any R&D, then we wouldn't have grown since whenever. We wouldn't have our current level of wealth and prosperity compared to the 1700s or something.
00:33:46
Speaker
And yeah, that's true. But that just means that R&D is essential for long-run growth, and a lot of other things are also essential for long-run growth. Imagine that we had done R&D since 1700 but hadn't scaled our capital stock at all.
00:34:00
Speaker
Well, we would again be very poor now. So multiple things can be essential for long-run growth. And labor is also essential, because with labor, not only do you directly get a benefit from the increased output, output being largely driven by labor, obviously,
00:34:21
Speaker
but on top of that, you get to use labor to drive R&D. You can use the workers as researchers, or you can use this big economy that the workers provide you with to make certain discoveries and so on.
00:34:33
Speaker
And AI is just going to scale that a bunch as well. So that just seems like a much more relevant channel than narrowly deploying the AIs to do R&D.
00:34:44
Speaker
I think if you have the choice, it's just much more economically valuable to deploy the AI to do broad labor tasks. And you would also do R&D, sure.
00:34:56
Speaker
But in the end, all the factors of production would need to scale for you to get long-run growth. And if you again did growth accounting, you would see that most of it is coming from the scaling of labor and capital, and relatively little is coming from R&D. So I wouldn't expect that trend to change, particularly with AI.
00:35:12
Speaker
Isn't research in machine learning, for example, some of the most economically valuable labor there is right now? So isn't that exactly what you would go after automating first?
00:35:24
Speaker
Initially, yes, you would go after the things that are paid the most, assuming that your capability distribution is flat. What really matters is how much profit you'll be getting.
00:35:35
Speaker
So it's possible that even though a job is very valuable, it's actually very expensive to do, so you don't end up deploying the AI to do it. But if you assume that the AI costs a certain amount and is just capable of doing any job in the economy, then yeah, you would deploy it at the highest-compensated jobs.
00:35:52
Speaker
And in fact, this bottleneck is likely to matter at least early on, because we actually don't have that much compute worldwide. If you look at how much compute the human brain uses, it's about the same as one H100.
00:36:06
Speaker
We have something like 10 billion people in the world, so that's like 10 billion H100s. We don't have anywhere near that number; we have maybe 10 million. We're off by orders of magnitude. So we have to do some amount of prioritization, where we say, okay, we have to deploy the AIs in the places where they would be most economically valuable.
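The prioritization arithmetic, using the round numbers from the conversation (the worldwide GPU count is a rough assumption).

```python
# ~1 H100-equivalent per human brain, ~10 billion people, ~10 million H100-class GPUs.
import math

brains_worth_of_gpus = 10e9 * 1      # one H100-equivalent per human brain (as stated above)
available_gpus = 10e6                # rough worldwide supply (assumed)
shortfall = brains_worth_of_gpus / available_gpus
print(f"Matching all human brains would take ~10^{math.log10(shortfall):.0f}x more GPUs than exist")
```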
00:36:28
Speaker
So this will probably be in the countries where it will be most economically valuable. And even in these countries, we don't have enough supply of compute to be able to do it freely. So it will still be targeting the occupations that have the highest compensation, because that will just make economic sense from the point of view of the labs.
00:36:45
Speaker
And in fact, there is a sense in which even the social value of R&D, which does exist, and as I said it's essential for long-run growth even if it's not literally the most important thing that drives it, it is still a key bottleneck that we need to be able to get past.
00:37:03
Speaker
Even that social value is mostly external, which means that if you're an AI lab, you don't capture most of the value from doing the R&D. Software R&D is a bit different, because you internalize the value, at least to some extent.
00:37:16
Speaker
There are multiple competitors, so you don't totally internalize it, but to some extent you do. But if you're doing biology R&D or something, basic research, then you internalize almost none of the value.
00:37:28
Speaker
So even if that is very socially valuable, you would probably still not do it very much, just because you don't internalize the value. You would instead prioritize things that actually do make you money.
00:37:39
Speaker
That would allow you to increase your valuation and raise more capital and make more money to reinvest into AI and so on. One conclusion from your view of broader automation in the economy is that AI will be more salient, more diffuse, more gradual, and we might notice that AI has improved a lot before it improves even more.
00:38:07
Speaker
So from the perspective of someone who's concerned about the safety of these systems, this is a great thing. We might have longer to respond and longer to adapt as a society.
00:38:20
Speaker
Is that how you see the most plausible future? I mean, compared to a world in which you have a software-only singularity inside a single data center, then yes. I think in general, the broader the scale of the changes and the deployment is going to be,
00:38:37
Speaker
the more you should expect that what happens is not driven by the idiosyncrasies or the management decisions of any particular lab, and the more it's just shaped by general economic or social or political forces.
00:38:54
Speaker
There's a lot more ability for those forces to operate and control what's happening. And I agree that that's the most plausible outcome. Now, if you're worried about risk, that's not necessarily going to be reassuring to you, because you might think that those forces themselves are going to make decisions that you don't like. I think it's a bit unclear, if you are worried about the safety aspect of it, whether that's actually better or worse. Maybe you're the kind of person who just believes in one lab, like, you know,
00:39:20
Speaker
Anthropic is great, and it's good if they win, so it's better if they have this lead, and it's not actually that good if they have to submit to markets and political pressures and make their systems the way other people want them to be. But on the other hand, if you're not particularly impressed by the track record of AI safety inside the big labs, then you might actually see this as a welcome development.
00:39:46
Speaker
Maybe you think this kind of broader oversight and these disciplining forces are actually going to help. I mean, people care a lot about safety in other industries, especially when there's some kind of visible disaster, some kind of safety failure that happens.
00:40:04
Speaker
Safety can just become a very large priority. So maybe that kind of thing might happen, in which case it might just be good from your point of view. I mean, this is not a super developed view, but I do have a sense that if the pace of AI development and the pace of change in the world in general is incredibly high, we just won't have enough time to get our hands around these questions and to understand them in depth and to adapt as a society. And so if we have a more gradual diffusion of AI throughout society, that might be better.
00:40:40
Speaker
But it's by no means a given, I think. Yeah, I mean, the diffusion could be broad and it could also be fast. For various reasons, you should expect diffusion to be slower: if it's going to be a broad thing, you should expect development to be somewhat slower, because it's just going to be bottlenecked at various points by the need to make things happen in the physical world and to deal with existing social and political structures and whatever.
00:41:07
Speaker
And also you need to accumulate vast amounts of resources, which you might not have to do in the world of a software-only singularity. But at the same time, a software-only singularity might just end up privileging, as I said, the idiosyncratic features and views and practices of some individual lab, which you might just see as not a great outcome. I guess it's a much higher-variance thing.
00:41:32
Speaker
In that world, it makes a big difference if OpenAI gets there first, or if xAI gets there first, or DeepSeek gets there first. But in a world where it's a broader thing, that actually doesn't matter that much.
00:41:46
Speaker
What matters is: what are the broader incentives and forces, and what are those forces pushing AI to be like? Because people are just going to develop their AI to be in line with that as much as possible.
00:41:59
Speaker
Now, maybe alignment is hard, so they're going to fail anyway. But even the way they fail is likely to be influenced by what they are trying to do in response to these forces.
00:42:13
Speaker
Makes sense. So even in the more gradual scenario we're talking about, in which, say, it takes 20 years or so for AI to diffuse throughout the economy, if you zoom out to any kind of historical perspective, that's still a very short amount of time.
00:42:29
Speaker
You do have some advice for how policymakers can deal with the situation. And you talk about how public perception of AI is likely to shift if we have a more gradual scenario. So policymakers perhaps shouldn't assume that the views held by the public today are the views that will matter in a more gradual scenario. Well, definitely not.
00:42:53
Speaker
I think when AI starts changing things a lot, you should just expect people's views to change. When does this issue become salient? A good example is actually COVID.
00:43:04
Speaker
If you had asked people before COVID for their opinions about vaccines and vaccine mandates and lockdowns and whatever, it's not clear what they would have told you, but it would not have been very informative regardless of what they said, because it just wouldn't have been a salient issue. It wouldn't have been something they thought about.
00:43:22
Speaker
But once it actually happens and the issue gains very significant material relevance, then suddenly you see the political coalitions that emerge, and they may be very different from what we would have expected in advance.
00:43:35
Speaker
So I think you might just expect something like that when AI actually starts having a lot of impact. Yeah. What about from a personal perspective, planning for a more gradual kind of AI revolution?
00:43:49
Speaker
Yeah, from a personal perspective, I would just say that saving more and investing more is probably a good bet. If this economic transformation goes through, then I expect the marginal utility of income to go up by quite a bit.
00:44:05
Speaker
If you already have a certain amount of income today, you can't really get that much more by having more. It's just that there's

Future Economic Opportunities and Challenges

00:44:13
Speaker
a certain scale of our economy.
00:44:16
Speaker
And the level of our technology only allows us to produce a certain variety of goods. And there are a lot of goods or services that we can't effectively offer.
00:44:28
Speaker
We can't offer life extension. We can't offer a lot of services that would improve people's health; in those ways we're still very primitive. We can't offer you very rich and different
00:44:43
Speaker
experiences that maybe in the future we will be able to offer when we better understand the brain or something. And these are just a few small examples; there are probably a lot of other varieties of goods that are going to come up. Personally, when I think about the things that I would be most interested in buying in this future world, the developments in healthcare are probably the most exciting to me. And I don't just mean the obvious things like life extension, or curing cancer and aging or whatever. Even aside from that, there are just lots of minor inconveniences and medical conditions that don't shorten your life, but that inconvenience you on a daily basis.
00:45:27
Speaker
It's not very serious, but you'd probably pay a lot of money to make it go away if someone had a way to do that. Oh, absolutely. I would pay a lot to never have the common cold, for example, or to not experience hearing loss as I age, things like that, which we can't cure now, but maybe we can cure in the future.
00:45:47
Speaker
That's right. So that's the kind of thing that I am looking forward to. That's the reason why I would also recommend it. There's this research done by people on what should happen to interest rates if people expect AGI.
00:46:01
Speaker
And there are naive arguments you can make that, well, in the future we're going to be a lot richer. I mean, if you think about your own life, if you don't take into account this massive technological progress or whatever, just think about your own life:
00:46:14
Speaker
if you're going to be much richer in the future, then you have less of a reason to save today, because, well, you're going to be rich in the future anyway, and it won't be necessary for you. But with AI, the situation is a bit different.
00:46:25
Speaker
I think the big difference is that the marginal utility of income changes. It's not that you have more money but just spend it on the same things; you have options to spend money on different things.
00:46:38
Speaker
And that's a big thing. And also, you obviously expect your wages to fall, at least at some point in the future. It's not clear if that happens immediately, and it's not clear how fast it happens, because there are these competing effects. You do have to compete with the AI workers, and because of diminishing returns to labor and capital jointly, there is some degree of diminishing returns, even though it's not very strong.
00:47:03
Speaker
So if you scale that by many orders of magnitude, you should expect wages to fall. But at the same time, you will also be getting more productive. So how that washes out is not clear. I think in the end you should expect wages to fall, but that could take a long time.
00:47:17
Speaker
It's not clear how fast that happens. When you say we should expect wages to fall, is this a story about unemployment, or is this a story about people literally doing the same work but getting paid less to do it?
00:47:30
Speaker
I mean, unemployment just means that you're looking for a job but you don't like the offers that are on the market. So that is probably what would happen, in the sense that people's wages would fall to the point where it is no longer worth it.
00:47:47
Speaker
Your wages could just fall below subsistence, and then, well, there's no point in working in that world. Either you have some wealth, or you rely on various kinds of transfer programs or something, which would be able to give you way more income than you would be able to get just by working.
00:48:04
Speaker
So that's the kind of thing I would expect. But again, economically it's a bit of a complex question. I think it's much easier to predict that eventually wages are going to fall below subsistence, at least for biological humans, than it is to predict when that will happen.
00:48:23
Speaker
Just because there are these competing effects where, as you scale up your capital and labor stock, you hit diminishing returns that lower the marginal value of those things. And wages are just the marginal product of labor,
00:48:36
Speaker
so you expect them to fall. But at the same time, there's an opposing effect, because you're going to be getting increased productivity, total factor productivity. And that could well cancel out the falling wages you'd otherwise expect from the diminishing returns, at least up to a certain point.
00:48:53
Speaker
Maybe eventually it doesn't, but at least up to a certain point. So this "eventually" is on an unclear timescale. It also depends, obviously, on how fast the world economy grows, because the faster we grow, the faster we reach that point.
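A sketch of the two competing effects on wages described here, using a simple Cobb-Douglas economy; the functional form and numbers are illustrative only, not a claim about the actual magnitudes.

```python
# In Y = A * K^alpha * L^(1-alpha), the wage is the marginal product of labor.
# Adding AI workers to effective labor L pushes wages down; rising total factor
# productivity A pushes them back up. Values are illustrative only.
def wage(A, K, L, alpha=0.35):
    """Marginal product of labor, dY/dL."""
    return A * (1 - alpha) * (K ** alpha) * (L ** -alpha)

today = wage(A=1.0, K=100.0, L=100.0)
with_ai_workers = wage(A=1.0, K=100.0, L=1_000.0)    # 10x effective labor, same productivity
with_ai_and_tfp = wage(A=3.0, K=100.0, L=1_000.0)    # 10x labor plus higher productivity
print(f"wage index: today={today:.2f}, 10x labor={with_ai_workers:.2f}, "
      f"10x labor + 3x TFP={with_ai_and_tfp:.2f}")
```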
00:49:09
Speaker
So if growth is very slow, then this could take centuries, maybe. But if it's very fast, as fast as we think, then maybe it only takes decades. The story I've heard from at least some economists is that as AI begins automating a bunch of tasks, the remaining tasks become more valuable. And so they would expect wages to rise for at least a while, and then collapse after you reach something like full automation.
00:49:39
Speaker
But even after you reach full automation, it's not clear if they're below where they used to be, because of this total factor productivity effect. But yeah, otherwise I agree with that story. Initially, when you get like 50%, 60%, 70% automation, the wages of the people who are still employed are going to be much higher in real terms.
00:49:56
Speaker
And there's going to be a question about to what extent people can reallocate from the tasks that just got automated to the tasks that are not yet automated. And if they can do that, then you don't necessarily see a lot of unemployment early on.
00:50:10
Speaker
But maybe they can't do that very effectively. Maybe it's not worth it to them. That's another question. If the automation is proceeding very quickly and you need, say, a year to acquire the skills that are going to make you effective at this new occupation that has not yet been automated, but that occupation is going to be automated in a year anyway, then you might not even invest the effort.
00:50:32
Speaker
you might just, quote unquote, give up because it's not going to be worth it, you know. Yeah, this is perhaps how I've begun thinking about programming and mathematics: it might be worth it for some people, but it's probably not worth it for me to invest in acquiring these skills, just because I expect models to outperform me. Well, they already do, and I won't have caught up before they're much better still, would be my mainline expectation.
00:51:04
Speaker
Yeah, I think that's probably true. I think there's an extent to which your skills are, to some extent, a complementary package. So just because models are better than you at X,
00:51:15
Speaker
it doesn't mean the skill X has become not valuable for humans, just because it might be non-trivial to find the person who has, say, these three skills X, Y, and Z. It might not be easy to find a person who has the skills Y and Z and can actually use the AI to make up for their shortcoming in X, compared to a person who has all three skills and can do them as a package and benefit from these synergies and complementarities.
00:51:41
Speaker
So I think that's true to a large extent. The fact that someone has those skills just makes them more productive in other ways. So I would maybe not be quite as pessimistic. I would still say you probably have enough time for learning these things to pay off.
00:51:58
Speaker
But I think it will pay off in a somewhat different way than you might expect. And explain again, how is it that it pays off? Oh, it's just that if you want to hire someone,
00:52:09
Speaker
then it's much less convenient to hire a person who is good at some other skills but not good at math, and who then tries to compensate for that weakness by using LLMs to remedy it.
00:52:23
Speaker
But I think, even if LLMs actually become superhuman at math, that still doesn't mean that you don't want to hire people who are good at math. Just because there is a way in which the LLM's skill is more difficult to integrate into your normal workflow, compared to a person who just has the same skills, maybe not as good, but as part of a complementary package with a lot of other skills, many of which the LLM can't do at all right now.
00:52:53
Speaker
So for example, such a person will probably be more effective at operating the LLM and driving it towards the results. Well, I think this is true right now. If you don't know any math, then it's hard to notice, for example, when the LLM makes a mistake.
00:53:08
Speaker
Yeah, yeah. You basically can't evaluate the outputs of the model if you don't know what it's doing. That's right. And it's just harder to

Extrapolating AI Capabilities

00:53:16
Speaker
drive it. Or, for example, if you're using a language model for creative writing, you need to have good taste
00:53:23
Speaker
in order to be able to say what is a good output, what is a bad output, and do I want to change this or keep this? And even though LLMs are already in some ways better at writing than most people are, the people who are actually good at it can still leverage them and use them as a multiplier. They resample a bunch of times, they recommend edits, they say, okay, this doesn't look quite as good, can you redo this part? So it makes them more productive and not less productive.
00:53:53
Speaker
When you think about finding out what AI will be able to do in the future, you recommend that we do not look at what AIs can currently do and extrapolate from there.
00:54:04
Speaker
At least perhaps tell us about the difference between extrapolating trends and then thinking from first principles about future AI capabilities. Right.
00:54:14
Speaker
So extrapolating trends would be like, imagine that you look at AI math capabilities in 2010 and then 2015 and then 2020. And then, well, it doesn't seem that much is happening. You're basically seeing a flat line, and then that's what you predict. Yeah.
00:54:27
Speaker
I mean, just a couple of years ago, I was interviewing AI experts on this podcast about whether large language models would ever be capable in math and in programming and so on, because they were at that point quite lacking in those areas.
00:54:42
Speaker
But now that's obviously changed. So the point I was making is that extrapolation just doesn't work very well. In fact, it doesn't work well in a lot of other, very basic ways as well. For example, in 2015, 2016, it was the era of supervised learning.
00:54:59
Speaker
The tasks that AI was used to do were things like image labeling, image segmentation, predicting who is going to pay back their car loans and who is not going to pay them back. And self-driving cars were maybe the most exciting application, because people thought that you could leverage the existing computer vision technologies in order to build them.
00:55:19
Speaker
So people were predicting that maybe the most impactful application of AI in 10 years was going to be self-driving cars. Well, 10 years later, self-driving cars are getting rolled out slowly.
00:55:33
Speaker
We have them in the Bay Area. But that's just such a small fraction of what actually happened. That's not the story. The story is the LLMs and reasoning models and things that have some basic agency capabilities and amazing document summarization and writing and question answering.
00:55:55
Speaker
Natural language processing was basically solved. I think it's difficult for people today to appreciate how difficult people in 2015 thought natural language processing would be to solve.
00:56:08
Speaker
It was just considered a completely intractable problem, where there had been no progress or almost no progress and there was no hope. It was just, maybe something will happen, but people had no idea how it would get solved.
00:56:21
Speaker
And then, well, it just got solved, you know. So I think what happens, and what you should expect, is: well, the brain still has a bunch of capabilities that language models, or current AI systems, do not.
00:56:33
Speaker
And while the brain is not a magical artifact, those capabilities are actually coming from somewhere. And with each order of magnitude of compute scaling, you first have more training compute, you are able to process more data. Those are very obvious benefits, but also you get to do more experiments.
00:56:50
Speaker
And compute is an input into the process of software R&D. And so you discover ways in which you can do the things that the brain is doing, and you discover them sort of one by one. Maybe initially it's natural language processing, maybe later it's math and programming, maybe later it's going to be something else.
00:57:10
Speaker
I mean, the brain still has agency and long context and planning capabilities that models seem to lack. But again, those capabilities are not out of reach. Sometimes people say that, oh, language models are never going to be able to do X or something like that.
00:57:26
Speaker
And I think that claim is just not very interesting. Because first of all, reasoning models are language models. I mean, sometimes people say they aren't because they are part of this big chain-of-thought, whatever, scaffolding. And they might say, well, when I was saying language models are not going to be able to do X, I was just talking about next-token prediction or something.
00:57:48
Speaker
And I'm like, okay, well, but then you see why that's not an interesting claim. In the sense that you were making that claim, it's not an interesting claim. So I think what you should expect is that, yes, there are going to be changes.
00:58:01
Speaker
I mean, I don't really expect any fundamental changes in the next five years, any big changes in architecture, because the transformer has been surprisingly long-lived. So I think we should just expect that to continue.
00:58:13
Speaker
But we have seen a lot of changes in how people train transformers. So it's the same architecture, but it's used in very different ways. So we could well continue to see that.
00:58:25
Speaker
We first had pre-training on next-token prediction. In fact, before we had pre-training, we had things like translation, like translating things from English to French. That was the original application of the transformer.
00:58:38
Speaker
And then with things like BERT from Google in 2018, the training objectives were like, we give you a sentence, we mask some of the words, and then you predict the held-out words. Okay.
00:58:50
Speaker
And then it became next-token prediction with the GPT paradigm. And that has stuck. Then we started doing post-training, like RLHF. Now we are doing a different kind of RL, with reasoning.
00:59:02
Speaker
We have these synthetic data pipelines where people generate a bunch of synthetic data and train the model on that. And we have long-context fine-tuning, which is also a new thing that has emerged maybe in the past three years or so.
00:59:17
Speaker
So we've seen all these changes in how transformers are trained. So I think we should just continue to expect that. And probably that's what the new capabilities that are unlocked are going to be associated with. It's not going to be like people come up with this totally different architecture.
00:59:30
Speaker
It will more likely be that people come up with a different way to train it, on different data, using a different kind of signal for RL, or maybe they will find some totally different way of training it. But I think the architecture will continue to be stable, just because it has continued to be stable so far.
00:59:44
Speaker
Any guesses as to how we might train our transformers so as to get a more agentic model, or a model that's capable of long-term planning, something like that? I mean, I think the obvious thing is to try to do the thing that led to the original AlphaGo, which is to collect a lot of human data.
01:00:02
Speaker
I'm not sure what this human data would look like, what form it would have to take. It's a bit unclear. And then hope that fine-tuning on that gives you enough traction for some kind of reinforcement learning pipeline to then have enough signal to get working.
01:00:19
Speaker
The problem with doing reinforcement learning naively is that the reward signals might be too sparse, so you might not get anywhere. You need the model to have a certain level of competence from the start for a very naive reinforcement learning approach to work.
01:00:31
Speaker
So that would be my first guess. And maybe that's still going to work. I don't know. I think it's hard to say, because you could have said that reinforcement learning was not going to work for a similar reason: if it would have worked, then someone would have done it already. But it turns out it works.
01:00:47
Speaker
So maybe you're going to see something similar. The main doubt I have about that is that the human data you collect might not be what you actually need. It might be more that you need data about what's going on in people's brains, and not what they are doing externally,
01:01:04
Speaker
in their behavior.

AI Progress Benchmarks

01:01:05
Speaker
That's sort of like how it doesn't work to train a reasoning model by just fine-tuning it on arXiv papers. Because that's not actually the relevant data. The relevant data is probably something that's going on in the head, and you don't see that.
01:01:18
Speaker
That would be pretty difficult to acquire. We don't know how to extract that knowledge. Perhaps is this a game of paying experts to write down their thought process? You can imagine paying an expert to write down an explicit chain of thought for solving a problem, something like that.
01:01:37
Speaker
I don't know. I mean, I think that's something I would have tried, but I'm not sure how well it works. Are there any disadvantages to trying to predict AI capabilities by thinking from first principles? So we discussed the disadvantages of extrapolating trends, but what about thinking from first principles?
01:01:55
Speaker
It's much vaguer in the sense that you can't make as precise predictions. But I'm not sure if that is a disadvantage, because maybe you should just be uncertain. I think we just have a lot of uncertainty in any pre-emergence regime on any benchmark, where the benchmark line just looks flat.
01:02:11
Speaker
You just have a lot of uncertainty. I mean, there are methods of trying to deal with it, where you try to re-parameterize things, you try to look at things like mean log-pass rates instead of looking at accuracy.
01:02:21
Speaker
Or maybe you do partial scoring. You try to just look at what the AI is doing and try to qualitatively assess, or maybe quantitatively assess, how close it is getting to actually being able to do the tasks.
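As one illustration of why a pass-rate-based metric can give signal while plain accuracy still reads zero, here is a small hypothetical sketch; the exact metric Epoch or others use may differ, this is just the generic idea.

```python
import math

# Hypothetical sketch: run each question several times, record per-question
# pass rates, then compare plain accuracy with a mean log pass rate.

def accuracy(pass_rates, threshold=0.5):
    # Fraction of questions solved "reliably" (pass rate above threshold).
    return sum(p >= threshold for p in pass_rates) / len(pass_rates)

def mean_log_pass_rate(pass_rates, eps=1e-3):
    # Average log pass rate; eps avoids log(0) for never-solved questions.
    return sum(math.log(max(p, eps)) for p in pass_rates) / len(pass_rates)

weak_model   = [0.00, 0.00, 0.01, 0.02, 0.00]   # almost never solves anything
better_model = [0.00, 0.05, 0.10, 0.20, 0.01]   # still 0% by plain accuracy

for name, rates in [("weak", weak_model), ("better", better_model)]:
    print(name, "accuracy:", accuracy(rates),
          "mean log pass rate:", round(mean_log_pass_rate(rates), 2))

# Both models get 0.0 accuracy, so the benchmark curve looks flat, but the
# mean log pass rate already separates them, which is the point of
# re-parameterizing the metric in a pre-emergence regime.
```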
01:02:32
Speaker
Yeah, like, do that for Operator right now on booking a flight or something. Even there, it looks kind of bad. So how do we assess that? It's not clear.
01:02:43
Speaker
So I would rely more on that kind of assessment in the case where the capability doesn't seem to exist at all. But I think that's not really a problem with the methodology. It's more that you're just in a difficult situation.
01:02:55
Speaker
And then no method would really do that well at predicting in that situation. Yeah. Other than that, I think it can be more judgment-based.
01:03:05
Speaker
People can just reach different conclusions based on the different priors they have. But again, I think that's just an artifact of the fact that there isn't as much evidence as we might want. So you fall back on priors, which might be different.
01:03:19
Speaker
In fact, even for benchmark extrapolation, I think this comes in because even if there is a trend that you're going to extrapolate, you don't necessarily know if there's going to be a trend break. Like that's sort of what happened with math. We were seeing progress in math, but then there was a trend break caused by the reasoning models.
01:03:33
Speaker
I mean, it seems to me that we lack benchmarks that saturate at an even pace, right? It always seems to be that there's very little progress, and then we reach some new mode of progress, and then the benchmark is basically saturated within a year or so.
01:03:51
Speaker
This has happened with some of the programming benchmarks, some of the math benchmarks, some of the PhD-level question benchmarks and so on. What explains this phenomenon? And what would a benchmark look like that would saturate at an even pace?
01:04:11
Speaker
I mean, I think it's an ideal thing, but it's just hard in practice to construct, because it's so hard to get a handle on. Even if you think you have arranged the questions on this smooth and even distribution of difficulty, that doesn't mean it's going to get saturated in an even way. It might just be that the AI acquires the ability suddenly, there's a new paradigm, it gets the skill all of a sudden, and then you make a lot of progress.
01:04:39
Speaker
So I think it's just very hard to anticipate that when you're designing the benchmark. But I think there is an answer to the question of why benchmarks get saturated so quickly. I think it's because people only build a benchmark when solving the benchmark appears to be in sight.
01:04:56
Speaker
So they don't make a benchmark that's completely impossible for current models. We don't make a benchmark about, okay, can GPT-4 cook me a meal? Yeah, that's not a benchmark. That doesn't exist.
01:05:09
Speaker
Because it would be pointless: we would just keep getting zeros for years. But we can already imagine a benchmark about, okay, can GPT-4 do computer-use tasks reasonably well?
01:05:24
Speaker
OSWorld is kind of like that, though the existing OSWorld tasks are very narrow and specific. You can imagine a more general benchmark about that. And, well, the fact that you can imagine that benchmark probably means it's going to happen.
01:05:35
Speaker
You're imagining it because it seems plausible to you that it might happen on some level. And labs also pay for benchmarks when they think the payoff is going to be fairly soon.
01:05:47
Speaker
So there's also an economic selection where the benchmarks that get lab funding are the benchmarks that labs expect are going to be useful in the near term. Not just useful for demos and making their models seem impressive, but also for internal evals.
01:06:04
Speaker
If every single model always gets zero on a benchmark, then you get no useful internal signal for your decisions about what to do internally. You need some variation in model performance.
01:06:18
Speaker
And that means the models already have to be somewhat competent at what you see in the benchmarks. So I think that's a lot of what's going on. You wouldn't

Complexity of AI Automation

01:06:26
Speaker
expect us to ever get to this dream benchmark where you would see smooth performance over time.
01:06:34
Speaker
No. I mean, maybe that would happen, but I think it would be by accident. I think it's hard to design that in advance. Okay. Let's talk about Moravec's Paradox. You have a great post on this where you kind of update Moravec's Paradox. So tell us, what is Moravec's Paradox, and what is a modern update on how to explain it?
01:06:56
Speaker
Right. So Moravec's paradox is the observation that perception and sensorimotor skills are much more computationally expensive and difficult than skills that typically seem to us to be associated with intelligence, like playing chess or whatever.
01:07:14
Speaker
It's a very old observation. The basic explanation... that Moravec himself gave for the paradox is that the capabilities that seem difficult to us are capabilities that are new and not very well optimized in us.
01:07:30
Speaker
So playing chess, well, humans were not selected for their ability to play chess. So we can have vast variations in the efficiency with which we play chess. For example, look at the point at which computers reached medium strength in chess, like amateur strength, around 1,200 or 1,300 Elo, and the amount of compute we were using then.
01:07:55
Speaker
Then you look at how many orders of magnitude of scaling we had to do until we reached the level of the world champion, which is around 2,800 Elo. We had to do around
01:08:07
Speaker
five orders of magnitude of physical compute scaling, plus a bunch of software progress. So if you look at it naively, that says the human range in chess-playing ability is at least, probably more than, five orders of magnitude wide, because that's just median to right tail. And then there's the left tail, which is maybe even worse.
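As a rough back-of-the-envelope check on those numbers (my own illustrative arithmetic, assuming, hypothetically, a constant Elo gain per doubling of compute):

```python
import math

# Illustrative arithmetic only: assume engine strength grows by a roughly
# constant number of Elo points per doubling of compute.
elo_amateur = 1250        # roughly where engines matched amateur players
elo_champion = 2800       # roughly world-champion level
orders_of_magnitude = 5   # the physical compute scaling mentioned above

doublings = orders_of_magnitude * math.log2(10)          # about 16.6 doublings
elo_per_doubling = (elo_champion - elo_amateur) / doublings
print(f"{doublings:.1f} doublings, ~{elo_per_doubling:.0f} Elo per doubling")

# Roughly 90+ Elo per doubling: five orders of magnitude of compute to cover
# just the amateur-to-world-champion slice of the human range.
```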
01:08:26
Speaker
Maybe some orders of magnitude worse. And then you're saying, well, imagine you had six orders of magnitude of variation between people in how fast they could run.
01:08:38
Speaker
That would be very strange. That would not be something you would expect. I mean, when we say some people run faster than others, we mean they run twice as fast or three times as fast.
01:08:49
Speaker
We don't mean they run a million times as fast. So in physical tasks, if you have a bunch of able-bodied people, you don't expect a lot of divergence in performance, say, between me loading a dishwasher and you loading a dishwasher, a task that's incredibly difficult for, basically impossible for, current robots and AIs.
01:09:11
Speaker
But in something like advanced physics research, programming, and so on, you see an incredible difference in performance between people.
01:09:22
Speaker
Because the more theoretical, the more mathematical, the more technical tasks are something we didn't evolve to do, and something we've only been doing for hundreds or thousands of years.
01:09:37
Speaker
That's right. And the point is that if doing a task was very valuable in the evolutionary environment, and it has been valuable for a very long time,
01:09:47
Speaker
that's also important, then there's just been a lot of selection pressure applied, and the way in which those capabilities are instantiated or implemented in the human brain,
01:09:59
Speaker
the algorithms that are used to do those things are very sophisticated. Like you don't get a lot of like trained parameters or whatever from evolution just because the DNA just doesn't contain that much information.
01:10:11
Speaker
So DNA only contains enough information to really code for an algorithm and high-level details. It doesn't contain enough information for more than that; you can't view evolution as analogous to a training run
01:10:22
Speaker
where the results then go into your brain, because there isn't that much information that you actually get in the genome. But you can view it as analogous to algorithm and architecture search.
01:10:33
Speaker
And in that case, the algorithms are very separate. The brain has different regions that are responsible for different things. It has a cerebellum, which has its own responsibilities that are separate from the rest of the brain.
01:10:45
Speaker
And it has specific areas devoted to visual processing and audio processing, and to sort of general-purpose thinking, which happens more in the prefrontal cortex.
01:10:58
Speaker
It has specific areas devoted to language understanding and parsing. So these are all very regional, and probably they're all done by different kinds of algorithms. Because they are regional, and they're the same areas in everyone,
01:11:11
Speaker
so it has to be coded in the genome somewhere. That's how it has to happen. But the capabilities that are just very new, well, those haven't been optimized very much. So they're probably being done by a fairly crappy algorithm.
01:11:23
Speaker
And in that case, because it hasn't been optimized, there's a lot of room to improve the efficiency with which you make use of the resources of the brain,
01:11:34
Speaker
its power usage and its computational power, to get better performance on the task. So chess is like this. And in these tasks, you just see a lot of variation: just training on the task more can improve your performance dramatically.
01:11:48
Speaker
There's a lot of genetic variation in performance, maybe because of underlying skills, maybe for reasons we don't understand. You might have better memory; some people just have better memory than other people. You might have more of an ability to visualize and calculate things that might matter in chess.
01:12:03
Speaker
But yeah, if you stack all those up, you end up with this five, six, whatever, orders of magnitude of range in the efficiency with which people play chess. But that means chess is an easy skill.
01:12:15
Speaker
I mean, it's not necessarily easy, but there's no reason to expect that humans are particularly good at it. Humans are actually probably just bad at it. And so a computer can just beat us. Well, that's not a surprise, because we haven't been, quote unquote, selected to be good at that task.
01:12:34
Speaker
But it's different for the tasks that we have been selected to be good at. Things like complex object manipulation, where there appears to be very little variation between different humans, which suggests that capability was fairly optimized.
01:12:49
Speaker
Or sensory motor tasks, more generally. ah Like, obviously, it's very important to be able to do visual processing. That's just a very important skill. Even animals are very good at visual processing in many different cases.
01:13:01
Speaker
In fact, some animals are better than humans at very specific tasks. Cats have a faster visual reaction time and things like that. Well, so that skill is very old. Yeah, so basically, the tasks that intuitively feel very difficult for us are not the tasks that are actually difficult in some sense.
01:13:20
Speaker
If you ask me to memorize a hundred or a million different digits, that's something I won't be able to do no matter how hard I train, but it's utterly trivial for a computer. If you ask a current robot to pour water into a glass, drink from it, and then put the glass on the table, that's something that almost every human can do, but basically no robot can do yet.
01:13:42
Speaker
So what's interesting here is that our intuitions about which tasks are difficult are kind of turned upside down. So what does this tell us about which tasks or jobs might be most vulnerable to automation?
01:13:58
Speaker
A job having a physical component is a reason to expect it to be automated later. A job requiring a lot of agency and complex planning over long contexts, long horizons, where the feedback loops are kind of unclear,
01:14:13
Speaker
I think that's also a reason to expect it to be automated later. At the same time, a job having vast differences in human performance is a reason to expect it to be automated early. But I think sometimes these things can get mixed up, in the sense that there might be a job which depends on complementary skills.
01:14:30
Speaker
And some of those skills are things that are easy for AIs to do, but then some other skills might be hard. So for example, to be a math researcher, you might need a bunch of math knowledge and skill, but you might also need some kind of complex agency, which is harder for AIs.
01:14:49
Speaker
Compare that to a very trivial job where the context lengths are very short; a very good example might be customer support, where the context lengths are fairly short.
01:15:02
Speaker
Customer support is social interaction, something we might be optimized for more than we are for math. But at the same time, the context length of that task is very short.
01:15:13
Speaker
Yeah, it's quite modular. It might be a two-to-five-minute call, and there might be a script for it and so on. That's right. So I think it's difficult to do this analysis at the level of jobs, just because a lot of jobs mix a lot of different skills together.
01:15:29
Speaker
But you can do the same kind of analysis as in this paper, GPTs are GPTs. I don't know if you know this paper. Maybe explain to our listeners the two different meanings of GPT in that title. So the first GPT is just Generative Pre-trained Transformer. It's the name of the model family of GPT-2, GPT-3, GPT-4.
01:15:49
Speaker
And the second one is General Purpose Technology. The basic thing they do in the paper is they look at how exposed different occupations in the economy are to being automated by LLMs, partially automated by LLMs for the most part.
01:16:04
Speaker
And they try to quantify, okay, what are the things about an occupation that predict whether it will be easy or hard to automate? I think you can do the same kind of analysis where you can say something requiring a lot of agency, a lot of complex planning, a lot of creativity and adaptation to new circumstances is a reason to expect it to get automated later.
01:16:27
Speaker
A task requiring advanced sensorimotor skills is a reason to expect automation later. And on the flip side, a task being sort of modular and well-contained, being the kind of thing that you can just outsource to a contractor very easily,
01:16:45
Speaker
and the contractor works on it for like two days and then gets back to you, I think that's the kind of thing that's actually easy to automate. Because that

AI in Software Engineering and Management

01:16:53
Speaker
is a very circumscribed task. It means you don't actually need a lot of complex planning to do the task.
01:16:58
Speaker
You don't need integration into a very large context. You don't need to be onboarded at a company and be familiar with how they do things. None of that is necessary.
01:17:09
Speaker
It's a packaged, narrow task; it's handed to someone, and then they do it. And that's the kind of labor task which I think you will see the AIs automate.
01:17:21
Speaker
I think we're already seeing this right now. For example, you can imagine the kind of request you send to Deep Research being this kind of thing. You just hire someone, and then you tell them, okay, write me a report on such and such.
01:17:33
Speaker
And then they write you a report and get back to you. That's the kind of thing you can easily imagine outsourcing to a contractor. Whereas for the jobs that you would, in the current economy, struggle to outsource to a contractor, because it's just extremely annoying and it's very tied up with your own business practices and your context, and it's just an extremely long project, in that case I think AI will struggle much more. To be clear, I think eventually all of these are going to get solved.
01:18:02
Speaker
But if you're talking about the relative order in which things will get automated, I think this is roughly the prediction I would make. Yeah. You also write about how software engineering and math research are kind of vulnerable to automation, or at least part of them will be.
01:18:19
Speaker
And this is for the reasons we've already discussed. We have a bunch of data, and this is something that's very difficult for most humans to do. There's a wide range of performance, and so you would expect automation of particularly those tasks, or specifically those tasks.
01:18:35
Speaker
Before the others, yes. But still, it's not entirely trivial to automate software engineering, because in practice, software engineering is not these short snippets of tasks.
01:18:49
Speaker
Usually, it's a longer-context thing than that. Usually, you get vague instructions and you need to be familiar with what the company is doing to be able to do your job properly. And that is something that's going to be, I think, trickier.
01:19:00
Speaker
It's going to be more difficult. But some software engineering might be more modular and... So that part should get automated. Yeah, more outsourceable and so on. But the parts that require large contexts, or context specific to particular companies, would be automated less, in your view.
01:19:20
Speaker
Yep. One interesting point you also made is that you expect high-level management to also be automatable. This one surprised me a bit because, I mean, management is mostly about people. It seems like something most humans are quite good at, like social interaction, talking, and so on.
01:19:40
Speaker
But LLMs are also good at social interaction. Yeah, in some sense that's true. So the important thing to keep in mind is that social interaction is a skill that faced a lot of selection pressure, but it is not a very old skill.
01:19:54
Speaker
So you should expect it to be quite a bit easier than the sensorimotor tasks for that reason; it's much newer. And management is, again, a complementary thing. I don't actually say that management is going to get automated, because I think management does have this other component, but I say that a lot of what people do in management is probably going to get automated.
01:20:14
Speaker
You're going to have managers who experience a big increase in productivity. And maybe we'll also start selecting managers on somewhat different skills, because some of the things we expect them to be good at today, they will no longer need to be good at to the same extent.
01:20:28
Speaker
But the reason management is such a new skill is because we're doing it at an unprecedented scale. The management that is an old skill is managing a group of five people to forage or something.
01:20:45
Speaker
Like that's the skill that's old. Like managing a company of 100,000 people spread across the world. That's not an old skill. That's not something we evolved to do at all. That's right.
01:20:55
Speaker
And people differ so enormously in their ability to do management. In fact, if you put a typical person in charge of a company of any meaningful scale, they will probably be destructive. They would add negative value to the company. They would interfere with what the company is doing in harmful ways and mess things up. I think that's a very natural take.
01:21:20
Speaker
And then you have this right tail of people who are extremely effective at doing it. And the question is, okay, if there's such vast variation, again, that's just a generic reason to expect that there is at least a big component of that task that is easy for AIs.
01:21:36
Speaker
And that doesn't mean the entire task is easy for AIs. Because, again, management does require this, like, long context understanding and things like that, which I think are loading on human skills that are older.
01:21:48
Speaker
But you would just try to outsource it. You would try to get the AI to automate as much of your work as you can, especially the parts that rely on these newer skills that you're not actually that good at.
01:22:03
Speaker
And that might improve the quality. That's sort of the thing I expect. In fact, even if AIs were capable of substituting for human managers, I think we would probably stick with human managers for at least a while.
01:22:16
Speaker
For social or political reasons? Legal reasons as well. You want someone to be liable for decisions, and it's easier if it's a human than if it's an AI.
01:22:27
Speaker
But yeah, also for social reasons. What about data? We don't really have good data on management, right? So that might also slow things down here. But then humans also don't have good data.
01:22:39
Speaker
Yeah, that's also true. I mean, AIs remain less data-efficient than humans, so you need more data if you want to teach something to an AI.
01:22:50
Speaker
But at the same time, if humans are bad enough, then maybe you don't even need that much data to do a better job than humans. The question is, how would you find out if this is the case? Because management is not something you can have a good benchmark for.
01:23:04
Speaker
You need to actually convince people to show up and work for you. It just seems difficult to convince people that this is actually the case. It's the kind of capability that might emerge, and then people might only realize that AIs have that capability significantly later.
01:23:19
Speaker
But I think people will be trying to find ways to integrate AI into their workflows, and probably that's the channel through which it's going to happen. It's not going to happen by AI labs saying, okay, we're going to replace management.
01:23:35
Speaker
It's going to be more like managers themselves, especially ones who are like more familiar with the technology are just going to start using it more. And they're going to like notice that it just benefits them.
01:23:45
Speaker
And one very simple thing is that a manager potentially just has to read a lot of stuff. And you can already imagine a very competent AI speeding that up enormously.
01:23:58
Speaker
You're like, okay, what is this? Just find this information in this document, or check this for that. Very simple things, which to an AI are very simple, but to a human would just be time-consuming,
01:24:11
Speaker
that you can just outsource to an AI. That could be one.

Future AI Models and Economic Prospects

01:24:15
Speaker
What do you expect for AI this year? So in 2025? Yeah, I still expect progress to be faster this year than last year, mostly because of the compute scaling we're going to be seeing this year. And i think the pre-training scaling is only part of it.
01:24:32
Speaker
And we're already starting to see that. But it's also the fact that people are going to be scaling up the reasoning thing. I also expect we might just see something new, similar to reasoning models; I think there's a decent chance of that by the end of the year, but I can't predict what it's going to be.
01:24:45
Speaker
But with each order of magnitude of scaling, I expect there's a good chance you unlock something new. And we're seeing roughly that amount of scaling this year compared to last year.
01:24:58
Speaker
Because all the big clusters are coming online. I expect math progress to continue. On our own math benchmark, FrontierMath, I think we might see 75%
01:25:10
Speaker
by the end of the year on that benchmark. I think SWE-bench is going to get saturated. Explain to our listeners how kind of wild these statements are. What is FrontierMath? What is SWE-bench? Just to give some context, these are fairly hard benchmarks, and you would expect them to get saturated.
01:25:28
Speaker
Yeah. FrontierMath is the benchmark that we created in collaboration with OpenAI. It consists of math problems that are at least an order of magnitude harder than what you had in previous math benchmarks, and maybe even more.
01:25:41
Speaker
The harder questions are maybe even more than that in terms of how long it would take an expert human to solve them, and they also require specialized domain knowledge.
01:25:52
Speaker
Maybe a way to put it is that maybe 25% of the benchmark, a person with a competition background in math could do in 30 minutes. The median question would likely take an hour or two.
01:26:07
Speaker
And then the hardest questions would take even longer, and they would also require more background. So the current state-of-the-art score in our own evaluations is something like 11%. OpenAI has their own internal evaluations where they report much higher scores.
01:26:23
Speaker
But those are for models that have not been released to the public. It's either models like O3, which just has not been released at all, and which OpenAI said got 25%. Or it's models like O3 Mini High, but evaluated probably with much larger inference budgets and some kind of custom internal scaffolding that OpenAI doesn't actually expose to external users, with which they claim to get around 30%.
01:26:46
Speaker
In our own evals, where we just evaluate the models as they exist, like if we just subscribe to OpenAI and go through their API or something, and then you just pick O3 Mini High,
01:26:59
Speaker
and then we have our own custom prompting and so on, and we give the models access to tool use. But I think our way of giving access to tool use is probably not as sophisticated and not as performant as OpenAI's.
01:27:10
Speaker
So yeah, that's what FrontierMath is. I think 75% on that benchmark is quite an impressive score, but I don't think it's unreasonable. I think it's in line with what other people expect as well.
01:27:21
Speaker
It's just because reasoning RL is such a new paradigm, and it's still nowhere near being saturated, in the sense that people are still spending quite a bit less on it, maybe 10x less or more, than they are on pre-training.
01:27:37
Speaker
So there is still a lot of room for them to scale this up, and there's room for them to do it on more performant base models. They could do the reasoning thing based on 4.5. We could see that GPT-5 comes out and it's going to be this mixture of many different things, and maybe one thing is that it has this reasoning capability on top of a much bigger base model, and that gives it much better performance. I would not be surprised by that.
01:27:58
Speaker
SWE-bench is a benchmark based on the ability of AI systems to resolve actual GitHub issues. I think the main thing that separates SWE-bench, I mean, there are some other things as well, but it ties into what I said earlier about context windows.
01:28:14
Speaker
Most issues, you can resolve them without having that much context. You do need to know something about the project, but it's not like you need to be onboarded at a company and then spend six months there and then interpret some vague instructions. It's usually a lot clearer than that.
01:28:32
Speaker
So that's why I expect it to get solved. I think we've already seen a lot of progress on SWE-bench. Right now, the state of the art on SWE-bench Verified, which is a split that has been human-validated to be high quality,
01:28:49
Speaker
not more difficult necessarily, but validated so that it doesn't have errors, the current state of the art is like 65%. So I expect like 90%, or something that is basically saturated, by end of year. I think that's a very reasonable prediction if you look at the pace of improvement we've seen. And I expect OpenAI's revenue estimates are probably going to be just about accurate.
01:29:07
Speaker
So OpenAI estimates on the order of $10 billion of revenue, maybe $11 billion, I don't remember the exact number, this year, in 2025. And I think that's probably about right. Maybe they get $12 billion.
01:29:20
Speaker
That seems right. So that would be three times more than what they got last year, and last year was three times more than the year before that. So it was like $1.2 billion, then it was like $4 billion in 2024.
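Spelling out the arithmetic being extrapolated here (using the approximate figures quoted above):

```python
# Approximate revenue figures quoted above, in dollars.
revenue = {2023: 1.2e9, 2024: 4e9}
growth = revenue[2024] / revenue[2023]      # roughly 3.3x year over year
projected_2025 = revenue[2024] * 3          # "about three times more" again
print(f"2023->2024 growth: {growth:.1f}x, projected 2025: ${projected_2025 / 1e9:.0f}B")
```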
01:29:33
Speaker
So now we're expecting like $12 billion. That's fantastic growth. You mentioned that, of course, you can't predict what the next kind of breakthrough would be, the breakthrough in the same vein as those reasoning models.
01:29:46
Speaker
Do you want to speculate on that? Do you have some ideas? The most plausible to me looks like some kind of long-context breakthrough. I mean, I'm hoping that happens, but I'm also kind of skeptical, because it's been this thing where people have been saying it's going to happen for a very long time, and then it keeps not happening.
01:30:02
Speaker
What would it mean to have very long context? What would that allow us to do? Currently, the models' performance degrades quite a lot if you try to use them in any kind of longer context. It's not just about the context window limits that you see.
01:30:18
Speaker
Like, okay, we can only handle 200,000 tokens in one context. Well, that's fake: they can't handle 200,000 tokens without loss of performance. And the benchmarks they have for testing long-context performance are very narrow. It's things like needle-in-a-haystack evaluations, where you just have to find some specific thing in a long passage of text.
01:30:37
Speaker
But the actual thing that matters more is, like, if I give you a long document, then how much do you... hallucinate about it? How much do you understand if like there are different parts of it that are interacting with each other?
01:30:49
Speaker
If you just talk with a language model over a long interaction, for example you can have it write a story for you, that's a good way to test this capability. You will notice that even after you get past tens of thousands of tokens, low tens of thousands, its coherence, its ability to understand what's going on, its ability to recall details and make inferences based on what has been written in the past context, start getting worse.
01:31:17
Speaker
I don't know how this will be solved. Maybe it's just going to be solved by fine-tuning on more long-context things. I'm not sure what is going to be the key thing, but it just seems like a big issue, because long context is required for unlocking a lot of economically valuable applications. Even 200K context is actually not that long. That's not enough, for example, to fit a good chunk of the Harry Potter books or something.
01:31:41
Speaker
You can't fit a single book into a context, so that just makes it much more annoying to use. If these models had tens of millions of tokens of context, and they actually could do that without noticeable loss in performance,
01:31:56
Speaker
and maybe also without a steep increase in price, which is what you might get today if you try to do that, because of the complexity of attention, I think that would just be valuable and would unlock a lot of other economically valuable applications. Because once you have a model that's really performant at long context, you might be able to train it much better to use that long context for other things, like reasoning. So that just seems like a very relevant bottleneck to me. If they could just find a way to unlock that, I think that would be very valuable.
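For a rough sense of why naive very long contexts can mean a steep increase in price, here is a back-of-the-envelope sketch of the quadratic attention cost; real systems use many optimizations, so treat this as the naive scaling only.

```python
# Naive self-attention builds an (n x n) score matrix per layer per head, so
# that part of the compute and memory grows quadratically in context length n.
# Illustrative scaling only; production systems optimize this heavily.

base_tokens = 200_000   # a typical advertised context window
for tokens in [200_000, 1_000_000, 10_000_000]:
    relative_cost = (tokens / base_tokens) ** 2
    print(f"{tokens:>10,} tokens -> ~{relative_cost:,.0f}x the attention cost at 200k")

# 1M tokens is ~25x and 10M tokens is ~2,500x the quadratic attention cost of
# a 200k context, before any of the tricks used to avoid paying full price.
```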
01:32:30
Speaker
Another possibility is that some kind of agency capability gets quite a bit better. But I'm not sure. Would those two be related? So having a large context might make you a better agent?
01:32:42
Speaker
I mean, yeah, having a large context is necessary to be a good agent. It's not sufficient, but... Yeah, as a last question here, maybe you could talk about the extremes of what you would expect on timelines. So we've talked about how you're more skeptical about five-year timelines than the leaders of the AI companies.
01:33:04
Speaker
And maybe your median scenario is something like: we will see increased automation over the next 20 years. What kind of evidence would you need to see in order to be convinced that, say, we'll get AGI by 2030?
01:33:16
Speaker
And in the other direction, what kind of evidence would you need to see to say, okay, we're not getting AGI until 2100? I think it's tricky.
01:33:27
Speaker
I think for the long timelines, I would probably need to see a long period of scaling basically not working very well. Maybe five years or more, maybe 10 years.
01:33:39
Speaker
And then I would probably update towards that. For before 2030, first of all, I would want to see breakthroughs in the capabilities that I mentioned before.
01:33:51
Speaker
I would want to see some kind of clear trend that I can extrapolate, that these things are clearly getting much better, the things models have always been bad at: agency, long contexts, sensorimotor skills, complex planning and execution, that kind of thing.
01:34:06
Speaker
I'm not sure. If I saw that happening, then I would update. But I'm not sure if I would update all the way to before 2030. That would depend on the rate at which these things were happening.
01:34:17
Speaker
I mean, definitely if I suddenly saw a breakthrough in robotics. That would be a big deal. That would be a big deal, yeah. But do I expect it to happen? No, I don't expect a big breakthrough in robotics before the end of 2030.
01:34:29
Speaker
Fantastic. Well, thanks for chatting with me. It's been great.