
Tamay Besiroglu on AI in 2030: Scaling, Automation, and AI Agents

Future of Life Institute Podcast

Tamay Besiroglu joins the podcast to discuss scaling, AI capabilities in 2030, breakthroughs in AI agents and planning, automating work, the uncertainties of investing in AI, and scaling laws for inference-time compute. Here's the report we discuss in the episode:  

https://epochai.org/blog/can-ai-scaling-continue-through-2030  

Timestamps: 

00:00 How important is scaling?  

08:03 How capable will AIs be in 2030?  

18:33 AI agents, reasoning, and planning 

23:39 Automating coding and mathematics  

31:26 Uncertainty about investing in AI 

40:34 Gap between investment and returns  

45:30 Compute, software and data 

51:54 Inference-time compute 

01:08:49 Returns to software R&D  

01:19:22 Limits to expanding compute

Transcript
00:00:00
Speaker
Welcome to the Future of Life Institute podcast. My name is Gus Docker, and I'm here with Tamay Besiroglu, who is the Associate Director of Epoch AI. Welcome, Tamay. Great to be here. Maybe you could start out by telling us a bit about Epoch.
00:00:15
Speaker
Sure. So Epoch AI is a research organization that does research and builds models and datasets to inform people in policy, government, and civil society about AI and its impacts. And for people who haven't seen Epoch's charts and graphs, I highly recommend going to their website and checking it out. It's really useful stuff, and it summarizes a lot of research visually.
00:00:44
Speaker
So highly recommended. You're the co-author of a new report on scaling and whether scaling can continue through 2030. So let's start there. Perhaps tell us how important scaling is for the performance we see in current AI models.
00:01:04
Speaker
Yeah, so scaling has perhaps been the central driver of the performance gains we've seen in AI in the past decade, in the depth and breadth of the capabilities that AI systems have been developing. If you look at key modern benchmarks like MMLU or GPQA, it seems quite clear that maybe the majority of the variation in performance is explained purely by the amount of compute used in training. We have done some research trying to estimate this more rigorously, and so I co-authored a paper that was accepted yesterday at NeurIPS about
00:01:55
Speaker
the returns to compute scaling and the contributions from algorithmic innovations in language modeling. There we find that about two-thirds of the variation in performance is due to scaling of pre-training compute. And this is performance on the language modeling next-token prediction task.
00:02:19
Speaker
For vision, we have done similar work where, again, we find a roughly equal match between the contributions of scaling training compute and non-scaling innovations, so architecture and perhaps other things around data quality. And this is actually a fairly common picture in computer science more broadly. So there are papers about the role of algorithmic innovations and compute scaling
00:02:54
Speaker
in domains like computer chess, linear and integer programming, and SAT solvers. The common picture there is that compute scaling is quite central to the performance improvements we've seen in the past 30 years or more, and algorithmic innovations are also important, perhaps of equal importance to compute scaling.
00:03:21
Speaker
I think the algorithmic innovations in post-training recently have been quite impressive, and perhaps those have surpassed, at least for some time, the contribution of compute scaling. What would be an example of what you just mentioned there?
00:03:37
Speaker
So post-training and inference-time augmentations would be things like fine-tuning: supervised fine-tuning or instruction fine-tuning to get the model to do the specific thing that your users want out of it. Things like this have been slightly harder to get from purely scaling compute, and so there the contribution of compute has been relatively minor compared to some of these other contributions. But still, compute scaling helps get these capabilities off the ground, so there is a contribution from compute scaling; I just think these non-scaling contributions are also quite important. So in the report I mentioned, you conclude that training compute quadruples every year. What does it mean to have a 4x
00:04:33
Speaker
annual growth rate? Maybe you could give us some examples of other phenomena with comparable growth rates, just to give an intuitive sense of how fast training compute is scaling. Sure. So yeah, as you said, it's scaling at this extremely fast pace of 4x per year, and this has been the case since about the deep learning revolution in the early 2010s. So that's been going on for over a decade now,
00:05:02
Speaker
at this pace of about 4x per year, and this is really extraordinarily fast. I looked at some other examples of extremely fast scale-ups that we've seen in the history of technology, and even relative to those, it seems to be really in a class of its own.
00:05:21
Speaker
The peak rate of adoption of mobile phones was about 2x per year for a period of five years or so. The peak rate of solar energy capacity installation was maybe about 50% per year, which is really quite fast, but still not nearly as fast as 4x per year for compute scaling. Then the number of human genomes sequenced
00:05:47
Speaker
per year gets slightly closer, at about 3x per year for a period of maybe five years or so. So even relative to other extremely fast technological capacity expansions, the scaling of compute seems just extraordinarily fast.
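To make these rates concrete, here is a minimal back-of-the-envelope sketch comparing cumulative growth over five years at the approximate rates quoted above (the figures are the rough peak rates mentioned in the conversation, not exact historical data):

```python
# Rough comparison of cumulative growth over 5 years at the rates quoted above.
rates = {
    "AI training compute": 4.0,       # ~4x per year
    "genomes sequenced": 3.0,         # ~3x per year at peak
    "mobile phone adoption": 2.0,     # ~2x per year at peak
    "solar capacity installed": 1.5,  # ~50% per year at peak
}

years = 5
for name, annual_factor in rates.items():
    total = annual_factor ** years
    print(f"{name}: {annual_factor}x/year -> ~{total:,.0f}x over {years} years")

# AI training compute: 4.0x/year -> ~1,024x over 5 years
# genomes sequenced: 3.0x/year -> ~243x over 5 years
# mobile phone adoption: 2.0x/year -> ~32x over 5 years
# solar capacity installed: 1.5x/year -> ~8x over 5 years
```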
00:06:11
Speaker
I think this is really a defining trend of our current era: we're running through these orders of magnitude of compute extremely quickly and seeing this extraordinary expansion of the capabilities of AI systems as a result. And the key question, then, is what you try to answer in your report: for how long can this continue? The specific question you ask is whether it can continue through 2030. Maybe let's start with the main conclusion there. What do you find in the report?
00:06:40
Speaker
We find that it's quite likely that we can continue scaling up roughly at the rate we've observed historically, at this 4x per year rate, and that we can probably continue doing so through the decade. What we do is look at
00:06:58
Speaker
planned expansions from industry reports and from claims made by relevant companies — TSMC for scaling the production of chips, data center companies and electricity providers for scaling the power that is needed for training — as well as estimates of the stock of data that's available to train these models.
00:07:24
Speaker
For each of these, we look at them carefully and conclude that, yes, it probably permits the roughly 10,000-fold scaling that is on trend for this decade. This 10,000-fold scaling is roughly the difference in scale between GPT-2 and GPT-4. So we conclude that a jump similar to the one from GPT-2 to GPT-4, in terms of compute, is on trend with the 4x per year rate that we've seen in the past 15 years, and it's quite plausible that we can continue and actually train models of that scale. And maybe just to remind ourselves, what was possible with GPT-2 and what is possible now with GPT-4? Maybe talk a little bit about that jump in capabilities.
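As a quick sanity check on how the 4x-per-year trend connects to a 10,000-fold scale-up, here is the arithmetic (a minimal sketch; the report's own figures may differ slightly):

```python
import math

annual_factor = 4.0      # ~4x per year training compute growth
target_scaleup = 10_000  # roughly the GPT-2 -> GPT-4 compute gap

years_needed = math.log(target_scaleup) / math.log(annual_factor)
print(f"~{years_needed:.1f} years of 4x/year growth gives a {target_scaleup:,}x scale-up")
# ~6.6 years of 4x/year growth gives a 10,000x scale-up,
# i.e. roughly the remainder of the decade through 2030.
```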
00:08:17
Speaker
I don't know if people have played around with GPT-2. Most people haven't, because it was just really so bad that it wasn't worth one's time to actually interact with it. But it's quite interesting to do. It can basically write grammatically correct sentences sometimes, but there isn't much insight in those sentences — the lights seem on, but there's nobody home in GPT-2. GPT-4, lots more people have tried; it became the fastest-growing tech application, with a million users joining extremely quickly, precisely because it's very useful for helping you edit your text, helping you express yourself, rewrite emails, and write code. I think that's a very common application, which I also use these models for, and I think it's really great at that,
00:09:15
Speaker
at least for simple applications. So that jump has largely enabled this difference in capabilities, where you have rudimentary reasoning in GPT-2 and a really quite useful system in GPT-4. GPT-4 is, of course, also capable of generating images, searching the internet, doing advanced mathematics, and many more impressive things than GPT-2.
00:09:45
Speaker
Now, if we're talking about a 2030 system, do you dare try to predict some of that system's capabilities? Because talking about the scale of the system is interesting, but perhaps still a bit abstract.
00:09:58
Speaker
I think it's hard to know the details of what the system will and won't be able to do. It's unfortunate that we don't have really good scaling laws that enable us to predict performance even on specific benchmarks,
00:10:15
Speaker
let alone predict real-world value. So I think this is quite hard. Now, as for things I would guess the system would have: Jacob Steinhardt has this nice analogy where he compares these LLMs to interns or co-workers, where there's some time scale at which you have to give feedback and engage with the outputs of the system. With current systems, you need to be involved in every input and output — you need to steer the model, read the output, and give it feedback. On the order of every minute, you have to hold its hand and tell it what to do. I think these systems, as you scale them up, become more coherent over longer horizons
00:11:01
Speaker
and are able to perform longer-term actions and actually plan to do things that are useful. That enables them to take longer courses of action that require fewer interventions from their users. So I think that might unlock some of this agency, as well as making the system more like a remote worker.
00:11:29
Speaker
Now, it's unclear whether models in 2030 that have been scaled up by maybe 10,000-fold will quite get to this drop-in remote worker — where, like a remote worker you might hire, you can
00:11:45
Speaker
give it some task and it's able to accomplish it very well. It's unclear, but something in that direction — greater agency — is something that might come out of this scaling, along with better complex reasoning capabilities and more knowledge and understanding of even fairly obscure topics,
00:12:11
Speaker
like obscure branches of math and more niche applications or niche fields of machine learning research. A bunch of those things will probably appear as you scale these models up and as you provide them with more training data. So in what sense are there diminishing returns to scaling? Yeah, so the usual way we think about performance improvements as a function of scale is that they're log-linear: a straight line on a plot where you have log compute on the x-axis. Every doubling of compute gives you something like a linear improvement in some performance metric,
00:12:54
Speaker
like loss or accuracy. That's a common way of thinking about it, and it tells you that there are diminishing returns: you have to scale your compute exponentially to get linear gains in performance.
00:13:11
Speaker
Now, fortunately, compute is something that has been scaling exponentially, as we just discussed, at this 4x per year growth rate. If this model of log-linear performance with compute is correct, then you get a linear improvement over time. So I think this picture tells you that it becomes increasingly expensive to train models and get each increment of performance improvement.
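A minimal sketch of the log-linear picture just described — the coefficients a, b, and the reference scale C0 are made up for illustration; real scaling-law fits estimate them from many training runs:

```python
import math

# Toy scaling law: loss improves linearly in log(compute).
# a, b, and C0 are hypothetical; real values come from empirical fits.
a, b, C0 = 5.0, 0.3, 1e22

def toy_loss(compute):
    # Each doubling of compute cuts loss by a fixed step of b.
    return a - b * math.log2(compute / C0)

for c in [1e22, 2e22, 4e22, 8e22]:
    print(f"compute={c:.0e} -> loss={toy_loss(c):.2f}")
# 5.00, 4.70, 4.40, 4.10: exponential compute growth, only linear gains.
```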
00:13:37
Speaker
Is there any sense, in the data you've looked at, of sublinear returns to exponential amounts of compute, where for each new generation of models you would see a less and less impressive jump in capabilities?
00:13:52
Speaker
Yeah, that's right. There certainly are plenty of examples of this, and it's unclear what to make of them. There are examples of scaling on some really hard benchmark where you get basically no improvement in performance — maybe because the task is just so hard that the model is totally unable to solve it. Or maybe there are some early cases that are simple and it gets those right, but it isn't able to sustain this gradient of improving performance. And sometimes this is because the task isn't well represented in the dataset.
00:14:33
Speaker
So with pure scaling, the model learns what it's able to learn from the data, but it isn't able to do much better than that. I think this exists. It's unclear exactly how significant it is, because often it's something you can unlock by just folding in the right data — the relevant data — or giving the model access to some basic tooling, such that it's able to actually make progress on the problem.
00:15:02
Speaker
We talked about these potential diminishing returns, but there are some reasons to think there are faster-than-linear gains that you might sometimes see when increasing compute exponentially. One reason is that there are larger gains, in terms of the value of having certain AI capabilities, around human-level performance.
00:15:26
Speaker
That's because, for example, having a self-driving car that is 10% worse than humans is not worth 10% less than having a self-driving car that matches human performance.
00:15:41
Speaker
So on some more important metric, like the value the system generates, you might get faster-than-linear gains, at least on some portions of this curve, such as around human-level performance. Another reason for expecting faster-than-linear improvement is that there are increasing returns in AI — economies of scale. For instance, for earlier systems like early language models, BERT or GPT-2, it just wasn't worth building a lot of really great tooling,
00:16:24
Speaker
like what you have in ChatGPT with the code interpreter, or what Claude has with Artifacts, because you just didn't have many users. Spending that fixed cost didn't allow you to amortize it over a large number of users. And the same is true for the cost you incur during training: it's a one-time cost that you can amortize over the users you have.
00:16:49
Speaker
Early on, for these very small systems, it didn't make sense to spend all that much on those fixed costs, because you couldn't amortize them over many users. You didn't have many users who were willing to pay for it.
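The amortization logic here is simple enough to sketch in a few lines; all the dollar figures below are hypothetical, purely to show why fixed tooling and training spend only pays off once the user base is large:

```python
# Per-user cost = (one-time fixed costs) / (number of users) + marginal cost.
fixed_cost = 100e6   # e.g. training + tooling spend, $100M (made up)
marginal_cost = 1.0  # serving cost per user (made up)

for users in [1e3, 1e6, 100e6]:
    per_user = fixed_cost / users + marginal_cost
    print(f"{users:>11,.0f} users -> ${per_user:,.2f} per user")
#       1,000 users -> $100,001.00 per user
#   1,000,000 users -> $101.00 per user
# 100,000,000 users -> $2.00 per user
```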
00:17:05
Speaker
But now you have more and more users as you improve the performance of these systems. So this starts a flywheel, where it begins to make sense to spend a lot more money on training, on post-training, and on building
00:17:22
Speaker
a really great way of interacting with these models, which before didn't quite make sense. And there are other reasons for increasing returns in AI that suggest that as we get more capable systems, it makes sense to spend even more on making them useful. One basic reason concerns serving these models: there are economies of scale to serving via an API, because your hardware utilization is better when you have large batches, when many requests are coming in at once. If requests arrive densely, you can start to use your hardware much more efficiently. If you just get one request every second, then you have to run many H100s — these expensive data center GPUs — on serving a single request.
00:18:12
Speaker
And that is just very expensive. But if you have millions of requests, then you can start using your hardware more efficiently, and that reduces these costs. So you may be able to pass those savings on to your users, and from the user perspective, it might seem like you get faster improvements because of these economies of scale.
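A minimal sketch of this batching effect — the GPU cost and throughput numbers are invented for illustration, and real serving stacks are more complicated:

```python
# Serving economics: the GPU's hourly cost is roughly fixed, but a batched
# forward pass over 64 requests takes about as long as one over 1 request,
# so cost per request falls as traffic (and thus batch size) grows.
gpu_cost_per_hour = 4.0          # one H100-class GPU (made up)
forward_passes_per_hour = 3600   # assume ~1 second per forward pass (made up)

for batch_size in [1, 8, 64]:
    requests_served = forward_passes_per_hour * batch_size
    cost_per_request = gpu_cost_per_hour / requests_served
    print(f"batch={batch_size:>2} -> ${cost_per_request:.5f} per request")
# batch= 1 -> $0.00111 per request
# batch= 8 -> $0.00014 per request
# batch=64 -> $0.00002 per request
# Batches only fill up when many requests arrive together, so high traffic
# directly lowers unit costs.
```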
00:18:50
Speaker
You mentioned earlier these threshold effects, where there's what seems to us a sudden jump from an AI that's not capable of doing a task to an AI that is capable of doing it in a useful way. Maybe one example could be image generation: it was quite bad for a long time — it didn't look real, the fingers looked weird — so you couldn't use it in commercial applications. And then you see a jump to usefulness, where it suddenly is competitive with, say, some graphic designers. Where are the next thresholds?
00:19:13
Speaker
Would these be around longer-term tasks, or perhaps reasoning? Where do you think some of these breakthroughs might be? Yeah, so I mentioned before this need for frequent intervention with the systems we currently have — you need to do this hand-holding and steer them in the right direction. This is, I think, an important barrier to these systems becoming valuable. So I do expect that once we get a degree of autonomy and agency similar to what humans typically have, then you might get
00:19:52
Speaker
this flywheel where it actually starts being useful and people start really spending a lot of money on improving it. I mean, I think one thing to recognize is that the human range itself is quite vast.
00:20:06
Speaker
I think this is something the recent history of AI tells us: sometimes it's easy to match median performance, but it's really hard to match expert-level performance. So I don't know if there's a threshold that you hit and then all of a sudden you're good. It's a bit more complicated, because the span of the human range is potentially quite large. So I think there isn't a discontinuity, but there could be an acceleration as you start going through the human range, where it becomes more and more useful to actually substitute for humans, at least in some tasks, in some cases. That's interesting, because I've heard arguments that the human range is kind of narrow compared to what's possible with intelligence. So what do you expect in terms of time from median human performance to expert human performance? For example, with programming, I think we have moved from median performance to expert performance quite quickly. Perhaps we're not at the extreme end of expert performance yet, but it seems to me that we've moved quite quickly through the human range there.
00:21:22
Speaker
I think in programming we have moved through some part of this range of human abilities. I think we're still not quite at the expert level of actual software engineers — top software engineers at tech companies. We're starting to get systems that are useful to those people, but not quite at the level at which they could totally substitute for what those engineers are doing. Another example that I think about a lot is math research, or math ability, because I'm currently working on an extremely hard math benchmark. And there, it's taken
00:22:10
Speaker
maybe three or four years — maybe three or four orders of magnitude of compute — to get from doing arithmetic and very basic math to maybe getting close to top high schoolers. We're not quite yet at top math researchers, and I think it's going to take at least some time until we get there.
00:22:36
Speaker
I think in part this is because there's a lot of math in high school, and even in undergrad, where it's much easier to verify correctness. But as you get to the higher end of the spectrum,
00:22:53
Speaker
verification becomes harder. A lot of the contributions, at least in research, are about introducing novel ideas that are fruitful — that lead to some generalization of other results or other ideas, or end up producing novel insights that you can build on top of. I think we've not yet seen examples of this actually emerge — maybe some very simple ones — but I do think we're still a bit away from that. So I suspect that moving through this human range, say for math, is going to take at least maybe five years. And I think something similar is the case for really great software engineers: it will take some time to get there.
00:23:40
Speaker
Yeah, what do you make of the benchmarks we have specifically for math and programming, where AI is currently able to ace some of these tests — what would be high-level interview programming questions, or competitive mathematics and competitive programming? How close do you think those benchmarks are to the actual work of working programmers and working mathematicians?
00:24:10
Speaker
I think those benchmarks are quite good for testing the isolated problems you sometimes encounter in doing mathematics or software engineering. In math, many of those problems are not the problems you might encounter in doing research — I think they're quite different. In software engineering, some of those problems are maybe somewhat related to the problems you might encounter.
00:24:34
Speaker
I think a key distinction is, again, the time span over which these tasks are typically done in the real world versus in the benchmark. Benchmark tasks are isolated things where you don't need to integrate a bunch of information — say, about the existing code base you're working with, where you have to satisfy a bunch of requirements around how your change fits into it and what various stakeholders might care about. Instead, you're just writing some script that does one particular thing, without integrating a bunch of additional context. And I think models are quite good at this. In part, we've seen a lot of performance improvement on these tasks because verification has been fairly easy: it's often possible to run unit tests, and in some of these math
00:25:30
Speaker
problems, you're able to get a result and plug it back into the equation and see that it's satisfied — maybe there's a Diophantine equation, and you get some result you can plug in and confirm that, yes, this is indeed the correct answer. In cases where it's possible to do this kind of verification, we've been able to make a lot of progress, because you can use these verification techniques, and it's also easier to generate a lot of high-quality synthetic data.
00:26:02
Speaker
So we've seen really good progress there. But often the problems that people encounter in the real world don't have this feature of easy verification, and for those we've seen less progress. Writing research papers, and coming up with ideas that are useful and enable you to make progress in the future by building on them — I think we've seen a bunch less progress in that domain, in part because it's harder to do this kind of verification and check whether something is actually useful. What is the goal of the very hard math benchmark you're putting together? What do you want to accomplish with it? Do you want to capture some of this longer-term thinking in a benchmark, or what's the goal here?
00:26:53
Speaker
Yeah, that's right. This longer-term thinking is important, and the problems we have are of the type where it takes ten pages or so to solve the problem. So this requires longer-horizon reasoning. I think math is quite unique in the sense that it's one of the few domains of science where you have precise analytical arguments that span many pages — in extreme cases, hundreds of pages to prove some result. And in many cases, you can do verification: you can check whether the answer is correct. That's why it's this nice sandbox for testing the ability to do research- and R&D-relevant reasoning over
00:27:44
Speaker
very long horizons while still being able to do automatic verification. You don't have to bring in a peer reviewer to tell whether the work is any good. So that's why I think math is really nice: you can test this longer-horizon ability.
00:28:01
Speaker
Yeah, we are really trying to capture modern mathematical understanding across many, many domains and branches of math — even some that are fairly obscure, where maybe only ten people in the world would know how to solve some of these problems.
00:28:17
Speaker
We want to be able to test for the ability to do creative reasoning: to connect disparate ideas in the literature and figure out how to use them in a way that enables you to solve the problem.
00:28:35
Speaker
And I think we've seen less of that in AI so far — we haven't seen as much progress on combining ideas creatively and coming up with the very creative flashes of insight by which you make scientific progress. So that is something I'm excited to try to test for, and to have a really good benchmark for at least some dimension of it.
00:29:02
Speaker
Yeah, sounds exciting. Okay, you mentioned this 10,000-fold scale-up in training compute for frontier AI models by 2030. What do we know about how that will be financed? Who's going to pay for that?
00:29:18
Speaker
Yeah, I don't know who exactly would pay for this. There have been some discussions of players that OpenAI and others have reached out to and tried to get to fund this — UAE investment funds, SoftBank, and players like that. I don't know the specifics of exactly how it will be financed. I am confident that a technology that has this promise of being able to flexibly substitute for human labor is just extremely valuable, and that it is worth spending a lot of money — in terms of the fraction of total output — on trying to advance it and bring forward the date at which we can automate human labor. One intuition for this is that
00:30:06
Speaker
currently, we spend roughly 60% of output in the U.S. on wage compensation. Globally, around $60 trillion a year is spent on labor inputs. So labor is the most expensive factor of production.
00:30:28
Speaker
And so being able to automate this, and to capture even some fraction of the value of labor, is worth a share of this flow of trillions of dollars per year.
00:30:43
Speaker
That intuition tells you it's worth spending really quite a lot of money on automating this most expensive factor of production. On top of this, it seems likely to me that you also get an acceleration of growth, so our economy eventually becomes a lot larger, which makes it extremely valuable to make that happen soon.
00:31:09
Speaker
And so it should, I think, be valuable enough to spend on the order of trillions of dollars a year on building out this compute-related capital stock: fabs, data centers, lithography machines, and the energy infrastructure to power it.
00:31:26
Speaker
And what about the uncertainty over how big the impact of AI is going to be, where perhaps a low estimate would be something like the smartphone and a high estimate would be something like the automation of human labor? What does that uncertainty do to the people being asked to fund this?
00:31:49
Speaker
Yeah, I think that uncertainty is quite rough in terms of actually getting the funding. In fact, TSMC and others have expressed concerns around exactly this: if they scale things up, will they get their money's worth? Will they be able to sell these chips? So I do think this uncertainty is a blocker right now. But I think it
00:32:12
Speaker
probably shouldn't be nearly as much of one as it is. I do think the value proposition is just really great. Again, take the $60 trillion per year spent on wages: if you can capture some 10% of this every year, that's $6 trillion a year. And then on top of that, your economy is growing potentially really fast.
00:32:35
Speaker
Surely it's worth spending close to a trillion dollars a year or something on trying to make that happen. Now, after accounting for this uncertainty, maybe you want to shade that down, but it still looks like a lot more than what we are spending today. So the expected-value calculation suggests that even with a lot of uncertainty, it should be worth spending a lot of money on.
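The expected-value argument here is simple arithmetic. A minimal sketch, where the success probability and capture fraction are placeholders I've chosen for illustration, not Epoch estimates:

```python
# Expected value of pursuing automation, with made-up placeholder numbers.
global_wage_bill = 60e12  # ~$60T/year spent on labor, as quoted above
capture_fraction = 0.10   # hypothetical share of wages captured
p_success = 0.2           # hypothetical probability it works at all

expected_annual_value = p_success * capture_fraction * global_wage_bill
print(f"Expected value: ${expected_annual_value / 1e12:.1f}T per year")
# Even at 20% odds, the expected flow ($1.2T/yr) dwarfs current AI investment.
```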
00:33:02
Speaker
I think there are other reasons too. One is that even if you're uncertain about whether you can fully substitute for human labor, you might get partial automation, and that partial automation is itself quite valuable. Already, OpenAI and others are making billions of dollars in revenue. And
00:33:23
Speaker
if they're able to automate some fraction — not quite 100%, but some fraction short of that — I think that is still going to be worth maybe on the order of trillions of dollars. So even without being certain that you can fully automate everything, it's worth doing on an expected-value calculation. The other thing is that it is also extremely valuable to resolve this uncertainty. There's a lot of value of information, and you get a lot of information by doing your next scale-up and expanding your infrastructure to handle that scale-up.
00:34:01
Speaker
Resolving this uncertainty soon is going to be really valuable, because maybe doing the next training run tells you that, actually, this looks a lot easier than people expected. And if that's the case, it would be extremely valuable to know, because it would enable us to get to this goal of having AI that flexibly substitutes for human labor much more quickly,
00:34:24
Speaker
which would be extremely valuable. So this value-of-information consideration is itself also really quite useful. How much would you say social factors matter? By social factors, I mean that for each new generation of models, investors are looking at that model, judging how impressive it is, and then thinking about whether they want to finance the next generation. How much is future investment dependent on each new generation of models? I think this has been important in actually justifying the next round of spending to support further scale-up.
00:35:05
Speaker
I think the timing of these fundraises is often around when the next models come out. We've recently seen the closing of a round that happened in the same couple of weeks as the latest system from OpenAI came out — they secured that round just about then.
00:35:26
Speaker
I think accounts from the CEOs of some of these hyperscalers suggest that, yes, these demonstrations have been really quite important. Maybe there's too much attention paid to some of these demonstrations, and not enough to just looking at the body of research that has validated some of these ideas around scaling. So I think maybe too much attention is being paid to the details of the capabilities of the systems being launched, precisely how many customers they're able to serve, and how much revenue they're able to generate. I think the real prize here is being able to build a system that flexibly substitutes for human labor.
00:36:15
Speaker
We should keep our eye on this prize and not get too distracted by whether we're able to package this as a nice business enterprise product. Now, maybe that gives us some insight into how successful deep learning is, but I think the research that we have — the papers on scaling laws and other things — is more important for indicating how good these systems are that we're currently building, and the techniques we have for building them. I wonder how much the CEOs of the hyperscalers will be allowed by their investors to invest before they see solid returns. One example here might be Meta: Meta invested a lot of money in trying to build out the Metaverse, and there were at least some rumors of investors losing confidence along the way. This might be analogous to what is promised by the CEOs of OpenAI and Anthropic and DeepMind for what's coming in the future. For how long do you think investors will be satisfied with spending a lot of money before they see great returns?
00:37:29
Speaker
All right, so first of all, this friction around investors being uncertain and hesitant has been quite important. One particular case is TSMC, which has been dragging its feet on expanding chip production because it's unsure whether AI will continue growing at the rate it has in the past. Without receiving advance market commitments, where someone promises to buy the output of new fabs, it's been somewhat reluctant to build out enough fabs to support the
00:38:10
Speaker
projected scaling. There are some rumors that Sam Altman has asked for a bunch more production, and they've apparently been reluctant to get on board with that. So there's this dragging of feet that TSMC is doing. I think the more efficient allocation would probably involve a lot more spending and a lot more production of chips and related infrastructure right now. I expect that we will have these extremely large training runs and will need a lot of compute, and it is inefficient for that scaling to happen
00:38:57
Speaker
exactly when you need it. You ideally want to scale this up beforehand, because there are premiums you otherwise have to pay: when you very suddenly want to expand your chip production,
00:39:12
Speaker
you have to pay a large premium to make that happen. The more efficient way of doing this would be to more smoothly ramp up production of chips and other capital stock, like the energy infrastructure, the lithography machines, and other things.
00:39:28
Speaker
So I think that in order to avoid these large adjustment costs, it makes more sense to do this expanded scaling even well in advance of when AI generates a lot of value. And that's because it's not possible to time this perfectly —
00:39:47
Speaker
you have to prepare your supply chain ahead of time. An analogy for this is something like space colonization: in order to achieve very successful space colonization, you might want to get really good at landing on the moon — not because you care about landing on the moon, but because it helps you flex those muscles.
00:40:09
Speaker
So I think the more efficient thing we could be doing is, in fact, to expand this compute-related capital stock and do these training runs to flex our muscles: to have the supply chain running, and to learn a lot about how to do this expansion, how to do these large training runs, and how to provide the energy and everything else we need to support that.
00:40:34
Speaker
Yeah, could you elaborate on why we need this slow build-up, and why this mistiming between investment and return is actually economically rational? Why is it that if we were to build up the compute we need closer to the time it would generate a massive return, we would have to pay premiums?
00:40:56
Speaker
Yeah, the reason for this is that it takes time for these fabs to be built. It takes, on average, maybe two years to build a fab, and you can speed this up by spending more money: paying everyone a premium to work longer,
00:41:13
Speaker
paying people to drop out of related jobs and join TSMC or ASML or OpenAI or whatever, or paying for these very large advance market commitments and incurring a bunch of risk that way. And all of this comes with a premium — it's harder to finance, and you have to pay these adjustment costs. This is inefficient, because you could avoid it by spending earlier.
00:41:44
Speaker
Investors shouldn't want to be caught out of the blue, suddenly realizing that AI is going to be a big deal and then having a frenzy of investment. Instead, the better thing would be for this investment to ramp up much more gradually. And that means that if you think this investment will eventually happen — that you will eventually get trillion-dollar training runs or whatever — then it makes sense for quite a lot of this spending to already be happening. And we aren't seeing that spending. So I think there is a frenzy that might happen later, where people will be disappointed that they didn't invest in this sooner, because it turned out to be very valuable.
00:42:30
Speaker
I think there's already been some of this updating. If you look at valuations of Nvidia, it's way beyond what people expected only a couple of years ago. So there has already been quite substantial updating, and I suspect more of it will happen in the future. Do you think the build-out of the internet is a good analogy here? I'm thinking of the heavy investment in the late '90s in internet infrastructure, before there was a lot of revenue to be generated by the internet. Then you have the dot-com bust, and afterwards the internet turns out to be a really economically valuable part of the economy. Do you foresee something similar for AI? Is this a good analogy?
00:43:22
Speaker
The internet is probably not a great analogy, because AI just fundamentally seems like a much bigger deal: it promises the ability to flexibly substitute for human labor, which is this very important input into most economic production. There have been very few technologies — the computer, the internet, or what have you — where there was a good reason to expect this kind of growth acceleration. So there wasn't, at the time, the very good reason to expect the internet to be an extremely large deal that I think we do have with AI today. So I think the spending we will eventually do for AI will far outstrip the spending it would have made sense to do for internet infrastructure.
00:44:18
Speaker
For the internet, the fact that we had this front-loaded infrastructure investment, occurring before the internet generated massive value, suggests that for AI we might also see investment that precedes a lot of the revenue. And this, I think, will predictably result in people pointing out, and being mystified by, the fact that we're spending more than AI is currently generating — which is a common talking point among VCs. I think the
00:44:57
Speaker
response to that is that this investment is in large part made in the belief that AI in the future will generate this large payoff, not that current systems will necessarily generate a lot of value. So there might be this massive mismatch between value generation and the investment that we see.
00:45:16
Speaker
Okay, so we talked about how compute is the main driver of increasing AI capabilities, but we also have data as a factor, and software or algorithms as a factor. Maybe you can describe how that works: what is the relative importance of compute, software, and data?
00:45:40
Speaker
Yeah, I think compute has been a very important driver of the performance improvements and the depth and breadth of capabilities that we've seen emerge in AI. We've done some research on this topic and found that for large language models, on the language modeling next-token prediction task, about two-thirds of the contribution comes from the scaling of compute,
00:46:07
Speaker
and maybe one-third from other contributions. These non-scaling innovations involve improvements in the architecture, and insights about how to efficiently use compute and scale your model and your datasets — things like scaling laws for your parameter count and the number of tokens you train on, as well as scaling other things, like vocabulary or other parts of your model. And then there are improvements in data quality. I think this has also been really quite important, especially for complex reasoning tasks
00:46:45
Speaker
and for being able to do really well on specific tasks like following instructions — instruction-following data has been extremely valuable for that. In computer vision, it seems similar: we had this paper where we looked at computer vision and found roughly an equal split between the contributions of compute scaling and algorithmic innovations. I think this is a common pattern in computer science.
00:47:15
Speaker
There have been quite a few papers on the value of scaling — of getting better hardware — and the value of better software,
00:47:27
Speaker
in domains like linear and integer programming, SAT solvers, chess engines, and so on. For those, the literature seems to find roughly an equal split between the contributions of better and more hardware, and better software. So I expect something similar is broadly true for AI, and might continue being broadly true, given this fairly robust pattern across many other domains of computer science. How do you expect the relative importance of compute, data, and algorithms to change over time? Do you expect, for example, data to become more important than compute, or algorithms to become more important than compute?
00:48:13
Speaker
Yeah, I think this is quite tricky. For many specific tasks, it may well be the case that data is by far the most important factor for advancing certain capabilities. And I think we're already seeing this — it's unclear exactly what these big labs are doing, but I think one driver of the complex reasoning abilities of some of these systems has been really good data, perhaps synthetic data. This is very important for doing something like reinforcement learning, where you might want to give the model access to reasoning traces and train it to get really good at the type of reasoning that results in the right answer.
00:49:01
Speaker
What is a reasoning trace? So if you ask a language model to solve a problem, it does a lot of reasoning and then gives you an answer. The reasoning it outputs before giving you the answer, when you ask it to think step by step — that is the reasoning trace. That is something OpenAI has been working on, based on the o1 model they launched recently and its model card.
00:49:32
Speaker
So I think data for complex reasoning tasks is extremely important. At the same time, scaling the size of the model, and just generic internet text, is also quite useful. In fact, these things complement each other: having a larger model means you can do better reasoning, which means you can generate better-quality reasoning traces to then power this reinforcement learning cycle. So there is a complementarity there. Now, algorithms and innovations around scaling and other things are also quite important. One reason to expect that they might be more important in the future
00:50:15
Speaker
is that the time scale on which you can make advances there is much shorter than for the scaling you can do in hardware. It takes two years or so to build a fab, but in the world of software, you can push things ahead much faster.
00:50:31
Speaker
So that suggests it's possible to advance software much faster than hardware, and it does seem plausible that the contributions from the software side might end up dominating in the future. This is especially the case if you have the ability to automate some of the research that produces these algorithmic innovations. If you have AI systems able to substantially contribute to R&D, then maybe the software part becomes more important. Though I think there's a lot of uncertainty about precisely whether that is possible, and how large those improvements might end up being.
00:51:18
Speaker
One consideration is that there are compute bottlenecks in research: in order to validate some research innovation, you have to do a training run, and this training run requires compute. So you can't just scale
00:51:35
Speaker
one without the other. You can't scale up the inputs to research — the number of researchers, and AIs working on coming up with new innovations — without also scaling up the hardware you're using to run experiments.
00:51:54
Speaker
All right, you mentioned OpenAI's o1. Would it be fair to say that that system uses more compute at inference time — when it's thinking about an answer — and thereby generates a better answer? And does this indicate that there might be scaling laws, or the equivalent of scaling laws, for inference-time compute, similar to those we see for training compute?
00:52:20
Speaker
Yeah, so there are scaling laws for inference compute, and these have been around for some time. At some level, it's somewhat surprising that it has taken this long for OpenAI to implement something like this. Andy Jones, a researcher at Anthropic, years ago
00:52:38
Speaker
had this really nice paper where he showed scaling laws for the game of Hex, for scaling inference, and showed that you can trade off inference and training compute. You can spend less training compute but more inference compute and get the same performance as a model with more training compute and less inference compute. So there's a trade-off where you can have a larger model that is slower and more expensive to run being matched by a smaller model that is just run for longer — where maybe it does more reasoning, or you sample from it more times and do some aggregation. So there are these scaling laws for inference,
00:53:20
Speaker
and you can do this trade-off. We had written about this in 2022 at Epoch, and we made a prediction in our work: we said it seems possible to scale up inference and get the equivalent performance boost of one to two orders of magnitude of training compute with language models. And someone pointed out recently that this aged fairly well, given some of the o1 results.
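A minimal sketch of what such a trade-off looks like, assuming, as the Epoch work and the Hex paper suggest, that inference and training compute exchange at a roughly constant rate in log space over some window. The exchange rate and the cap are assumptions for illustration, not estimated values:

```python
# Iso-performance trade-off between inference and training compute, measured
# in orders of magnitude (OOMs). k is a hypothetical exchange rate; real
# values are estimated empirically from scaling experiments.
k = 1.0  # OOMs of training compute bought per extra OOM of inference

def equivalent_training_oom(extra_inference_oom, cap=2.0):
    # The trade-off saturates: beyond ~2 OOMs of training-compute
    # equivalence, more inference stops helping (as discussed below).
    return min(k * extra_inference_oom, cap)

for inf_oom in [0.5, 1, 2, 3, 4]:
    eq = equivalent_training_oom(inf_oom)
    print(f"+{inf_oom} OOM inference ~ +{eq} OOM training compute equivalent")
```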
00:53:49
Speaker
So what are the limits to using more inference-time compute? Why can't I ask ChatGPT to think for five hours and get an amazingly high-quality answer? So there was this funny Twitter poll that went viral, which asked the question: if you had infinite time, would you be able to beat Garry Kasparov at chess?
00:54:16
Speaker
A lot of people in AI tweeted about this, including Demis Hassabis, who said that the answer was no — that the average person just wouldn't be able to beat Garry Kasparov at chess, even with infinite time, because there's just no amount of inference scaling that would get you there.
00:54:35
Speaker
You can think of having more time as scaling the inference budget that you give to a brain. And there's just no amount of that scaling that gets you the same improvement in performance that something like better pre-training does —
00:54:51
Speaker
like having the brain that's just much better at this task, which Garry Kasparov has for chess. There's no scaling up a smaller, worse chess brain with inference compute to get it to perform at the same level. And I think the intuition here is quite right, though I disagree with the answer. There's a simple way of solving it: you just replay the chess moves back to Kasparov each time — you play one move, he responds, you resign, then you switch sides and play exactly his response back. So you could solve it that way, but that's beside the point. The point is that this intuition — that there's a limit to the value of scaling inference compute — is quite accurate. And we do see this in examples from math: someone at Scale AI reproduced some of the inference scaling laws that OpenAI
00:55:48
Speaker
showed, and there was this plateauing. Given infinite time, there's some point at which performance plateaus, such that even though it might still be increasing, it's not going to get you all the way there — it gets really expensive and maybe doesn't give you much improvement. I think part of this is because, for some of these tasks, we don't have perfect verification. If you had some oracle that tells you whether an answer is correct, then you could just scale up the number of samples to extremely large numbers and continue getting some increments in performance.
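To illustrate the oracle point: with a perfect verifier, best-of-n sampling keeps buying accuracy as you add samples. A toy sketch, where the per-sample success probability is made up for illustration:

```python
# With an oracle verifier, best-of-n succeeds if ANY sample is correct:
# P(success) = 1 - (1 - p)**n. The per-sample probability p is hypothetical.
p = 0.01  # chance a single sampled solution is correct (made up)

for n in [1, 10, 100, 1000]:
    p_success = 1 - (1 - p) ** n
    print(f"n={n:>4} samples -> P(at least one correct) = {p_success:.3f}")
# n=   1 -> 0.010, n=  10 -> 0.096, n= 100 -> 0.634, n=1000 -> 1.000
# Without a verifier you cannot pick out the correct sample, so the gains
# from more sampling flatten out much sooner.
```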
00:56:30
Speaker
But without that, I think it becomes a lot harder, and you start to hit these diminishing returns. We wrote something about this at Epoch and did some estimates of the span of this trade-off: we summarized some evidence and estimated that you could get maybe two orders of magnitude of equivalence in training compute.
00:56:55
Speaker
So you can scale up your inference by some large amount, and that gives you maybe at most two orders of magnitude of training-compute equivalent. But that is quite significant, right? Because two orders of magnitude of training compute could be incredibly expensive, and if you can replace that by spending more compute at inference time, that might be a massive cost saving. Indeed, I think the spending on inference compute is currently too little, such that it makes sense to spend more on inference compute. There's a related blog post we wrote recently where we made the point that
00:57:35
Speaker
the optimal amount of inference compute spending should be roughly equivalent to the amount of training compute you're spending. It depends on precisely the slope of the trade-off, but the basic intuition is that if you spend 99% of your budget on training compute and 1% on inference compute, you can double the amount of inference compute you're spending very cheaply — it's only 1% of your budget — and get a pretty substantial performance improvement. So it doesn't make sense for this allocation to be very lopsided, because then there's a reallocation that keeps your overall budget the same but gives you better performance, or achieves the same performance with a smaller compute budget.
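A minimal sketch of this lopsided-budget argument, using a toy log-returns model. The functional form and coefficients are assumptions for illustration, not the blog post's actual model:

```python
import math

# Toy model: performance = log(training spend) + log(inference spend).
# With log returns on both, a fixed total budget is maximized near an
# even split; the functional form here is an assumption.
budget = 100.0

def perf(train_share):
    train = budget * train_share
    inference = budget * (1 - train_share)
    return math.log(train) + math.log(inference)

for share in [0.99, 0.75, 0.50]:
    print(f"train {share:.0%} / inference {1 - share:.0%} -> perf {perf(share):.3f}")
# train 99% / inference 1%  -> perf 4.595  (lopsided: worst)
# train 75% / inference 25% -> perf 7.536
# train 50% / inference 50% -> perf 7.824  (balanced: best)
```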
00:58:29
Speaker
So we made the point that you should expect inference and training compute spending to be on the same order, because of this ability to trade the two off against each other. When we have the next generation of models, say a GPT-5, will it then make sense to spend more inference compute running those models on questions, because they simply have more raw intelligence, and so it's worth it in terms of performance to have them think for longer? So for the reasons I explained earlier, I think it makes sense to set your inference budget and your training budget roughly on par with each other. You do get returns to spending more on inference, but you also get returns to spending more on training compute. I think the ratio between these two gets determined by
00:59:24
Speaker
the slope of this trade-off between inference and training compute. And there's a question of whether, if you scale your model up, this slope changes, like whether inference compute becomes relatively more valuable than training compute. And I think it's unclear. It might be the case that new techniques for doing inference appear and become feasible. So there's this result that chain of thought is only useful at a particular scale: if you give GPT-2 chain of thought, it doesn't improve performance, because it just loses track extremely quickly, and it just doesn't have the long-context reasoning ability to make use of it. Whereas GPT-4 certainly can make use of chain of thought, and that improves performance. So maybe you unlock these new techniques for using inference more efficiently that
01:00:19
Speaker
increase the returns to doing more inference, and that might result in slightly more inference spending. But again, I think it doesn't make sense for these allocations to be imbalanced, because when they are, it starts to become fairly cheap to increase the one that is smaller, and you can often get performance improvements that way.
01:00:43
Speaker
Is this what the leading AGI corporations are doing? Are they spending half of their compute budget on training and half on inference right now? What do we know about this?
01:00:56
Speaker
Unfortunately, we know very little. I think you can do some back-of-the-envelope calculations based on things that Sam Altman says about how many tokens they're serving. I think that suggests that probably no, they're probably not doing this optimally. Early on, when ChatGPT first started,
01:01:16
Speaker
they didn't expect the model to be as popular as it ended up being, and so they didn't make the correct decisions about precisely how to scale things. They didn't, for instance, over-train the model. Over-training reduces the inference costs that you end up paying, because you can train a smaller model that still has really good capabilities.
01:01:43
Speaker
Does this mean that Google and OpenAI are not spending their compute in the most efficient way possible? And if that's the case, why wouldn't they? Why wouldn't they read your research paper and discover that they're making mistakes here? Yeah, so I think, first of all, it's unclear exactly how they're spending their compute. We don't have enough data or information about exactly how many requests they're serving, how big these models are, how much it takes to serve these models, and how much they're spending on training. So it's hard to tell. I think there are a bunch of other nitty-gritty details that affect things here, like these model releases. First of all, this result, that if you want to optimally allocate
01:02:34
Speaker
compute, it should be roughly equal between inference and training. It's true on average, and it's fairly rough: maybe 75 to 25 or something like that might also be optimal; it doesn't need to be exactly 50-50. It's also something that's true on average, over multiple years, or averaged over a specific model generation.
01:02:59
Speaker
And there are reasons why this might not be the case. The releases of models can be staggered, where first you have GPT-4o, and then months later you have GPT-4o mini. So at first you had to use this very bulky, slow model for doing even very simple tasks. That's an inefficient use of your inference compute, and ideally you want to do some kind of routing
01:03:25
Speaker
of the simple requests to smaller models. This is now sometimes done by using what's known as speculative decoding, where a smaller model stands in for the larger model and produces the tokens, and then the larger model just does one glance and sees, okay, is this roughly correct or not. So this is one way of reducing your inference compute budget. But if you don't have the smaller model online yet, because it's still training, then you can't quite use it. The other thing is that labs underestimate how much demand there is for their models. This happened with ChatGPT initially: OpenAI saw it more as a research output than a commercial product, and so there were a bunch of things they could have done to reduce the inference costs that they didn't do. Over-training is a common technique, where you
01:04:20
Speaker
train on more data than is actually optimal from the point of view of getting the best performance from your training compute budget. You might want to spend more of your training budget on more tokens, on more data, rather than on expanding the size of your model. And that's because smaller models are cheaper to serve: they have lower inference compute costs.
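A small sketch of that calculus, using the stylized scaling-law form L(N, D) = E + A/N^a + B/D^b. The constants are the published Chinchilla fits from Hoffmann et al. (2022), used purely for illustration; the budget and serving volume are assumptions:

```python
# Stylized Chinchilla-style loss; constants from Hoffmann et al. (2022),
# used here purely for illustration.
E, A, B, a, b = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    return E + A / N**a + B / D**b

TRAIN_FLOPS = 1e23            # fixed training budget, ~6*N*D (assumed)
SERVED_TOKENS = 1e12          # lifetime tokens served (assumed)

for N in (1e9, 3e9, 10e9, 30e9, 100e9):      # model size in parameters
    D = TRAIN_FLOPS / (6 * N)                # tokens implied by the budget
    serve_flops = 2 * N * SERVED_TOKENS      # ~2N FLOPs per token served
    print(f"N={N:.0e}  D={D:.0e}  loss={loss(N, D):.3f}  serving={serve_flops:.0e} FLOPs")
```

On these numbers the training-compute-optimal size is around 10B parameters, but dropping to 3B costs only a few hundredths of a unit of loss while cutting serving compute by more than 3x, which is exactly the over-training trade being described.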
01:04:48
Speaker
And so right now we're seeing this trend towards smaller models. It seems like the original GPT-4 was likely on the order of a trillion parameters, and now these frontier models are likely quite a bit smaller. That's to save on inference cost, because they're being served to very many people. The other thing is that training cost is front-loaded: you pay it initially, and then you serve the model for a long time. So during the training period you're going to be spending most of your compute on training rather than inference, but once training is done, you're going to be spending a lot on inference. So there's this front-loading and back-loading of training and inference, which makes this quite messy. But I do think this direction that we're going in with o1, which is using more inference compute dynamically depending on
01:05:45
Speaker
the difficulty of the problem, is a direction that I expect we will continue to move towards. And I think a bunch of other innovations that we've recently seen, like speculative decoding, are of this flavor of dynamically using more or less inference to get good performance. And so this is one of these innovations that I think makes the models that we're able to train much more powerful, cheaper, and more usable because of the lower costs. You can deploy them more widely, and you can get better performance on demand by spending more money. I think that is just going to be in service of widespread adoption and getting people to get more value out of these systems.
01:06:32
Speaker
Yeah. Do we know how the models judge how much inference-time compute to spend on a given question? Is there some evaluation of how complex the question is before the model decides how much time to think about it? So unfortunately, it's very unclear what these models are doing, because OpenAI and Google are so tight-lipped. Things I would speculate might happen: maybe there's a classifier that figures this out. And with speculative decoding, which is where you have a smaller model occasionally step in and produce the tokens instead of the larger model, the thing that happens there is that
01:07:14
Speaker
the larger model takes a look at the first token that's generated by the smaller model and sees if it matches what it would say. And if it does, then it just delivers the tokens of the smaller model. In that case, it is dynamically determining based on the difficulty: when you have more difficult problems, it is more likely to disagree
01:07:38
Speaker
with the smaller model, and so it's more likely to actually take the outputs of the larger model. So there is this kind of dynamic allocation, which I think is quite clever.
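A toy sketch of that accept/reject loop. The two models here are stand-ins; in real speculative decoding the target model scores all drafted positions in a single forward pass, which is what one "target pass" represents below:

```python
import random

random.seed(0)
VOCAB = "abcd"

def draft_model(context):
    """Hypothetical cheap model: proposes the next token."""
    return random.choice(VOCAB)

def target_model(context):
    """Hypothetical expensive model: the token we actually want.
    Deterministic in the context so that verification is well defined."""
    return VOCAB[sum(map(ord, context)) % len(VOCAB)]

def speculative_round(context, draft_len=4):
    """Draft draft_len tokens cheaply, keep the longest prefix the target
    model agrees with, then let the target model supply one more token."""
    drafts = []
    for _ in range(draft_len):
        drafts.append(draft_model(context + "".join(drafts)))
    accepted = ""
    for tok in drafts:
        if target_model(context + accepted) == tok:
            accepted += tok              # agreement: this token is free
        else:
            break                        # disagreement: stop accepting
    return accepted + target_model(context + accepted)

context, target_passes = "x", 0
while len(context) < 65:
    context += speculative_round(context)
    target_passes += 1
print(f"{len(context) - 1} tokens from {target_passes} target passes")
```

On easy text with high agreement, each target pass yields up to five tokens; on hard text it degrades toward one token per pass, which is the dynamic allocation just described. Do you expect that the models are spending too much inference time on very simple questions?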
01:07:57
Speaker
I could imagine that a lot of the prompts to ChatGPT are about, you know, homework questions, maybe "when was this president born" or "how is the weather", and very simple math questions, perhaps. Are we overspending inference compute on those types of questions, and do we even know?
01:08:16
Speaker
Yeah, I think it's definitely right that we haven't fully optimized the way that we're handling these types of things, routing them to the right models and spending the optimal allocation of inference compute. And at some level, this is not something that it makes sense to fully optimize at this point, given that it's not a huge cost. But in the future, if AI becomes this much larger part of the economy,
01:08:42
Speaker
I expect that we will do a much better job of actually optimizing how we are spending our compute. You talk about increasing AI capabilities leading to near-complete automation of labor, but the timing of this depends mainly on two factors: what the returns are to software research and development, and how quickly we can expand the compute that's available for training these models. Maybe we could discuss these two factors, starting with the returns to software R&D. What does algorithmic progress currently look like? Yeah, so I agree these two factors are quite important, and I think they are sources of disagreement about this question of when we might have AI systems that are capable of widely substituting for humans, matching or surpassing human capabilities.
01:09:42
Speaker
Now, these are not the only ones. I think there are a bunch of other uncertainties, like how much compute we really need in order to get there, and whether you need some fundamental insights about intelligence and AI in order to get to systems that are able to do this. But if you buy that all we need is just scaling and some algorithmic innovations, and that the scaling isn't astronomical, it's maybe five orders of magnitude,
01:10:16
Speaker
maybe less or maybe slightly more, then I think these two factors are really quite important for determining the pace at which we cross this finish line. So on the software R&D side, I think one crucial question is:
01:10:31
Speaker
if you're able to automate R&D, if you have AI systems that are as good as humans at doing R&D and software engineering, or maybe even better, then how quickly does this result in capabilities that are enough to broadly substitute for humans across any possible task, where you get incredibly capable systems? And I think there the key questions are:
01:11:00
Speaker
if you can automate AI R&D, what about these compute bottlenecks? Is it the case that, just by having a bunch of AI researchers work on ML R&D, you get very fast progress in the relevant software? Maybe it's the case that in order to do really good research, you need compute in order to run experiments, and so the returns to this R&D diminish, and the effect of automating R&D is maybe quite modest, because you're running into these bottlenecks. I think there's another question of what the nature of this software progress, this progress in algorithmic efficiency, exactly looks like. So I think we've seen algorithmic improvements
01:11:48
Speaker
of broadly two types. One has been of the type of expanding the range of capabilities that models have, making them better and pushing out the boundaries of what models can do. The other type of algorithmic innovation has been about reducing the cost of achieving already-attained levels of capabilities.
01:12:14
Speaker
So this would be things like distillation, where you take a large model and you distill it, by generating synthetic data and training a smaller model on it, to make that smaller model really quite capable and attain the existing capabilities.
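A minimal sketch of that recipe in PyTorch, with a made-up teacher and student, and random inputs standing in for the synthetic data:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical setup: a frozen "teacher" and a much smaller "student",
# both mapping a 16-dim input to logits over 8 classes.
teacher = torch.nn.Sequential(
    torch.nn.Linear(16, 128), torch.nn.ReLU(), torch.nn.Linear(128, 8)
).eval()
student = torch.nn.Linear(16, 8)

opt = torch.optim.Adam(student.parameters(), lr=1e-2)
T = 2.0  # temperature that softens the teacher's distribution

for step in range(500):
    x = torch.randn(64, 16)  # random inputs stand in for synthetic data
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=-1)
    log_probs = F.log_softmax(student(x) / T, dim=-1)
    loss = F.kl_div(log_probs, soft_targets, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final distillation loss: {loss.item():.4f}")
```

The student is trained to match the teacher's output distribution rather than hard labels, which is what lets a much smaller model recover already-attained capabilities cheaply.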
01:12:33
Speaker
And so we've seen progress of both types, but maybe it's easier to achieve algorithmic progress that enables you to attain already-attained levels of capabilities, so to make it easier to imitate GPT-4. And if it's the case that algorithmic innovations have mostly been making it possible to get the abilities of these larger models more cheaply,
01:13:00
Speaker
then having a lot of automated R&D might not actually expand the set of capabilities. There's one way in which it does, which is that if you make the model cheaper and smaller, you can run more inference and amplify your model; you can get some boost using this inference scaling. But I think the argument that automating R&D results in this explosion of capabilities looks a lot weaker if algorithmic progress has been mostly of the sort that reduces the cost of achieving already-attained capabilities. Then there's this other notion of algorithmic progress, where it just expands what you can do. And I think if
01:13:45
Speaker
that is something we can automate, if we can automate the R&D process that expands what these systems are able to do, then I think the argument that we get this feedback loop resulting in accelerating progress looks slightly stronger. And so there's this question about what the nature of algorithmic progress really is. I think one reason to expect that algorithmic innovations have been a lot about reducing the costs of attaining existing capabilities is that if you look at something like Llama 3.1, it actually looks quite similar to GPT-2. It isn't that different, right? It's just much larger. And so maybe expanding the frontier has been mostly about compute scaling, and less about clever algorithmic tricks.
01:14:36
Speaker
I think that seems quite plausible. At the same time, you now have 8 billion or 1 billion parameter models that match the performance of the 100-billion-parameter models we had years ago. GPT-3 was 175 billion parameters, and we now have billion-parameter models that get basically the same performance. So it seems like we're making a lot more progress on reducing costs, but not really on expanding what we can do, because Llama 3.1 looks pretty similar to GPT-2, and yet it's pretty close to GPT-4 and what the labs are doing. And so
01:15:11
Speaker
that suggests that maybe the algorithmic innovations that we're best at making just reduce costs. And that is, I think, less likely to give rise to this accelerating flywheel of automated R&D giving you better systems that are more capable, which enables you to make even more progress and get this acceleration in capabilities. I think this question of whether we can automate R&D is quite important for some other strategic and governance-related questions. One is related to how much automation we should expect to see before we get very capable AI systems. If it turns out that automating R&D works really well, and we get much better systems by spending
01:15:59
Speaker
our compute budgets on AI systems working on improving AI research and producing innovations, OpenAI and other labs might decide that that's the best use of their compute budgets, that they should just spend their resources on running these systems internally rather than serving them to others, because it's really valuable for them to do this.
01:16:23
Speaker
And so maybe you don't see a lot of automation happen, if the returns to software R&D are high and this flywheel is possible, such that it makes sense for these labs to spend a lot of their compute budgets internally. That could be a pretty dangerous situation, would be my guess. If it turns out that automating software R&D is so valuable that labs keep it internal,
01:16:47
Speaker
the world might be surprised at what comes out at the end of that process. So it looks like not a lot is happening, perhaps, and then suddenly you have incredibly capable models.
01:16:58
Speaker
That's right. So the other factor that you mentioned is the pace at which we're able to expand compute and related capital, compute-related capital like lithography machines and data centers and fabs. I think this question is quite important for determining the pace at which we're able to do the scaling, and the pace at which we're able to have this automation happen.
01:17:25
Speaker
Building a new fab takes about two years, and so this is the timescale over which we should expect to get maybe a doubling in the amount of compute. And so this is fairly slow compared to the rate at which we're currently scaling training compute, which is a doubling every six months. So maybe it's not possible to expand the amount of compute and the data centers and other things at the rate at which things are currently being scaled up, once most fabs are running to produce AI chips. And at that point,
01:18:03
Speaker
we might get a tapering of the rate at which we're doing scaling, and that might slow down the rate at which we're seeing advances and other things. Now, it might be possible to accelerate the pace at which we're building new fabs and scaling our capacity to do bigger training runs. And this becomes more likely in a world in which the margins for AI are very large: you are able to train this model, charge a lot of money, and generate a lot of value.
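The mismatch driving that tapering is easy to see numerically; a quick sketch using the two doubling times quoted above:

```python
# The two doubling times quoted above, compounded over six years:
years = 6
demand = 2 ** (years / 0.5)   # training compute doubles every 6 months
supply = 2 ** (years / 2.0)   # fab capacity doubles every ~2 years
print(f"demand x{demand:.0f} vs fab capacity x{supply:.0f}")
# -> x4096 vs x8: demand outruns supply unless fab-building accelerates.
```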
01:18:39
Speaker
At that point, it might be worth it to pay this very large premium to make things move a lot faster, and to spend money on these kinds of advance market commitments to get TSMC to build as many fabs as they can, as quickly as they can, and to absorb the risks that they face so that they're happy.
01:19:01
Speaker
And maybe then you're able to get this going much faster. If that's possible, then maybe this pace of automation might be quite a bit higher, and the timing of when we might achieve some of these milestones might be much sooner, if it turns out that we can accelerate this process.
01:19:22
Speaker
And what are the main limiting factors to expanding compute? Is it mostly on the political side, where we could talk about regulation or geopolitics? Or is it mostly on the more technical side, say limits to how you can manufacture these chips, or how you can build the buildings to host the data centers?
01:19:43
Speaker
I think it's all of these things. It's this complicated supply chain where there are a lot of things in the physical world which are heavily regulated. On the energy side, this regulation is a bit of an issue. There are just difficulties in procuring the specialized equipment, and it takes a long time to produce, say, the world's flattest mirrors that need to go into the lithography machines, and there are many companies involved. There's this very long vertical
01:20:21
Speaker
where it takes time to propagate these signals through them. If you want to do the scaling, OpenAI has to convince Microsoft, Microsoft has to convince TSMC, TSMC has to order things from ASML, ASML has to order things from Zeiss, and propagating these signals just takes some time. And at each stage, there's some
01:20:48
Speaker
dependence on some very specialized talent or specialized input, and it's just difficult and expensive to make that move faster. So I think there's a combination of regulatory and supply chain things that make it hard to do this faster. Now, I recall reading some quotes from an ASML manager who said something like: with enough money, everything becomes flexible and fast. And so I do think that money is this very important driver of the pace. And so I think if you have these much higher margins,
01:21:27
Speaker
if you're able to say, we'll spend ten times cost on making this happen, then that is not something that is currently done, and so that might push the needle and make things move a little faster. And once you have the software, once you have a really good AI architecture and training data and so on, you have this model, you have these weights, and you only need to run them on hardware and you can generate human-level wages: you can generate, you know, $100,000 a year, maybe more. And all you need is the chips to run it. And then you can spend a lot of money, way more than we do today, on procuring those chips.
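A crude payback calculation of that logic; every figure here is an assumption for illustration, not a claim from the episode:

```python
# If weights can earn human-level wages, chips pay for themselves quickly
# (all numbers assumed for illustration).
wage_per_year = 100_000   # value generated per human-equivalent worker ($)
gpus_per_worker = 2       # inference hardware per "worker" (assumed)
gpu_cost = 30_000         # per accelerator ($, assumed)
opex_per_year = 5_000     # power + hosting per GPU per year ($, assumed)

payback_years = (gpus_per_worker * gpu_cost) / (
    wage_per_year - gpus_per_worker * opex_per_year
)
print(f"hardware pays back in ~{payback_years:.1f} years")  # -> ~0.7 years
```

On these assumed numbers the hardware pays for itself in under a year, which is why willingness to pay for chips could rise far above today's levels.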
01:22:14
Speaker
Yeah, although the valuations of some of these companies that you just mentioned, like TSMC, Nvidia, ASML, are already sky-high. So they've already been given a bunch of money. And the question is: who can afford to overpay to get more chips faster?
01:22:37
Speaker
So again, I described earlier the state of our current capital investment, which is about 20% of GDP. Right now, we're spending hundreds of billions of dollars on this compute-related capital, which is orders of magnitude smaller than what we are spending on conventional capital. And so there's large room for expanding the amount that we're spending on this.
01:23:02
Speaker
And the other thing is that valuations just reflect expectations of future cash flows. Current spending is not on par with those valuations; we're not spending that much. It's true that the valuations are quite high, but we could spend a lot more. And in fact, investors do expect this spending to escalate, which is why the valuations are high.
01:23:26
Speaker
You mentioned energy and how this might be a bottleneck. Which forms of energy do you think will be used for large training runs in 2030? Are we talking about nuclear energy or solar energy? And related to that, will, say, Microsoft and Google be able to do training runs using fossil fuels, given that they've signed these climate pledges?
01:23:53
Speaker
One interesting issue with energy or power is that, for training right now, the way that training usually happens is that a single campus or data center does a training run, and then you can serve the model, you can do inference, across many different data centers. And so the power demands for training are very highly concentrated.
01:24:19
Speaker
And this makes it run into these bottlenecks much more quickly, because there's only so much power you can get in Northern Virginia at a specific data center. So that's why there's this struggle, and companies are figuring out: can we get contracts with solar farms and nuclear power plants? There's Three Mile Island, where there are plans for reopening it
01:24:44
Speaker
for powering data centers. One thing you can do is geographically distribute your training. You can distribute your training across a data center in Northern Virginia, a data center in Pennsylvania, and many other places, and then you can effectively distribute the power that you need in order to train a model and tap into more energy sources.
01:25:11
Speaker
And in this report about whether we can scale through 2030, we worked out that yes, indeed you can do this geographic distribution of your training run. Now, it requires running a lot of fiber-optic cables, but that's something AT&T and others do; we have fiber-optic cables across oceans, so running fiber-optic cables is certainly feasible. And that already gets you more breathing room and enables you to tap into more energy infrastructure. So even without building a lot of energy infrastructure, I suspect that you can already do quite a bit of scaling.
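For a sense of what the fiber requirement looks like, here's a back-of-the-envelope sketch; the model size, gradient precision, and sync interval are all assumptions:

```python
# All numbers assumed: a 2T-parameter model, bf16 gradients, and one
# cross-site gradient sync every 10 seconds.
params = 2e12
bytes_per_param = 2
step_time_s = 10
tbps = params * bytes_per_param * 8 / step_time_s / 1e12
print(f"~{tbps:.1f} Tbps of cross-site bandwidth")  # -> ~3.2 Tbps
```

A few terabits per second is within reach of modern long-haul fiber, which is consistent with the report's finding that geographic distribution is feasible.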
01:25:52
Speaker
A model that's 10,000 times greater than GPT-4, which is on trend for 2030, would need about 5 gigawatts or something. In the US, the total installed capacity is about 1,000 gigawatts. This is 0.5%.
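That 5-gigawatt figure can be sanity-checked roughly as follows; GPT-4's training power draw and the hardware efficiency gain are assumptions here, not Epoch estimates:

```python
# Assumed inputs: GPT-4 trained at roughly 25 MW, 10,000x more compute
# by 2030, and ~50x better FLOP-per-watt hardware in the meantime.
gpt4_mw = 25
compute_scaleup = 10_000
efficiency_gain = 50
power_gw = gpt4_mw / 1000 * compute_scaleup / efficiency_gain
print(f"~{power_gw:.0f} GW, i.e. {power_gw / 1000:.1%} of ~1,000 GW US capacity")
```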
01:26:08
Speaker
That's quite a lot, but in some sense, if it's the most important industrial technology, then I think this seems well justified. In order to get much beyond that, you do start to want to expand your power production, and I think things like natural gas and solar and other things all seem reasonable. I do think these climate pledges are a little tricky, and maybe there's some regret that these companies have about signing them, because they didn't quite anticipate that they would have this much demand for energy for running these training runs. So I think it's possible that they weasel themselves out, or they just try to scale up nuclear and solar and other things as much as possible. But natural gas seems like
01:27:01
Speaker
something that would be maybe easier to do and has fewer issues around intermittency and so on. So maybe scaling that up is something we can expect to see. And I know some people who are more excited about natural gas specifically; Leopold Aschenbrenner, for example, is more excited about natural gas as a way of powering this. But there is this climate commitment that binds these companies at least a little bit, though there may be ways of wriggling out of it. All right. As a final question:
01:27:37
Speaker
we've been talking quite casually about perhaps getting to models that are competitive with human labor within the next, say, six years. How do you think, and this is a broad question of course, how do you think society will handle something like that?
01:27:53
Speaker
One thing to say is that I'm really quite uncertain about the question of whether in the next five or ten years we will get such systems that are broadly able to substitute for humans. I have a fair amount of uncertainty about that, so I don't think this is something that's sure to happen. I'm also quite uncertain about what the precise response would be, like if this were to happen, what exactly the effects on society would be. I have some guesses about what happens to wages. I think there will be,
01:28:28
Speaker
in some cases, much higher wages, and then there will be a lot of displacement. I think there are costs associated with this transition and labor displacement. There's also a lot of value that's generated: AI just has this ability to boost our wealth and technology, and that is extremely valuable.
01:28:54
Speaker
And I suspect that if and when we get much more capable AI systems that generate a lot of economic value, people will get really excited about some of this. The medical innovations that you might get are really quite tremendous, and so there will be a lot of pressure to adopt and build some of these technologies.
01:29:16
Speaker
I also think that people will be quite concerned about this displacement of human labor, which potentially creates unemployment. That's not always the case; you can find jobs elsewhere, and new tasks are generated. But there's going to be a lot of upheaval and movement, and people will have a lot of concerns about this. There's also this concern around the disempowerment of humans, humans having less control and oversight, and things like this. I think some of those concerns are also valid, and I suspect that there will be a societal response driven by some of these concerns around disempowerment, control, and oversight. I think it's worth thinking a lot more about precisely how this is going to be handled,
01:30:04
Speaker
which is precisely why I'm working on this, and why we founded Epoch: to improve and inform these discussions, and the policy that ends up responding to what's happening in AI and the effects that this will have. Tamay, thanks for talking with me. It's been really interesting. Great to chat.