Introduction and Guest Welcome
00:00:00
Speaker
All right, welcome to the Future of Life Institute podcast. I'm here with Lennart Heim. Lennart, could you please introduce yourself?
Lennart Heim's Background and GovAI Mission
00:00:08
Speaker
Sure, yeah, thanks for having me. As you just said, my name is Lennart Heim. I'm a researcher at the Centre for the Governance of AI, in short GovAI. That's what we usually say. GovAI's mission, roughly, is to build a global research community where we try to help humanity navigate this transition to advanced AI, the kind of thing we might have seen over the last few weeks.
00:00:26
Speaker
And what I'm mostly doing, I'm working on this research stream, which we call compute
Research Focus: AI Governance and Compute
00:00:30
Speaker
governance. So I'm thinking about computational infrastructure. I'm particularly interested just like, is compute a promising node of AI governance? What are the sub-nodes of compute we can use to achieve beneficial AI outcomes? What are hardware-enabled mechanisms we can use to support these regimes? So everything compute in general, but over time, this became more narrow. My background is in hardware engineering. So I studied computer engineering in school.
00:00:52
Speaker
I spent a lot of time actually figuring out how computers work. And now I'm trying to build on top of this knowledge, trying to use compute to steer towards more beneficial AI outcomes. Fantastic. All right. So what I imagined us talking about here is how we can forecast AI progress, specifically by looking at compute.
Understanding the AI Triad: Algorithms, Data, Compute
00:01:10
Speaker
But perhaps before we get there, we should probably introduce the key factors that are driving AI progress, which you call the AI triad. What is the AI triad, and what are the factors involved here?
00:01:23
Speaker
The AI triad has three factors. Another way to think about this, which is sometimes a useful concept, is that we can also call it the AI production function. It's a function where we have certain inputs and we have certain outputs, and the question is what these inputs are. And one way to think about the inputs is to split them up into three components, which we describe as algorithms, data, and compute.
00:01:42
Speaker
What do we mean by algorithms? Well, when we think about AI nowadays, we mostly talk about machine learning. Within machine learning, we talk about deep learning. And even within there, you can go deeper into the specifics of these algorithms, be it transformer architectures, how you train these systems, all these kinds of things. They're all important for the eventual output.
00:01:59
Speaker
The next thing is the data, right? As we do machine learning, these systems have some kind of learning, so we give them some data which they should learn on.
The Role of Compute in AI Development
00:02:07
Speaker
Nowadays these are big data sets, be it text, be it images, whatever we're training the systems on, which is another input, right? And lastly, we have the factor which actually enables all of this, which is compute, or computational infrastructure. This is eventually the physical infrastructure we need to train these systems and also, later, to execute them.
00:02:28
Speaker
Um, sometimes people also think about talent as another factor there. I usually would describe it as a secondary input, you know: talent helps with better algorithms, maybe with acquiring more data, maybe with making compute better. But yeah, those three are the fundamental ones to break it down into and to think about:
00:02:44
Speaker
How much do they matter for AI progress? What is the output there and how have they developed over time? Does it make sense to separate these factors and talk about how much each factor contributes to AI progress? Or is it simply so interrelated that it doesn't make sense to separate them out like that?
00:03:01
Speaker
Yeah, that's definitely an ongoing question. I think it makes sense to try to separate them, right? I'm claiming I do compute governance. I think maybe in the future people will be claiming they do algorithm governance or data governance. And again, there are always downsides to putting things into boxes, but sometimes there are some upsides to having these kinds of models.
00:03:16
Speaker
Yeah, if we try to separate them, what I've been interested in is, well, what was the historic role over time? And I think in general we can say they all trade off with each other, right? If I want to build an AI system, I can try to spend a lot of time figuring out better algorithms, getting more data, or just throwing more compute at the problem, and eventually it will turn out to be a better AI system with better capabilities.
00:03:37
Speaker
What we've seen over time with data is that we just acquire more over time. So the amount of data we used roughly doubled, I think it was every 1.4 years. I think this 1.4 years was for text data. This would again look different for image data, for voice models, but I think text data is the thing we're currently thinking about when we talk about large language models.
00:03:57
Speaker
Whereas in contrast, if you think about compute, and what I mean here is the compute used for training these systems: when I say how much compute we use for training the systems, I refer to how many floating point operations you needed to train the system so that it's eventually finished and can be run. We did an analysis on this, and we roughly saw that the training compute is doubling every six months. So this is faster progress, a faster doubling compared to data.
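A rough numerical sketch of what those doubling times imply, taking the 6-month and 1.4-year figures quoted above as exact for illustration:

```python
# Rough sketch: what the quoted doubling times imply over a 5-year span.
# Assumes the trends hold exactly (6-month doubling for training compute,
# 1.4-year doubling for text-dataset size) -- illustrative only.

def growth_factor(years: float, doubling_time_years: float) -> float:
    """Multiplicative growth after `years`, given a fixed doubling time."""
    return 2 ** (years / doubling_time_years)

horizon = 5.0  # years
compute_growth = growth_factor(horizon, 0.5)   # ~6-month doubling
data_growth = growth_factor(horizon, 1.4)      # ~1.4-year doubling

print(f"Training compute grows ~{compute_growth:.0f}x over {horizon:.0f} years")
print(f"Text data grows ~{data_growth:.0f}x over {horizon:.0f} years")
# -> roughly 1024x for compute vs ~12x for data
```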
00:04:23
Speaker
And how has this doubling rate evolved over time? So has compute always been doubling every six months or has the rate of doubling also increased over time?
00:04:34
Speaker
Yeah, when we did this investigation on compute, the first ones to do it were actually OpenAI. They had a blog post called "AI and Compute". They did this, I think, in 2018, 2019, something along these lines. And what they found was that this AI training compute has been doubling every 3.4 months. So twice as fast as I just said. We reran the same analysis with just way more ML systems at the beginning of 2022, added some more new systems, and, well, if we now look at the trend, it's actually just doubling every six months.
00:05:03
Speaker
And this is partially because they picked a cutoff point where new, pretty compute-intensive systems like AlphaGo and these kinds of systems came out, which were really at the top of this trend and skewed the whole trend upwards. If we look at the data right now, which everybody can do if you just go to epochai.org, I think the doubling time would sit somewhere around 6.5 months or seven months right now. We haven't added GPT-4 yet, because nobody has told us yet how much compute it is. If we could add GPT-4, we might have better insights there.
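For reference, doubling times like these come from fitting a log-linear trend to training-compute estimates over time. A minimal sketch of that kind of fit, using made-up placeholder data points rather than Epoch's actual dataset:

```python
# Minimal sketch of how a compute doubling time is estimated:
# fit a line to log2(training compute) vs. publication date.
# The data points below are made-up placeholders, not Epoch's dataset.
import numpy as np

years = np.array([2016.0, 2017.5, 2019.0, 2020.5, 2022.0])   # publication dates
train_flop = np.array([1e21, 1e22, 3e23, 1e24, 3e24])        # training FLOP (hypothetical)

slope, intercept = np.polyfit(years, np.log2(train_flop), 1) # slope = doublings per year
doubling_time_months = 12.0 / slope

print(f"Fitted doubling time: {doubling_time_months:.1f} months")
```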
00:05:33
Speaker
Okay, so if we were to put some numbers on it, how much of AI progress would you say is attributable to compute? Yeah, I don't feel like I would like to put numbers on it. Maybe we can describe it with what Richard Sutton calls the bitter lesson, where he was saying, oh, people have been trying to develop new fancy algorithms, trying to learn from the brain. But what we've historically seen over the last years is basically that we just have
00:05:58
Speaker
search algorithms and we just throw more compute at them, right? And of course, I mean, you're talking to a guy who claims he's doing compute governance, so I think it's an important node. I am not saying, oh, 50% of the progress of AI is due to compute, or it's 90%. I can't tell. There might be more analysis in the future; that's one for the economists. I'm saying it seems like an important node.
Complexities in Chip Production and Supply Chains
00:06:19
Speaker
I think it has some unique properties, independent of how important it is for AI, which we can use to achieve better and beneficial AI outcomes.
00:06:27
Speaker
I think it's important for us to talk a little bit about how modern advanced chips are produced. Perhaps you could talk about where are they produced, how difficult it is to produce these chips, who are the key players involved, and so on.
00:06:43
Speaker
Yeah, I think the disclaimer here is: chips, or integrated circuits, are probably the most complex device, product, whatever, that we as humankind have ever produced. So whatever I say here, take it with a grain of salt; I'm trying to explain it at a really high abstraction layer. It's basically an effort of all humankind that we have these kinds of chips. It's a global supply chain.
00:07:03
Speaker
What is one way of thinking about it? I think it's useful to think about three processes. The first we can call the design phase. We think about, well, we have all of these transistors, we put them somewhere; how do we arrange them so that it actually does some math, right? I think that's an abstraction which most people actually do not get. Us talking right now, everybody having a smartphone, just relies on a device which is basically switches, and we have billions of them on a chip.
00:07:27
Speaker
So that's the design phase. Who are the people who design these chips? Apple, for example, is designing these chips. They're like, hey, here's our new M1 or A17, whatever they put in their MacBooks or iPhones.
00:07:36
Speaker
Once they've designed such a chip, it's just some piece of code which eventually describes how this chip looks. Then we need to fabricate this chip, right? This is where we etch the chip. There are years of history there in how this has been done. The important actors to know here are TSMC, Samsung, and Intel. Those are the ones leading cutting-edge chip production.
00:07:58
Speaker
Another important actor there is the company called ASML, which people might have heard about. It's this obscure company sitting in the Netherlands which sells the machines to TSMC, Intel, and Samsung, which then produce these chips. Then we get our chips out of it. And the last thing we need to do is assemble the chip: we test and package them. This is sometimes done at another provider, or it's done directly at the fab.
00:08:20
Speaker
So the important thing is that Apple sells these chips, but they're not actually producing them. They just think about how to design them and then send it off elsewhere. It's kind of like if you think, oh, I'm printing a t-shirt for my local, I don't know, football club: you're not going to produce it. You're going to send it off to this other company which produces a bunch of t-shirts, and they print your t-shirt with your local design. That's the same as what Apple is doing. And then eventually you get it back and then you sell these. So, yeah, the three steps:
00:08:46
Speaker
design, fabrication, and lastly assembly, testing, and packaging. That's the chip supply chain. And what about bottlenecks in these supply chains? What would happen, for example, if ASML ceased to exist, or TSMC ceased to exist? How much depends on specific companies here? As I just said, it's a really complex product. It's the effort of all of
00:09:10
Speaker
all of us, the whole world coming together. So it's a global supply chain, right? I think the explicit examples of ASML and TSMC are interesting. If we look at ASML, ASML is the only company in the world which is producing these EUV machines, which are used for producing cutting-edge chips.
00:09:26
Speaker
So if ASML ceased to exist, I think there is going to be a big shortage and we're probably going to hit a recession or something along these lines, right? I mean, we can continue using the machines that currently exist, but we want to keep on going, right? That's the history of computing: just keep on pushing, making these chips smaller and smaller over time. ASML is the strongest case, where we literally have just one company producing these EUV machines. But even if you then move to fabrication, where we have TSMC, Intel, and Samsung for cutting-edge chips:
00:09:53
Speaker
TSMC makes roughly 70 to 80% of the whole revenue in this domain. So TSMC is the really important actor there, right, which is producing all of these chips, all of our iPhones and MacBooks and whatever kinds of chips we are using. This might look different for the chips which are sitting in, I don't know, your dishwasher or your car. Those are usually older-node chips, so they wouldn't be hit by ASML directly, but ASML also makes machines for producing the older chips.
00:10:20
Speaker
So we have what people call bottlenecks, or as other people call them, choke points, right? Which I guess we're going to be talking about later at some point, and which you can then use to achieve
00:10:30
Speaker
certain goals. Yeah. I think those are roughly the bottlenecks. It's just really complicated to produce these chips. It costs a lot of money, there are only a limited number of actors, and it's just really, really hard to produce these kinds of chips. The stories I've heard about how hard it is to produce these chips, and which kinds of obscure things happen that reduce the yields, that is, how many good chips you get out of this; yeah, it's crazy.
00:10:53
Speaker
How far are the nearest competitors behind the very cutting-edge companies here? I think Intel might be the interesting case to look at, right? They were once leading this and then got overtaken by TSMC. Intel was this company which designed the chips and also produced them and also packaged them, right? They were doing this across the whole thing. Whereas at some point AMD came along and somewhat revolutionized this. They were like, hey, we actually just design the chips and somebody else is going to be producing them.
00:11:19
Speaker
How far behind are they? I mean, I just said that TSMC makes up the majority of the share of these cutting-edge chips, and TSMC is just leading the field there. There are some forecasting questions, and it looks like TSMC and Samsung are going to achieve three nanometers this year and will start mass producing it. Intel probably won't achieve this this year, but probably next year. So I guess Intel is like one year behind, but that just describes
00:11:42
Speaker
the node, the transistor size, or the fabrication process; it does not describe how much they actually produce. I think there's still a big difference where TSMC just has way more fabs and can produce way more. For ASML, I guess it's trickier. I guess a bunch of the other competitors just kind of gave up, and they're just hoping, well, we might not be able to produce EUV machines, but maybe we can make the next thing after the EUV machines. I haven't seen a good analysis there of how far the other ones are behind.
00:12:08
Speaker
I guess it's fair to say it's just really, really hard. And we have certain countries who are trying really, really hard right now to produce these kinds of machines that produce the chips, but also the fabs which produce the chips.
Forecasting AI Progress and AGI Timelines
00:12:18
Speaker
And I guess it will be hard for them, maybe up to impossible, if there are no new paradigms coming along.
00:12:24
Speaker
Is there an interesting difference between computer chips in general and the cutting-edge computer chips used specifically for AI purposes, in terms of production and supply chain? Or is it simply the same companies leading both chips in general and AI-specific chips?
00:12:44
Speaker
I think it's useful to think again about our three steps. We have the design, we have the fabrication, and the packaging. The design companies are different for AI chips, at least for the AI chips I mostly think about when I think about AI governance. I think about AI chips ending up in data centers, AI accelerators, mostly designed by NVIDIA, where, for example, the A100 and H100 are the names referred to, but also Google with their TPUs, their tensor processing units, which they keep in-house.
00:13:12
Speaker
So they are designing them, right? There are some equivalents at Apple: Apple also has AI co-processors on their smartphone chips or on their laptops. But those are not the chips I'm talking about when we talk about training large-scale systems. When we talk about training GPT-4, you're not using your smartphone, you're not using your computer there; it's a different use case. So the design phase is different, but the fabrication phase is actually the same. They both send it off eventually to TSMC or another fab, and they're then going to produce it for you.
00:13:39
Speaker
And the same goes for the packaging process. So if we talk about cutting-edge chips, we talk about, yep, the three-nanometer chips; they're sitting in your smartphone, they're sitting in your laptop, but they're also sitting in these AI accelerators, for example GPUs, which then go to the data centers. The difference is the design phase there, right? The chips eventually turn out to be physically different; they implement certain functions. For example, a smartphone still has a
00:14:04
Speaker
general-purpose processing unit, a CPU, right? It can do a bunch of stuff. Whereas for AI accelerators, we move along the spectrum towards a more specialized chip, right? It's a specialized chip which is really, really good at solving parallel problems. What we've learned is that chip production is extremely advanced and extremely complex. And we've learned that it plays a key role in driving AI progress. So let's talk about what we can learn if we try to extrapolate progress in compute and
00:14:33
Speaker
find out what this can tell us about progress in AI in general. This is what you might refer to as AI timeline forecasting. Just thinking about, based on what we know about compute, about data, about algorithms,
00:14:48
Speaker
When might we expect, for example, artificial general intelligence to arise? What can we say specifically based on thinking about compute here? So it's a question which many people are thinking about: well, when is this artificial general intelligence, transformative AI, human-level AI, whatever you name it, coming, right? And before we go into the details, I think it's really important for people to actually say what they mean by these kinds of terms, and not just say, oh, my AGI timelines are X, Y, Z.
00:15:17
Speaker
Independent of that, some reports are trying to operationalize this, trying to forecast it and figure out when this will have happened. And compute just plays an important role there. I just tried to describe the AI production function: we have these inputs, then we have this AI production function, and we get certain outputs. And now I'm asking, when is this output AGI, and what does this mean about my inputs? And I think there are two famous ways of trying to go about it. One of them is actually from Ajeya Cotra, which is this biological anchors report, which was, I think, discussed on this podcast before.
00:15:46
Speaker
And this is actually also a compute-centric framework, where she's effectively asking, well, if I would need to rerun evolution, how much compute is this, how many floating point operations is this? If I need to rerun childhood development, how much compute is this? So we have these different compute milestones, right?
00:16:03
Speaker
What she then tries to do as the next step is: we have these compute milestones, and, well, if I have a deep learning model which roughly uses as much compute as these milestones, it might have similar capabilities, right? And then she actually tries to forecast the compute available in a given year. You try to figure out, well, what is the price performance of compute? This roughly looks into Moore's law, right? How many chips will be produced, how many FLOPs can I calculate?
00:16:26
Speaker
And then you also look into how much you're actually willing to spend, right? It's an important question: well, do I actually want to spend that much money on doing this one training run to achieve a certain milestone? And the last thing is, you need to adjust this for some type of algorithmic efficiency, right? Over time,
00:16:43
Speaker
you need less compute to achieve the same capabilities. So you discount it over time, and this is what is described as algorithmic efficiency. That's one way of thinking about it, and that's one way in which compute and compute forecasting feed directly into trying to forecast transformative AI.
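A toy version of the compute-centric forecasting logic just described; every number here is an illustrative assumption, not a figure from the biological anchors report:

```python
# Toy version of a compute-centric forecast: find the first year in which
# the affordable *effective* training compute crosses a compute milestone.
# Every number below is an illustrative assumption, not from the report.

MILESTONE_FLOP = 1e30            # hypothetical compute milestone (FLOP)

compute_2023 = 1e25              # assumed largest training run today (FLOP)
price_perf_doubling_yrs = 2.5    # hardware price-performance doubling time
spend_doubling_yrs = 2.0         # assumed willingness-to-spend doubling time
algo_halving_yrs = 2.0           # assumed halving time of compute needed (algorithmic efficiency)

for year in range(2023, 2080):
    t = year - 2023
    hardware = 2 ** (t / price_perf_doubling_yrs)   # more FLOP per dollar
    spending = 2 ** (t / spend_doubling_yrs)        # more dollars spent
    algorithms = 2 ** (t / algo_halving_yrs)        # less compute needed per capability
    effective_compute = compute_2023 * hardware * spending * algorithms
    if effective_compute >= MILESTONE_FLOP:
        print(f"Milestone crossed around {year}")
        break
```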
00:17:08
Speaker
Given that people use different definitions of, for example, human-level AI, transformative AI, or artificial general intelligence, is there something useful we can gain from aggregating predictions across a number of reports? Say that these reports use slightly different definitions of what we're trying to predict. Will this introduce too much noise for us to say something useful? I think it's still useful to maybe look at them and just see what the
00:17:28
Speaker
different forecasts say. We should try to think about whether they are actually trying to forecast the same thing and what different biases are at play there. I think it's hard to just throw a survey together
00:17:39
Speaker
with this bio anchors report, right? But if you have two empirical approaches, be it the bio anchors report and maybe another approach, I think it's fair to just compare them, right? Or something where you do a survey of all the different methods and then you can weight it to your own needs. What I'm more interested in is, well, it's cool to have these timelines, you know, when is this thing going to happen. But I'm excited about the intermediate inputs to these kinds of models, right? If somebody is thinking about AI timelines, then on the way to thinking about AI timelines,
00:18:09
Speaker
they figure out what the growth of compute is. This is important, right, just for the things we've just discussed. For example, there's a guy like me who looks at the growth of compute and says, well, this looks interesting, maybe you can do compute governance, maybe you can use this node to eventually achieve something. So you figure out better timelines, but you also have intermediate outputs which are useful. And this is a big part of the reason why I'm excited by this timelines research: there are a bunch of intermediate inputs
00:18:32
Speaker
to these kinds of models, which are useful. And again, for empirical models like the bio anchors report, there are way more of these intermediate inputs compared to a survey. In a survey, all I have is maybe a qualitative entry from some researcher, but those tend to be vague, and most of the time you just get a number, so I don't know what credence to put on it. So in the process of trying to forecast transformative AI, we might learn something that will turn out to be super useful for us.
00:18:56
Speaker
Let's get your take on this whole issue. Based on all the reports you've read, based on your deep dives into compute, how would you think about when we might expect artificial general intelligence or transformative AI?
00:19:11
Speaker
For what it's worth, I don't think there's something special about just Lennart telling his timelines or his numbers or something. I think the intermediate inputs are more interesting. But yeah, how would I collect my views on this? I do think there is a significant chance that AI turns out to be a big deal. You can define it as transformative AI, as AGI, or something. I do think there's a 50% chance that within the next 20 years or so there might be something which we might call AGI or transformative AI. What do I mean by this?
00:19:41
Speaker
Well, maybe we can measure it on benchmarks. There's this famous MMLU benchmark; yeah, maybe there's a system which scores 95% on this.
00:19:49
Speaker
Maybe the system would also pass a really long Turing test, where somebody is really drilling down on these systems. And maybe the system also wins a math olympiad. And the system is also able to control a robot which takes care of my dishwasher, to take a big example. These are ways we try to operationalize this, for example at Epoch, when we try to think about forecasting or timelines.
Cognitive AI vs Robotics: Challenges and Progress
00:20:11
Speaker
If I try to take this operationalization, I'm like, yeah,
00:20:13
Speaker
I guess within the next 20 years there's like a 50% chance of this happening. And then I have a really long tail, right? If it's not happening soon, then, well, I don't know, maybe there's some magic sauce to this, so it will just take a longer time. We'd need to reinvent the whole thing again to eventually get to AGI.
00:20:32
Speaker
Do you think robotics is in the same category as more cognitive labor performed by AI? It seems to me, just looking at the landscape, that we have amazing developments in what you could call cognitive tasks, but less progress in robotics. So if we define AGI or transformative AI to include robotics, that might significantly delay our prediction of when it will arrive.
00:20:57
Speaker
That's actually part of the key thing. Maybe we'll have those really smart systems that can do all of these things, but they just have really bad hands. And as an electrical engineer who did some robotics, I find it really disappointing how slow progress is there. It's still progress, don't get me wrong. I think everybody has watched these Boston Dynamics videos; you see those robots doing crazy stuff.
00:21:22
Speaker
But that's still just moving physical things. Our hands are pretty, pretty good at a bunch of stuff, and it turns out it's really, really hard to get this into a computer. So I'm pretty confident there might be, right now, a system which knows how to clean up my dishwasher and might know how to do it really efficiently. But just the sensitivity, the logistics of moving your hands fast enough and precisely enough, is going to be a really hard thing.
00:21:45
Speaker
I actually think robotics is a thing which people sometimes anchor on too much, where they think, oh, AI is only dangerous if it can move or has legs or something. No, actually, our life is so digital anyway; most of our critical infrastructure is controlled by computing systems. You don't need hands to do a bunch of stuff, right? Us having this conversation right now has nothing to do with us having physical hands. This could happen completely with us being in a simulation and this whole conversation being made up.
00:22:11
Speaker
True, true. Okay, so how much do you think robotics depends on compute? Is there a compute bottleneck that's holding robotics back? Or is it simply, or not simply, but is it more about designing accurate hands or accurate sensors or something else?
00:22:29
Speaker
Yeah, I think it's mostly about the latter. Funnily enough, as I said, today's systems use more and more compute, and you also have the problem that these systems are big deep learning systems with billions of parameters. If you want to run them locally, you sometimes can't, because they're just too big, too computationally intensive.
00:22:47
Speaker
So if I just put GPT-4 in a Boston Dynamics robot, this thing is going to be out of battery pretty quickly, because you'd need something like four A100s to run the system at a speed which is good enough. This is what we've seen from a paper from Google, where it was like, yeah, we had to use this really small model because we couldn't use a bigger model. And I guess most of the rest still boils down to how good robotic hands and these kinds of things are. You might be good enough at controlling it, but you're simply not sensitive enough and precise enough with these hands.
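A back-of-the-envelope version of why running a large model on a battery-powered robot is hard; the parameter count is a hypothetical stand-in (GPT-4's size is not public) and only weight memory is counted:

```python
# Back-of-the-envelope: memory needed just to hold a large model's weights
# for inference, vs. one data-center accelerator. Numbers are illustrative.

params = 175e9                  # hypothetical parameter count (GPT-4's size is not public)
bytes_per_param = 2             # fp16 weights
weights_gb = params * bytes_per_param / 1e9   # ~350 GB

a100_memory_gb = 80             # one NVIDIA A100, 80 GB variant
accelerators_needed = weights_gb / a100_memory_gb

print(f"Weights alone: ~{weights_gb:.0f} GB, i.e. ~{accelerators_needed:.1f} A100s,")
print("before activations, KV cache, batching, or any power and cooling budget.")
```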
00:23:15
Speaker
I do expect AI can help with this, just as AI can accelerate research. There are some researchers right now figuring out how to make better robotic hands, how to build better robots in general. They now use AI tools to summarize their research, or maybe AI tools bring some new ideas along. Maybe in the future we even have AI systems which have completely new ideas about how we should be building robots, right? It looks like evolution figured it out. Evolution figured out good movement way earlier than good brains, right? We could walk before we could, I mean, probably not walk, but at least crawl and
00:23:45
Speaker
jump from tree to tree before we could actually think really well. So maybe we have some reason to believe it's actually not such a hard problem. I'm generally a bit confused about why we haven't figured it out yet. Maybe we have the option of simulating physical environments and learning how to do robotics in simulation before deploying them. And that, I think, would make our robotics progress more dependent on compute.
00:24:10
Speaker
So if we could run huge simulations, these simulations would be extremely computationally intensive. But that is a possible path forward. I don't know whether that actually gets us all the way to real-world robotic interaction. What do you think about this? Do you have any insight here? That seems right; it definitely can help with this. But I think we generally have this sim-to-real gap: our simulations are complex, but as we've seen, reality is even more complex.
00:24:37
Speaker
And I guess this will just continue being some kind of barrier. Yeah, I'm definitely excited about people trying to think more about this: how would this actually translate? You know, if I have this robot moving in my simulation, can it actually also move in the world, right? It might be easier for your Roomba, which is driving around and treating it only as a 2D space. But a walking robot with legs seems way harder.
00:24:59
Speaker
Yeah, okay, so we talked about the AI triad of data, algorithms, and compute. And perhaps one useful thing here is to talk about how these can be traded off against each other, because this might tell us something about how important compute is. So what do you think would happen, for example, if we had the perfect data set? How much would this mean in terms of AI progress? Combine the perfect data set with the compute and algorithms available today.
00:25:28
Speaker
Yeah, I would love to know the answer to this. What is the perfect data set? One thing we ran into when we tried to collect the data trends, how much we've been using, is: what are actually the different dimensions of data, right? I can measure data as how many data points, how many tokens, how many pictures, how many gigabytes, but we all know data has different quality, right? And now we get into the tricky things, like, well, how do we measure quality on these types of data?
00:25:56
Speaker
One thing you can see is that data quality matters. For example, I think it was the Chinchilla paper, which is a paper by DeepMind. You have a big dataset of text, right? Then you train the system on this text, and usually it only gets to see the text once; it only has one epoch per text. But interestingly enough, given that Chinchilla needed a lot of data because of the new scaling laws, they actually did run twice over the Wikipedia text.
00:26:20
Speaker
They were like, well, Wikipedia has really great text, it's really high-quality data, maybe closer to the truth than just Reddit or a bunch of other stuff on the internet; so let's train the system twice on this, because it's better. And we have good reasons to believe, if we look at humans, that data efficiency can actually be way better. I'm claiming I'm more data efficient than GPT-4, at least for the moment. I mean, I haven't read as much, and in some things I can maybe outpace GPT-4.
00:26:45
Speaker
So there's lots to gain there. And if you use less data, this also means we have smaller training runs to some degree, right? Unless we just show the data way more times. So it's not clear how directly this translates, but in general more high-quality data is probably useful, probably better than low-quality data. I don't know how much better, and less data also means less compute for these kinds of systems. So I think there's definitely progress to be made over time, and I think this is also partially where we've seen progress over time,
00:27:14
Speaker
when better data and more data became available. My impression is that we are actually using something close to all the data available online to train the biggest models. Is that also your impression? Do you think it's true that we are reaching the limits of how much data we have available online?
00:27:32
Speaker
Yeah, I think there's more data. We tried to look into this and asked, well, how much more text data is out there? And you start becoming a bit creative. Like, oh, we don't only produce text data right now, we produce voice data. But guess what? We can run another AI system on this to produce more text data.
00:27:49
Speaker
Cool. I don't know if this podcast is the thing that should be trained on, I mean, maybe it is, I don't know. But there is YouTube out there. Let's just take all the YouTube videos in the world and transcribe them. Here we go, we have way more text. Is this high-quality data? Probably not. Most of it is probably not that good; some of it might be better, right? There are some pretty good podcasts out there.
00:28:08
Speaker
So there's more data to be acquired, and we tried to forecast how much video is getting produced and, if we transcribe it, how much more data we get out of this. And looking at it, it looks like at some point we might run out of data, just for the simple reason that, while there is lots of it out there, don't get me wrong, we are also just using lots of it.
00:28:27
Speaker
And we tried to predict this type of scaling laws, which is this thing that describes, if we have a network of this size, how much data we need and how much compute we need to train it pretty much optimally. It's unclear if these scaling laws can continue at the scale at which we discovered them. But it's definitely a thing where you might lack high-quality data. And I think right now, reinforcement learning from human feedback is already such an example: quality actually matters. This human feedback is actually way more useful than training on a bunch of scrappy data from the internet.
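A rough sketch of the scaling-law bookkeeping being described, using the commonly cited Chinchilla rule of thumb of roughly 20 training tokens per parameter and the approximate C ≈ 6·N·D formula for training FLOP; treat the constants as approximations:

```python
# Rough Chinchilla-style bookkeeping: for a given parameter count, how much
# data is "compute-optimal" and how much training compute that implies.
# Uses the ~20 tokens/parameter rule of thumb and C ~= 6*N*D; both are
# approximations from the scaling-law literature, not exact laws.

def chinchilla_optimal(n_params):
    tokens = 20.0 * n_params          # rule-of-thumb optimal dataset size
    train_flop = 6.0 * n_params * tokens
    return tokens, train_flop

for n in (70e9, 500e9):               # 70B (Chinchilla-sized) and a hypothetical 500B model
    tokens, flop = chinchilla_optimal(n)
    print(f"{n/1e9:.0f}B params -> ~{tokens/1e12:.1f}T tokens, ~{flop:.1e} FLOP")
```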
00:28:57
Speaker
I think I'm more interested in this: there's more data out there, but you want to get high-quality data. How do you get more high-quality data for the things you actually care about, for the things you actually want to do? When I've played around with GPT-4, for example, some of the output text is very clever.
AI-generated Data and Algorithm Efficiency
00:29:15
Speaker
We might be tempted to use that output text to train new language models.
00:29:20
Speaker
This sounds like some kind of Ponzi scheme or some kind of magic where we make up data out of thin air. Do you think that would work? Do you think output from previous language models would be interesting as training data for new language models?
00:29:33
Speaker
I don't know if it will work. My guess is that somebody will figure this out, because the incentives are definitely there: here we go, this is the easiest way for us to acquire more data. And I guess companies will look into this. I'm personally probably not excited about it. AI systems feeding into AI systems, I don't know, I think this is how failure could look. You just have these loops of AI systems, we have barely an idea of where it went wrong, right? And we have these two-layered black boxes; well, congrats, now we've made it even harder to understand everything.
00:30:03
Speaker
So I guess people will definitely go for this, yeah, trying to use this synthetic data that you're just making up. I think maybe an example is actually this LLaMA model, which came out of Meta, and then Stanford made Alpaca out of it, right? And if I understand it correctly, they fine-tuned it based on feedback from ChatGPT.
00:30:23
Speaker
Here we go, already an example. You don't need human labor anymore; you can skip your Amazon Turkers and just use ChatGPT. It's way cheaper, and it turns out it was actually pretty good; this model is performing really well. So I expect that at least for fine-tuning it will be useful. But I think we should really be cautious there. Again, black boxes feeding into black boxes is not an ideal scheme through my lens of wanting to understand these systems and make sure they are actually, yeah, well understood.
00:30:47
Speaker
What about trading off algorithmic progress against data or compute? So imagine if we had optimal algorithms, how much would this matter? For example, I saw DeepMind recently made an advance in matrix multiplication algorithms, I believe, which is kind of a core operation of machine learning. How much does this matter? Could this make, say, compute less relevant, because algorithms would be so efficient that they would need less compute to run?
00:31:14
Speaker
There's definitely history there. I mean, if we go back to the AI triad, algorithms are one thing, and the question is, well, how has algorithmic efficiency developed over time? The problem is that it's really hard to measure. The best way of measuring it right now is to look at a benchmark. In this case people used ImageNet and asked, well, how much compute do I need to achieve capability X on this benchmark over time? And what we've historically seen there is that
00:31:39
Speaker
the compute used is halving, I think, every nine months to a year. So a year from now, I can achieve the same capability as right now for half of the compute of the previous system which achieved this capability. That's a big deal. But again, the compute trends have been a bit faster. Because eventually it's not about achieving capability X; you want to keep on pushing the frontier.
00:32:01
Speaker
I think every percentage point of accuracy, every percentage point of better capabilities, is actually useful, because we're now entering this new era where we're actually trying to make money out of these systems. And sometimes single-digit percentage points just matter. It's the difference between all of us having autonomous cars or not having any autonomous cars right now, because, yep, we have really high standards for these kinds of systems.
00:32:21
Speaker
So, algorithmic efficiency definitely reduces the compute you need. The question is, where do we keep pushing our AI systems? And historically we've seen just bigger and bigger systems, where you have gains due to algorithmic efficiency, but also because we just throw more compute and more data at the problem.
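A small sketch of how the two trends compound into "effective compute", taking the six-month compute doubling and a roughly nine-month algorithmic-efficiency halving (the faster end of the range mentioned) as if they were exact:

```python
# Sketch: "effective compute" growth when physical training compute and
# algorithmic efficiency improve at the same time. Treats the quoted
# doubling/halving times as exact, which they are not.

compute_doubling_yrs = 0.5   # training compute doubling (~6 months)
algo_halving_yrs = 0.75      # compute needed for a fixed capability halves (~9 months)

years = 5.0
physical = 2 ** (years / compute_doubling_yrs)
algorithmic = 2 ** (years / algo_halving_yrs)
print(f"Over {years:.0f} years: ~{physical:.0f}x more compute spent,")
print(f"~{algorithmic:.0f}x less compute needed per fixed capability,")
print(f"~{physical * algorithmic:.0f}x growth in 'effective compute'.")
```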
Compute Growth Trends and Future Technologies
00:32:37
Speaker
OK, you have this graph with different eras of compute usage in machine learning systems. We're starting with the pre-deep learning era, then deep learning, and then large-scale systems. Try to describe this graph for us. What are the key lessons from it?
00:32:55
Speaker
Yeah, so this is coming from our paper on three eras of compute usage in machine learning, or something along these lines. The key thing is that we looked at training compute, which we've just discussed, and tried to figure out how this has been developing over time. And by over time, we mean, I think we started in 1958 or something, when the first advent of AI was happening.
00:33:14
Speaker
And this is what we call the pre-deep learning era. The deep learning era roughly emerged in 2010 to 2012. And within this pre-deep learning era, we roughly see that training compute is doubling every two years. This reminds us of another law, which is called Moore's law.
00:33:29
Speaker
Moore's law, strictly speaking, describes the transistor density on chips, but in this case we can just roughly say, well, the price performance doubles every two years: every two years you get a chip which is twice as good for the same price. This basically means that in this pre-deep learning era, people always spent a roughly constant budget on training their systems. Maybe this constant budget was just, yep, the CPU, the processor which was sitting right at the desk of this researcher.
00:33:56
Speaker
But at some point in 2010 to 2012, the deep learning era emerged, famously with AlexNet, maybe with other systems before that. They did one new thing: they used GPUs, graphics processing units, which are really, really good at computing parallel problems, at matrix multiplication, which is the key thing we do for training these AI systems.
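For reference, the matrix multiplication mentioned here is what those floating point operation counts are counting; a minimal example of the arithmetic:

```python
# Why GPUs matter: multiplying an (m x k) matrix by a (k x n) matrix costs
# roughly 2*m*k*n floating point operations (a multiply and an add per term),
# and training is dominated by huge numbers of these, run in parallel.
import numpy as np

m, k, n = 4096, 4096, 4096
a = np.random.rand(m, k).astype(np.float32)
b = np.random.rand(k, n).astype(np.float32)
c = a @ b                       # highly parallel: every output element is independent

flops = 2 * m * k * n
print(f"One {m}x{k} @ {k}x{n} matmul ~ {flops:.2e} FLOP")
```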
00:34:14
Speaker
And within this era, we basically see, wow, people are starting to build bigger and bigger systems, deeper systems, right, deep learning, that's what we do here. And the compute growth just skyrocketed, right? And I think this is really important: doubling every six months, from around 2010 to 2012 onwards.
00:34:30
Speaker
That's a big deal. If you double something every six months, this cannot go on forever. Let's talk about this question of whether this can go on, because so far we've talked about companies now reaching three nanometers, which is extremely tiny, probably close to the physical limits; you tell me. What does this mean? Are we running out of possibilities to create denser computer chips?
00:34:55
Speaker
Yeah, let's take a step back: well, compute is doubling every six months, can this go on forever?
00:35:02
Speaker
When we say we spend more compute on these systems, what are the factors which enable us to spend more compute on these systems? Well, it looks like computers get better over time. How much better do they get over time? Moore's law says they double every two years. A recent investigation by my colleagues Tamay and Marius looked at the price performance of GPUs, and they roughly found, well, the price performance doubles every 2.5 years, right? So every 2.5 years you get double the number of floating point operations for the same cost.
00:35:29
Speaker
If compute then doubles every six months, this basically means we just spend more money. That's what we do, right? We spend more money on the systems, and it looks like people did it because it paid off in capabilities.
00:35:39
Speaker
There are limits to how much money we can spend if something doubles every six months. So if you crunch the numbers and roughly assume, hey, it's doubling every six months and the price performance doubles every 2.5 years, so this buys you more over time, then at some point you hit the point where you spend 1% of US GDP, or you spend similar amounts as the Apollo project, on these kinds of training runs.
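Crunching those numbers in the way described: if compute doubles every six months while price performance doubles every 2.5 years, the rest must come from spending. The starting cost and GDP figures below are rough, illustrative assumptions:

```python
# Sketch of the "can this go on?" arithmetic: if training compute doubles every
# 6 months but hardware price-performance only doubles every 2.5 years, the gap
# is paid for with money. Starting cost and GDP are rough, illustrative figures.
import math

compute_doubling_yrs = 0.5
price_perf_doubling_yrs = 2.5
# doublings of *spending* per year = doublings of compute - doublings bought by hardware
spend_doublings_per_yr = 1 / compute_doubling_yrs - 1 / price_perf_doubling_yrs   # = 1.6

start_cost = 1e8           # assume a ~$100M frontier training run today
budget_cap = 0.01 * 25e12  # ~1% of US GDP (~$25T), i.e. ~$250B

years_to_cap = math.log2(budget_cap / start_cost) / spend_doublings_per_yr
print(f"Spending doubles ~every {12 / spend_doublings_per_yr:.1f} months;")
print(f"hits ~1% of US GDP in roughly {years_to_cap:.0f} years at these rates.")
```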
00:36:00
Speaker
Is this likely? I don't know. It depends on what the economic returns of these systems are. Are people actually incentivized to do this? It depends on whether there is actually that much compute out there. Do we have data centers that are actually big enough that you can even burn that much money? It's not like compute appears immediately if I wave money around; maybe a couple of years later it appears if I wave the money. And as I've just been saying, the price performance has been doubling every 2.5 years.
00:36:29
Speaker
Will this just continue, right? And now we get into the weeds where every electrical engineer hates you for this: well, will Moore's law continue, is this actually true? I'm like, man, I don't care about Moore's law; I care about price performance. Will it continue? My rough guess is, yeah, there are roadmaps out there. I think we're probably at least fine for another five years of making transistors smaller. You're right that at some point
00:36:50
Speaker
we hit new barriers in how small we can make these transistors. But so far it's been great; we've just continued doing this. We throw exponentially more money at this, but we also get exponentially more gains. So I guess there is more to do there. But even if Moore's law stops, I think there's an important concept to understand.
00:37:10
Speaker
Right now we have these really short R&D cycles. People want a new iPhone every year, right? Every year we want the new processor. So your R&D spending for these chips is immense. If instead we have longer R&D cycles, this means we spread the money for developing these chips over a longer time, and we might have better economies of scale. So while your performance per chip stays the same,
00:37:29
Speaker
you can sell the same chip over five years compared to usually one year. So we might see better economies of scale. The performance stays the same, but the price might drop, which then eventually means the price performance might continue to go up over time, so you just get more for your money. But yeah, it's a key question for people to figure out what's going on there. It's a really hard question, because you're basically trying to forecast the future of humanity, the future of computing.
The Importance of Compute Governance
00:37:54
Speaker
And historically, I mean, Moore's law was pretty good at this. I don't know about the other forecasts on this.
00:37:59
Speaker
Do you think there's some other paradigm coming after this continual hunt or continual attempt to try to squeeze more and more transistors onto a chip? Do you think there's some quantum phenomena or some 3D computing or
00:38:14
Speaker
Is there anything you could see as an interesting heir to the present paradigm? Yeah, I'm guessing that in the beginning we'll have hybrid approaches or something, right? Like, well, with our chips we're hitting the limits of how small we can make them, but maybe we can stack them and just put them all together. Like what Apple did with the M1 Ultra:
00:38:36
Speaker
we can only make a chip this big, but what if we just put them next to each other with a high-performance interconnect? Or what AMD is going to do: well, chips only get this good, but let's just put four next to each other and try to connect them. So there are a bunch of hybrid mechanisms for how you try to work your way around this. Putting more chips next to each other doesn't sound to me like the same kind of progress as making transistors smaller and fitting more of them on a chip. That seems right.
00:39:01
Speaker
It's definitely a different sort of progress there. But what I'm saying is, yeah, there are different ways around this. One way is using hybrid techniques or new techniques, and then there are ways where people start thinking about completely new computing paradigms. In the beginning we had those big relays and then eventually moved to integrated circuits. And while integrated circuits have been powering us for the last 60 years or something, or 80 or something,
00:39:25
Speaker
the question is, what's next? Yeah, you mentioned quantum computing. I'm no expert on this. Whenever I talk to people, it's probably overhyped, as everything is. And I think it's different; it's trying to solve different problems. But I guess I'm the wrong person to talk to there. What else is out there? There's something like neuromorphic computing, more like analog computing, which is brain-inspired computing. I expect us to make more progress there, and this is particularly interesting when you think about AI use cases. My rough guess right now is that, at least from the things I've seen, it's pretty interesting for inference, but not for training systems yet.
00:39:55
Speaker
And also, things like this are still produced on silicon; I just don't know if you need cutting-edge chips there, right? I would rather call it, to some degree, design innovation: you use different types of chips, but they're still silicon chips. And then there are other things like optical computing. I have no clue about optical computing, whether this is promising or not; people will figure this out over time, and at some point it might become cost competitive.
00:40:19
Speaker
I'm pretty confident it's not going to happen in the next eight to ten years or something, but then ask me again and we can see where we are. So to what extent, if we're trying to predict what's going to happen with AI progress based on compute, should we take into account these kinds of wildcards or black swan events or whatever you want to call them, where suddenly we perhaps get much more intense progress in compute based on something like optical computing or quantum computing, or
00:40:48
Speaker
an advance that we hadn't anticipated, an advance that's not part of the trend line that we're extrapolating? We should take it into account, but I would put a relatively low probability on this. It's not like nobody is trying right now to build a successor, right? There's money to be made. If you figure out something better than what TSMC is doing and can replace this billion-
00:41:09
Speaker
dollar industry with something cheaper which you can build in your garage, here we go, you just won the lottery, congrats. I think a bunch of people are trying this. The same goes for algorithmic innovation, everything around computation. Eventually, I think a bunch of stuff just boils down to this exponential growth, or some kind of growth. Maybe I'm just like the economists here: yep, it's always 3% GDP growth, that's always the thing, that's always what's been happening. And I'm like, yep, it's always Moore's law,
00:41:32
Speaker
it's like this self-fulfilling thing which you're trying to do. So do I put a lot of probability on these kinds of events? Most of the time not, because I care about price performance. You might tell me, well, it's not going to be a FLOP in the future, it might be something else, it might be light-based, whatever. Yeah, that might be the case, but eventually I care about the output, right? That's the thing we've got to look out for there.
00:41:57
Speaker
If you look across the AI triad: do I expect major breakthroughs in computing performance? Probably not. Do I expect major breakthroughs for algorithms? Yeah, I put a higher probability there, because we have systems like our brain, where there is some good algorithm out there, you know; it's running here right now and it's pretty energy efficient, more energy efficient than GPT-4. So I'm like, there is something out there, right? It's physically possible; there I have some proof. So I put a higher probability on these kinds of things happening.
00:42:26
Speaker
Tell us the difference between AI timeline forecasting and AI takeoff speeds. With timelines we're usually trying to, I mean, sometimes people just say one year, but we're trying to figure out the point in time when transformative AI, AGI, whatever you want to define, is emerging. So it's a point in time, and usually people try to put probabilities on this, right? There's some kind of normal distribution around it, or some kind of distribution around it.
00:42:51
Speaker
Whereas takeoff speed is a time duration, where people are asking, how long does it take to go from A to B? And then we already start getting into the weeds: how do you eventually define A and B? One way to go about this, which a recent report by Tom Davidson tried to do, is to think about when we can automate most human labor in the world, like most cognitive tasks or something.
00:43:11
Speaker
It's like, well, takeoff is, and I'm not sure if these are exactly his numbers, but something like going from 20% to 100% of all human cognitive labor, or I think all human labor, eventually being automated. Other people think about takeoff differently. Others say, well, takeoff is the thing from the first artificial general intelligence, defined by some metrics, to the point where this thing is going to take over the world,
00:43:32
Speaker
something along these lines. So takeoff is always a consideration from A to B, and what these A's and B's are, everybody has different ideas there. I think all of them are somewhat useful, but all of these models end up looking different, and some of these models are more tractable; some of the models are just way harder.
00:43:47
Speaker
Yeah, we could have two people with the same AI timelines, say they both predict AGI by 2050, but different AI takeoff speed predictions. Say one person predicts that we will have a slow takeoff, with
00:44:03
Speaker
progress beginning in the 2030s and being steady all the way up to 2050, and another person predicting that we will have much slower progress until 2048, from which point we will have extreme progress. So that's kind of the difference we have there. Is there anything to learn from studying compute specifically about takeoff speeds? And here I'm thinking, for example, and this is just my kind of uneducated guess, that
00:44:32
Speaker
if we know something about how compute is produced and how much compute is available, this puts a limit on how fast a takeoff we could have. Do you think there's something to that? Absolutely, I think there's something to it, right? Let's just look at compute: why does it matter for takeoff?
00:44:50
Speaker
We discussed it a bit already; maybe we can go back to the bio anchors model. You might be saying, well, takeoff is from the childhood milestone to the evolution milestone, right, because they both describe the amount of labor you can automate or something;
00:45:03
Speaker
then both of them are defined by compute. Here we go, compute is useful. But I think it's more interesting to actually look at compute as a limiting factor for takeoff. We need to deploy the systems, we need to train more systems, so do we actually have enough compute for this? A lot of people say, well, if current trends continue and we have another ten years of this progress, we'll use this much compute. I'm asking, well, do we actually have that much compute? How many chips is TSMC actually producing?
00:45:28
Speaker
So the production capabilities of fabs, how many chips are going to be out there, just matter for these types of questions. It also matters what else is being produced: at the moment these fabs are mostly producing chips for MacBooks and for smartphones, and AI chips are just a small minority of these chips.
00:45:43
Speaker
Will AI be economically important enough, and will people believe this, such that they actually re-steer and produce fewer smartphones and more AI accelerators? Or will we just build more fabs? And there are limits to this, right? If the US is trying to build fabs right now, it just takes a long time, it's just really, really hard, and it costs a lot of money. All these kinds of things can eventually inform how you feel about takeoff and compute being a bottleneck there, right? Not to mention important geopolitical events, right? If you don't have access to compute,
00:46:13
Speaker
or if you think about TSMC, which is sitting in Taiwan, and there is a China-Taiwan invasion, this plays a really important role. How many AI chips are out there? Are they accessible? And this then feeds into whether we are actually doing these large training runs, whether we are actually running enough AI models so that we have certain percentages of the economy being automated.
00:46:32
Speaker
Perhaps a very fast takeoff would require something that we haven't seen before in terms of algorithmic efficiency, for it to overcome this bottleneck of compute. We talked about what we can learn from trying to forecast AI progress, because, you know, whether we have transformative AI by 2050 or 2055,
00:46:58
Speaker
I don't know how much that matters, but you talked about how there are important insights to be gained from trying to forecast AI. And the thing you've landed on is just the importance of compute. So perhaps you could here introduce this notion of compute governance that you've landed on.
00:47:17
Speaker
Yeah, I mean, as you just said, compute is one of these inputs, and I've learned, well, it looks like an important input. What can we do? What is my definition of compute governance? When we go back to this AI production function, we have these inputs: we have algorithms, data, and compute. And my claim is, well, if I wiggle this compute node, I can do something about the AI down the line, like the deployment of AI systems, the training of AI systems, and eventually also them being used in a beneficial way.
00:47:42
Speaker
That's the rough claim; that's what I describe as compute governance. As for what the outcomes are, there are various different things you can eventually achieve with this. That's what I'm trying to figure out. Fantastic.