Nathan Labenz on the Cognitive Revolution, Red Teaming GPT-4, and Potential Dangers of AI

Future of Life Institute Podcast
203 plays, 1 year ago
Nathan Labenz joins the podcast to discuss the cognitive revolution, his experience red teaming GPT-4, and the potential near-term dangers of AI. You can read more about Nathan's work at https://www.cognitiverevolution.ai

Timestamps:
00:00 The cognitive revolution
07:47 Red teaming GPT-4
24:00 Coming to believe in transformative AI
30:14 Is AI depth or breadth most impressive?
42:52 Potential near-term dangers from AI

Social Media Links:
➡️ WEBSITE: https://futureoflife.org
➡️ TWITTER: https://twitter.com/FLIxrisk
➡️ INSTAGRAM: https://www.instagram.com/futureoflifeinstitute/
➡️ META: https://www.facebook.com/futureoflifeinstitute
➡️ LINKEDIN: https://www.linkedin.com/company/future-of-life-institute/
Transcript

The Concept of Cognitive Revolution

00:00:00
Speaker
Welcome to the Future of Life Institute podcast. I'm Gus Docker. I'm here with Nathan Labenz. Nathan is doing research and development at Waymark into AI video creation, and he also co-hosts the Cognitive Revolution podcast, which I really recommend, by the way.
00:00:16
Speaker
Nathan, welcome to the podcast. Thank you very much, Gus. I'm excited for this. Okay, so why the name Cognitive Revolution? How do you think about that concept? What is happening in the world that would make you name your podcast, Cognitive Revolution? Well, it seems like we're hitting a moment with apologies to Karl Marx.
00:00:38
Speaker
It seems like we're hitting a moment where we're about to enter a change in the quote unquote mode of production. That's kind of how I came to that notion. The way we do things seems like it is about to fundamentally change at a minimum. And we can speculate about a lot more change that might be in store for us as well.
00:00:59
Speaker
But I was just looking back into history and trying to think, what are the right reference points for a change of this magnitude? And everybody's groping around trying to find the right analogy or the right comparison right now. And I kept looking a little bit farther and farther back, where I was like, is it the iPhone? No. I think it's bigger than the iPhone, actually. Is it the internet?
00:01:26
Speaker
Better, but I would still say it seems like it might even be bigger than that. And I kind of worked my way back to the Industrial Revolution, and that's kind of where I stopped and said, yeah, it feels like this is the right comparison. Because if you go back to before the Industrial Revolution and look at that change, generally speaking, we used to do everything with our muscles.
00:01:49
Speaker
And then we sort of figured out that we could get machines to do that stuff for us, and it kind of changed everything. And life bears some resemblance, but is also way beyond what anyone could have imagined before that transition got underway.
00:02:06
Speaker
And that's kind of what I think we're headed into now. We currently use machines to do the stuff that we used to do with our muscles, but we still use our brains largely to do the stuff that we do with our brains. And computers have chipped away at that in some areas, but nothing like what I think we're about to...
00:02:26
Speaker
start to see. So really more than anything just trying to drive home for people that this is I think bigger than anything we've lived through and you kind of have to go pretty far back in history to find something comparable.
00:02:37
Speaker
And if this is indeed the right historical analogy, then we're in store for something massive. The Industrial Revolution is in many ways a singular moment in history in which we see a mass escape from poverty, we see very steep rises in living standards in many countries and so on. Is that how you also...
00:03:00
Speaker
think about the revolution we're about to undergo? Do you think we will see higher growth, better living standards, massive technological innovation? Yes, I certainly hope so. But I don't think it's a given. There are kind of two big complications to that story. And I'm very
00:03:18
Speaker
bullish on AI technology in general and just excited about it in my daily work. I love working with it. I love what it can do for me. So that's one of the reasons also I wanted to do a podcast is kind of to communicate both sides of this because I do have the enthusiasm and the optimism for a scenario like what you outlined there. And then at the same time, I'm also like, boy,
00:03:42
Speaker
It sure seems like we don't have a great handle on this technology yet. It seems qualitatively different in some ways from the rise of industrial machines. And who knows what might happen with that?

AI and Societal Restructuring

00:03:57
Speaker
And then I also think, too, there's potentially a lot of pain or at a minimum disruption between here and there. I mean, you go back to the history of the industrial revolution. First of all, it took a lot longer.
00:04:08
Speaker
It was kind of a multi-generation timeframe, from when they first started using very rudimentary, kind of handcrafted pistons to pump water out of wells so they could keep digging coal deeper, beyond where it would flood, doing that at the mine site. From that moment to a quality steam engine, you're talking probably three generations of
00:04:31
Speaker
people refining the technology, tinkering, inventing, finding ways to use it. And even then, with all that time to adjust, you have these moments where the people that used to weave fabric manually had a really hard time. And I don't think their income, by and large, ever recovered, even though future people ultimately lived better.
00:04:56
Speaker
So I do worry about that as well. This one seems like it's going to come at us in a much more compressed timeframe. The ability for this technology to go global, the generality that it already has at the time of release is totally different from that first rough piston at a mine site. So I think on the other side of this hill, there is potentially an incredible
00:05:23
Speaker
next level of human existence. There's also some off ramps where things could get really bad, I think. And even in the good scenario, I think we have a lot of turbulence in front of us because
00:05:37
Speaker
This kind of change is not easy to absorb, especially when it happens so quickly. So we have one reference for the word "revolution", which is something like the industrial revolution or the agricultural revolution, where we see higher growth, more technology, and so on. And then we have another reference point, perhaps, which is
00:05:55
Speaker
say the revolutions of the 20th century with instability and perhaps new forms of government developing. You were talking a little bit about this, but do you think we'll see similar patterns there also, kind of the negative side of the revolution?
00:06:14
Speaker
I think we would probably have to on some level. And again, those can be very different in terms of their qualitative experience as well. I don't think any revolutions go down, any political revolutions go down super smooth, but you look at the one in the United States and that's probably about a best case scenario where it was a relatively non-destructive war that got fought and people came out the other side largely with
00:06:41
Speaker
things intact. And you could look at, alternatively, the French Revolution that wasn't that much later, where it was just a total leveling of society and total chaos and mass killing and total insanity. So there's still a pretty wide range in terms of how well it can go, obviously, when society gets restructured. And I have no idea, really, what to expect there. Obviously, I can hope for the best and fear the worst.
00:07:10
Speaker
But it does seem like we are going to have to reckon with this technology in ways that do ultimately feed back onto the political structure.

Red Teaming and Testing AI Models

00:07:19
Speaker
You know, just for starters, obviously people start to think quickly about things like universal basic income or, you know, some sort of
00:07:26
Speaker
updated social contract. And that seems appropriate to me. I don't feel like I have the answers by any means. I'm not prescribing how often the UBI check should arrive or exactly what amount it should be in or even what currency it should be in necessarily. But it does feel like there is going to be a need for an update to the social contract. That much seems hard to avoid, I would say.
00:07:48
Speaker
Okay, so what we've just done is take kind of the big picture overview of the situation. Now I want us to get a bit back to Earth and talk about the more near term. So you have done work as a red teamer for GPT-4. This was before it was released. So after they trained the model, they went through a period of, I think, about eight months in which they did red teaming and tried to find flaws in the model. Maybe you can tell us a bit about the process of red teaming this very large model.
00:08:18
Speaker
Yeah, boy, it was a truly incredible experience, qualitatively. I've been very interested to see that more and more accounts are now starting to come out about this. I would definitely recommend the "GPTs are GPTs" paper,
00:08:37
Speaker
which came out of Microsoft, I believe. There's also now a book coming out called The AI Revolution in Medicine, which has a sort of firsthand account of early access to GPT-4 that very much echoes my experience. So I'd preface this by saying, don't just take my word for it. Go read around about the experience of others as well.
00:09:01
Speaker
But for me, qualitatively, it started off with I was an OpenAI customer and still am. And I think I had established myself as a pretty good source of feedback for them. We got in a little bit early in kind of GPT-3 timeframe when the customers weren't just all flocking to OpenAI quite yet. It was kind of a cool technology, but seemed like more of a novelty to many people at the time.
00:09:27
Speaker
my business Waymark just happened to be in the perfect place to use it because what we do is help small businesses create video content.
00:09:35
Speaker
And we've worked really hard on a smooth interface, you know, trying to make that as easy and intuitive as it can be. But then we also kind of ran into this thing that we couldn't solve, which was that a lot of times when we'd talk to our users and say, can we make this easier for you? Is there anything we can do where you would use it more? How can we kind of, you know, help you? What they would say was:
00:09:57
Speaker
Really, there's nothing you can do. They would say it is easy to use. That's not really a problem. And I don't really have any major feature requests. You guys largely have the bells and whistles. It would just come down to, I don't necessarily make that many videos. And a lot of times, it's because I don't really have that many ideas for videos. I don't really know what to say.
00:10:18
Speaker
So we had kind of come to this conclusion where we try to make this experience really easy. We had made it easy in some sense, but I came to believe that we had made it easy in the same way that a word processor is easy. You know, anybody can use Microsoft Word or Google Docs, but that doesn't make it easy to write.
00:10:36
Speaker
Yeah, you're still staring at the blank page, in a sense. Exactly. That's what our users were really struggling with. So there wasn't really a conventional UI that could solve that problem. And I've watched the AI space for a long time, and I've always been kind of interested in it. But there was nothing until GPT-3 that could have addressed that problem in our business. And it just kind of happened that with that problem, we'd kind of hit, not the end, but we were approaching the end of what we could do with little UI refinements.
00:11:06
Speaker
And hearing that feedback from customers at the same time that GPT-3 started to come online, it was like, I think this could be a great match, right? I think there's something here that could help our users have ideas of what to say or write a script. You know, maybe they have a kernel of an idea, but how do they turn that into something that actually works? GPT-3 seemed like it could help. It took a lot of elbow grease. You know, we had to
00:11:32
Speaker
fine tune models and refine and work on our data set. And we grinded pretty hard just to get to a first version based on GPT-3. And in the course of that, I gave a lot of feedback to OpenAI. Probably some of it useful, some of it not. But they knew that if they gave me the early access to GPT-4 as part of the customer preview program, which is how I entered, that I would at least write something up and give them some notes.
00:12:00
Speaker
So that happened at the very end of August 2022 and it was kind of this
00:12:07
Speaker
you know, before and after moment in my life, right? You know, and again, I was more plugged into AI than most, right? I'd been an active OpenAI customer and, you know, had seen kind of the trend from GPT-3 to Instruct to text-davinci-002, which was the state of the art that was publicly available at that time. So I was way more acclimated to AI and AI progress than most. And yet we get this email, it's like, here's this new model. And they didn't give us a lot of guidance. They basically just said, it's more capable than previous models.
00:12:36
Speaker
We encourage you to try not just your current use cases, but new use cases as well. Well, I didn't have to, you know, they didn't have to tell me twice, right? I got in there and I was just kind of like, let's try some use cases.
00:12:46
Speaker
I've looked back at my Slack messages to the OpenAI team since then, and you can see me go on this journey of just having my head totally explode over the course of a couple hours where it's like, okay, this thing that I've just spent the last year curating data, fine tuning, finding the edge cases, patching the data, finally to get to something kind of workable, it can do it out of the box.
00:13:11
Speaker
Let me just keep trying

GPT-4's Capabilities and Limitations

00:13:13
Speaker
more and more stuff. We used to have this problem where we want the AI to count words. That's pretty important for us, maybe less important for almost everybody else. But a lot of times for us, we would want, if we're going to write a little script or whatever, it has to be a certain length in order for the business to go use it.
00:13:30
Speaker
in a particular context, there are limits, whatever. So we would try to get that dialed in. It was a real pain. But I asked GPT-4: write me the first sentence of a bunch of children's stories, each with exactly seven words.
00:13:47
Speaker
And it gave me incredibly good first sentences of these things. And the kicker for me was exactly seven words. So I wrote in the Slack to OpenAI, "It can count words. Oh my God." And I just kind of kept going. I stayed up late into the night,
00:14:01
Speaker
tinkering, tinkering, tinkering. And it's one of these things where I'm just like, is this real? Am I dreaming? But obviously it's real. And at that point, I kind of said, I'm doing AI R&D for Waymark.
00:14:18
Speaker
There's no AI R&D that's more important than figuring out this technology and even just assessing it, characterizing it, understanding just how powerful it is. So I kind of dropped everything else and just started doing that full time. What can I make it do? How powerful is it? I started to get a little worried because I was like, I'd obviously heard of things like AI risk and whatever, but it all seemed largely like a future concern.
00:14:47
Speaker
In that moment, it was like, boy, that just accelerated to potentially a present-day concern. I did not feel at first like I had a handle on how powerful it was, which was totally different from previous. In the previous generation, it was like, I know how powerful it is, and I got to really tweak it and mold it and force it to do what I want it to do. And now this one was like, I'm kind of struggling to find the limits of this thing.
00:15:15
Speaker
Did OpenAI send you instructions for what to do as part of the red teaming? Were you instructed to try your hardest to break the model or to jailbreak the model? I moved to the red team after kind of doing this full-time for a couple weeks. I said, do you guys have like a more safety-oriented project going on as well? Like I'd like to contribute to that if I can. And they said, yeah, we have a red team as well. You can come over to that part of the Slack and participate there.
00:15:40
Speaker
And we did get, as part of that, some instructions. Honestly, pretty minimal. Keep in mind, too, in the technical report, the OpenAI technical report lays all this out in much more detail as well. But at that time, going back to August, September, October of last year,
00:15:58
Speaker
That model that we had initially is what is now known in the technical report as GPT-4 early, as opposed to GPT-4 release, I think is what they call the version that we now have online. The early version, and I'm speculating, but I think I'm speculating here from a very informed position, it was, and there is some description of this in the technical report too, but that early version
00:16:26
Speaker
was seemingly trained on what I'm calling naive RLHF, which is to say they definitely had done at a minimum instruction tuning, and I believe they had done RLHF as well, because the experience was qualitatively like the experience we have today. You ask it a question, it knows kind of what you're looking for and how to answer it. You give an instruction, it knows how to follow those instructions. That much was there.
00:16:53
Speaker
But what I believe had happened is they just kind of ran that process with a bunch of user prompts and a bunch of user feedback scores, and then trained a reward model and then ran the RLHF process without any special consideration to
00:17:11
Speaker
controlling the behavior, the safety profile, all that sort of stuff. It was just another phrase I use is purely helpful. So whatever the user wants, that is the only consideration that the model seemed to have been trained on.
00:17:26
Speaker
And this is the basic process of doing this reinforcement learning from human feedback that you're talking about. Simply, the user asks the AI to accomplish something, and then the user rates how well the AI did. And so there's a difference between the purely helpful and the safe, and perhaps that's where the job of the red team comes in.
00:17:49
Speaker
Yeah, so it was interesting because there wasn't any need for jailbreaking. All of the techniques that people use today, DAN and these sorts of ways to get around the safety mitigations, none of that was necessary at all. You could just go in and say, write me a denial of service script.
00:18:09
Speaker
And it would just do it. We've seen some investigations like this that have come out. There's a good paper out of Carnegie Mellon about emergent autonomous scientific research capabilities. And there's really interesting discussion there around, does that actually qualify as science or not quite? I would say not quite. But for scientific tasks, like synthesizing methamphetamine, you could say, write me a remote control lab script. There's a business called Emerald Cloud Lab that allows
00:18:39
Speaker
chemists and biologists to run protocols via code in their remote controlled laboratory. So I just tried seeing if it could synthesize various things. It turned out it actually didn't know the syntax. It didn't have the right code, but it did put together realistic looking code with the right steps.
00:18:59
Speaker
And again, no hesitation, no real chiding, no "as a language model trained by OpenAI, I have no opinions on this kind of stuff." It was just whatever seemed to be the most helpful to the user, that's what the thing would do. And there was really no limit to that. So they have their moderation categories, and that was kind of the instruction that we got. We're looking for problematic stuff.
00:19:24
Speaker
And it was kind of like, guys, problematic stuff is extremely easy to generate. We could give you just reams of it, because if you want it to say something racist, it will. If you want it to say something sexist, it will. Not the current version, of course, but this is the early version before all these safety mitigations have been applied.
00:19:41
Speaker
So I honestly didn't do a ton of that. There were probably 20 to 30 people in the red team, some more engaged than others. There were a couple like me who kind of dropped everything and just did this nonstop. There were others that obviously had jobs and things that they couldn't put down quite so readily. So they were doing less. But kind of observing the group as a whole, I was like,
00:20:03
Speaker
We've got no shortage now of sexist comments and racist comments. There is this in the technical report, writing an email threatening violence against someone. You could just go on and on.
00:20:19
Speaker
I pretty quickly looked at that and said, all right, I don't think they need that many examples of this. Whatever they're going to do here, it's pretty clear that they still need to do it. And so I tried to move on to more conceptual things, where I was kind of like, are there things that are subtle, you know, that might not fall into these moderation categories, for example, or
00:20:44
Speaker
might be very hard for OpenAI to detect even with their moderation endpoint and things like that. And what would be examples here? I'm a little hesitant to share too many specific details, in all honesty, because some of them still do work on the launched model. And that's definitely one thing where, when I zoom out from this whole process, I'm like, what are the big lessons? I have a lot of them probably, but
00:21:13
Speaker
One that comes to mind in virtue of that is they are working hard on this. I can say that with high confidence. They took their time. They had the red team group. They didn't tell us about all the other stuff that they had going on at the same time. But it's come to light now that there were teams at Microsoft doing systematic investigations and doctors taking GPT-4 on rounds. And we just didn't know about all that stuff. But there was a lot going on. So they have worked really hard on this. And they have made a lot of progress.
00:21:42
Speaker
You cannot get these violent turns nearly as easily. It's hard now. You have to really work at it to do these extremely flagrant jailbreaks. But subtle things, if you imagine yourself saying, all right, what if I were to offer some free GPT-4 to people to entice them into my app?
00:22:08
Speaker
And, you know, I present something to them as if it's going to be good for them and convenient for them and free for them. But I'm actually hiding some things in the output that are in fact going to be harmful to them. That was more the type of thing that I wanted to go explore, because I felt like, framed subtly or, you know, especially with fine-tuning as well, if you think about
00:22:33
Speaker
the way that the task itself can be implicit in fine tuning, that stuff is not necessarily going to be easy for them to detect. Interestingly, GPT-4 itself can often detect some of that stuff. So when you kind of start to create multi-agent systems built with GPT-4, I found that you could do things like
00:22:57
Speaker
feed that problematic output. And I apologize for being a little vague, but again, this is kind of still an open problem. This is the right way to do it. Yeah, it's the right way to do it. So what you can do though, you can take this kind of subtle implicit thing that does in fact generate something that would be harmful to the user.
00:23:14
Speaker
and then turn that around and ask GPT-4, do you see anything problematic about this? And there it actually would, it varied a little bit depending on just how subtle I made it and whatnot. But there were instances where it was like, yeah, it actually can kind of

Assessing GPT-4 as Transformative Technology

00:23:29
Speaker
police itself, not perfectly, but there's at least something there that seems very promising. So yeah, I just kind of tried everything I could come up with to try. I had two months that the program ran and my thought was basically,
00:23:44
Speaker
run every experiment possible in this window, and then kind of try to zoom out and synthesize it after the fact. So I just ran down a ton of little rabbit holes. And again, at that point, there were no barriers. So it was a fascinating experience, to say the least.
00:24:01
Speaker
And was it this two months journey that convinced you that AI was going to be a big deal and perhaps a bigger deal than you had thought before? I was already pretty convinced, but this definitely
00:24:16
Speaker
shortened my timelines to big things happening in the future. I honestly, pretty quickly, one of the other investigations I did was just trying to figure out, is this transformative technology? I've been a reader of Holden Karnofsky's Cold Takes and things like that. I don't read every post on LessWrong or the Alignment Forum, but I try to be not totally out of the loop.
00:24:43
Speaker
There's this concept of transformative AI. I think it means different things to different people. But I started to use a definition that, again, kind of feeds back to the podcast title. Is this as big of a deal as the industrial revolution? Is this going to change as much? Is there as much raw power here as there is in the internal combustion engine?
00:25:05
Speaker
And to assess that, I basically just tried to run a bunch of experiments where I asked GPT-4 to do different jobs for me. Be my doctor, be my lawyer, be my fitness coach, be my babysitter. That one's in simulation, obviously. But I said, you're going to play with a three-year-old. And then I just played the role of a three-year-old. I had a three-year-old at the time.
00:25:31
Speaker
One of the big lessons I've learned from this process is just how important it really is to be hands-on and interactive with the model, the models, because I saw so many things even during the red teaming where people were attempting these kind of benchmarking style assessments and sometimes going way wrong.
00:25:52
Speaker
for various idiosyncratic reasons, where when you just get in there and have direct dialogue with it, it's kind of unmistakable that it really does work in a lot of situations. So I pretend to be my three-year-old, it plays with me as my three-year-old, and it's fun, you know, it's creative, we're having a fun time. It kind of reacted nicely to those sudden left turns or changes in story that, you know, a three-year-old will inevitably bring you. So you have these kind of,
00:26:21
Speaker
You know, oh, we're not doing this anymore. Now we're not the bad pirates anymore. Now we're the good pirates. And, you know, you have to have kind of a "yes, and" sort of mentality, and you have to understand that that's the sort of mentality you should have. And it just, you know, passed all these tests with flying colors. At the other end of the spectrum, I have a grandmother who's turning 90 this year and she has an iPhone.
00:26:42
Speaker
And she's actually quite savvy, certainly for her cohort. She's on the cutting edge. But we run into problems. She calls me sometimes and needs technical support and can't figure out how to navigate apps and stuff like that. So again, simulated that. I'll be my grandmother. You be the technical support. It became very clear on some of these experiments that it was already pretty much working.
00:27:08
Speaker
without even having been fine-tuned at all for the task. In the case of the UI on these apps, I'm holding my phone, I'm pretending to be my grandmother, and I'm interacting with GPT-4, which obviously can't see the phone. We did not have any multimodality, and we also did not have 32,000 token context window. We had 8,000 and text only.
00:27:33
Speaker
So it can't see the phone. So there's a whole other way that it could solve this problem just by seeing the UI that they demonstrated with the launch. But in the version that I had...
00:27:43
Speaker
it couldn't see the phone, and it was guessing at the UI. And it became pretty clear that those guesses were largely right. If you were to sit and blindfold yourself and guess what the Gmail UI looks like on an iPhone, you'd probably get it mostly right, and so did GPT-4. But I remember that being a moment where I was like, boy,
00:28:05
Speaker
It's making this up. It has not seen the Gmail UI, and yet it can guess it. Just imagine what it's going to be able to do when it's either fine-tuned or there's a context provided that allows it to have true command of the ground truth. It became pretty clear to me that I ended up doing probably 20 different simulations.
00:28:30
Speaker
where I was the customer, the patient, the client, the kid, the senior, who needed help, whatever. And GPT-4 is all these other roles. That extended to answering the phone at an auto mechanic, being an associate at a Home Depot. And these are just real problems from my life that I would bring to it. I have an old car, and I got a problem with it. Describe a little bit what's going on. Honestly, I'd say you get better service from GPT-4 on that than
00:28:59
Speaker
a typical garage that you might call. You call the place. They don't really want to talk to you that much. You know, it's like bring it in and we'll take a look at it. And that's pretty much what we're going to do.
00:29:08
Speaker
The GPT-4 will talk to you as long as you want to talk to it, at least up to 8,000 tokens. That's about 45 minutes of real-time conversation. That's not insignificant. You can get your questions answered. Same thing with the Home Depot associate. I've got these old light fixtures in my basement. I think they were literally installed like 30 years ago. They use these really old bulbs. Oh my God, what am I going to do? I got to change those out. The energy efficiency is terrible.
00:29:36
Speaker
But who knows about that stuff? I don't, but GPT-4 does. It gave me a specific light bulb designed so that you don't have to replace the fixture, but you still get 80 percent of the energy efficiency boost. Literal exact product. And this is, of course, again, with no access to the Internet, no access to anything. It's just memorized all this stuff. So the exact product, I'm able to open that up. Notably, it gave me a fake link.
00:30:05
Speaker
The link itself was a 404. But if I searched the name of the product, the product did exist. It was exactly what it was supposed to be. And I bought it. Now we have those lights in our basement. That's pretty amazing. So were you most impressed by the breadth of roles that this language model could play? Or were you most impressed by the depth of some of the solutions it came up with? Great question. I would say the breadth is more superhuman, put it that way.
00:30:33
Speaker
The conclusion that I came to, which I think has roughly been borne out by kind of all of the papers and whatnot that have been published.
00:30:40
Speaker
And one of them, just in reading The AI Revolution in Medicine last night, they put it so well, better than I had figured out how to put it. They said, GPT-4 is simultaneously smarter than and dumber than any human you've ever met. And I think that's so apt. The way I was thinking about it was a little bit different, but cashes out basically to that. I was like, it certainly has incredibly superhuman breadth. I mean, it knows something about almost everything.
00:31:07
Speaker
But then I started to look really hard. A question that kind of became an obsession for me for a little while there was, okay, it seems like it's above average on everything. If you literally take just human average, it's above average on, I would have to think, literally everything. You might be able to find some trick question or whatever that would actually be below average, but you'd have to work for that.
00:31:29
Speaker
So it's above average on everything. Where is it? I did a little bit of MMLU spot checking. They didn't have a lot of compute capacity behind this, and they asked us not to run automated benchmarks as part of the red teaming process. So I didn't. But what I did do was just study benchmarks and just spot check to try to get a sense, to calibrate myself on how good it is.
00:31:53
Speaker
So based on the MMLU spot checking, which I even forget what exactly that is, but the great Dan Hendrycks, I always Google Dan Hendrycks and look up all the benchmarks that he and collaborators have created. I said to him after doing this, I was like, kudos to you for creating a benchmark that can stand up to 2022 language models, because most of them were just
00:32:19
Speaker
you know, bowled right over. And MMLU was one that it hadn't fully solved. So I kind of came to the conclusion that, all right, by default, it seems to be at like mid to late undergraduate level at a lot of science and math. Maybe it's into early grad student level; certainly in the humanities, it seems like it often is. Again, this is combined with extreme breadth. And so anyway, the question I became obsessed with is,
00:32:48
Speaker
Does it ever exceed human expert? And I tried really hard to find examples where it could come up with something that seemed to be at or better than human expert level. And ultimately, I would say I didn't find it.
00:33:09
Speaker
So that kind of led me to, to answer your question, I think the breadth is just insane. The depth is awesome. I mean, you know, we should contextualize this obviously in the, you know, it was, I'm old enough to remember when we didn't have, you know, college or grad student level AIs. So like, it is insane. Yeah, I have not seen, and not for lack of trying, anything that kind of seemed to
00:33:34
Speaker
I think you could maybe say it touches human expert level in some things, especially where they're very qualitative. You know, if you were to say like,
00:33:41
Speaker
Can this thing write better sonnets about semiconductors than the best human at writing sonnets about semiconductors? I'd still probably say no, but one thing it could do is write 10,000 sonnets about semiconductors, and then let you pick what you like, and maybe it would win those contests. If we had a contest, it could win, but that's so subjective and really hard to evaluate.
00:34:08
Speaker
in kind of areas where there's like a standard, where there's some objectivity, I was not able to find any instances where it seemed to consistently match or ever really exceed human expert. And I think that has been largely borne out by other people's investigations as well.
00:34:29
Speaker
But perhaps it's worth reiterating that just the fact that it's only at college level is itself an astonishing fact. The fact that we have now become accustomed to AIs performing at perhaps high school or college level, which
00:34:46
Speaker
you know, I think wasn't even in the cards five years ago. Perhaps one example of expert level performance is from the Sparks of AGI paper from Sébastien Bubeck, where they do some examples where they try exactly to go for expert level performance. And these are hand-picked examples, but there is some
00:35:08
Speaker
discussion of topology, a specialized area of math that I don't fully understand. So perhaps I'm not the best to evaluate how expert level this is, but it seems expert level to the authors of the paper. I encourage people to go check that out. I think perhaps with hand-picked examples, in some areas we can get to expert level performance with GPT-4 also.
00:35:35
Speaker
Yeah, I wouldn't rule it out. It's one of these things where there's so much confusion in the discussion about what AIs can do and what they can't do. Then there's people saying what they'll never be able to do, which I think at this point, don't listen to those people because they've been proven wrong over and over again. The "never do," that is just, I don't get it anymore.
00:36:00
Speaker
Yeah, I think there's a couple of different ways of framing the current capability. Both are useful, but I tend toward one definitely more than the other. First one is like, what will it reliably do? What will it like always do? What will it never make mistakes on?
00:36:17
Speaker
And that is actually a fairly small domain, especially if you truly, how many nines do you want in your reliability metric? It's very hard to get that many nines in any system, even with incredible engineering. And we certainly, how many nines do you have? What's your SLA on reliability or uptime? Mine is certainly not five nines.
00:36:45
Speaker
I personally feel like that's a little bit of a misapplication of an old paradigm to a new technology. Not to say that there's nothing to be learned from that. At a minimum, we can say these things do make mistakes. They are not fully reliable in almost any domain. That's important to know for sure.
00:37:05
Speaker
The way that I tend to prefer to think about it though is what can it do? And especially, you know, what can it do with a little bit of effort or maybe a moderate amount of effort or even, you know, a lot of effort if that's what it takes to get it to do something.
00:37:21
Speaker
So yeah, I wouldn't be shocked to see that maybe in some areas it can match expert performance, at least like some portion of the time. And I think if it can do it really any portion of the time, if it can do it any detectable portion of the time, that's probably enough.
00:37:40
Speaker
in as much as we didn't know what the pricing was going to be back in the red team era, but now we do. And you think, okay, a 45 minute interaction and 8,000 token iterative buildup, you know, of that whole context window up to the max, depending on exactly how it plays out, maybe cost you like a

Future of AI: Breaking Expert-Level Barriers

00:38:01
Speaker
dollar. A human expert, you know, certainly is going to start at, roughly speaking, a hundred times that. So.
00:38:11
Speaker
If it can do it ever, if it can do it even 1% of the time, and you have any way of distinguishing what's what, then all of a sudden, you're starting to hit something like comparable, parity, whatever, even against experts. In my experience, that has been pretty rare, and I would say I haven't seen it to my satisfaction. I'm no topology expert either, so I wouldn't be shocked if somebody else could come up with an example. Right now, I would say at a minimum, I can confidently say,
00:38:40
Speaker
It is really hard and seemingly likely also very infrequent to touch that human expert level performance. But I can't rule it out. I'm just one person who's tried hard. That's pretty small in the grand scheme of...
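The cost argument in this exchange can be checked with a little arithmetic. What follows is a minimal sketch, assuming the round numbers from the conversation (about a dollar per 45-minute session, a human expert at roughly a hundred times that) and assuming, hypothetically, that a successful answer can be recognized at no extra cost. The function name and constants are illustrative, not anything from the red team work.

```python
def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to obtain one success when failed attempts are simply retried.

    With independent attempts, the expected number of attempts until the
    first success is 1 / success_rate (the mean of a geometric distribution).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Round numbers from the conversation, not measured prices.
MODEL_COST_PER_SESSION = 1.0
HUMAN_EXPERT_COST = 100 * MODEL_COST_PER_SESSION

model_cost = expected_cost_per_success(MODEL_COST_PER_SESSION, success_rate=0.01)
print(f"model: ${model_cost:.2f} per success, expert: ${HUMAN_EXPERT_COST:.2f}")
```

Under these assumptions a 1% hit rate lands at roughly the same expected cost as the expert, which is the parity point being described. Any real cost of checking which answers are good would push the break-even rate higher.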
00:38:57
Speaker
These things have such vast surface area. I don't know anything about topology. I didn't even touch that. I know a little bit about calculus, so I was able to explore that a little bit. It's not at an expert level at calculus, at least unaided. Now you've got the Wolfram plug-in. I mean, all bets are off there, right? I do think, in combination with Wolfram, you probably, at a system level,
00:39:20
Speaker
are starting to see expert level at calculus. But in isolation, it was still clear that it was not at an expert level in calculus. So I wasn't able to produce that in isolation, but things are not stopping here. They've continued to improve the model since then. There's all sorts of areas that I did not have the wherewithal to test or evaluate.
00:39:42
Speaker
The tools paradigm is going to be huge. The longer context window is going to be huge. Fine tuning on domains. I mean, that's probably the biggest one, right? I mean, it's the most obvious one. Tell it what the iPhone actually looks like. Tell it what the, you know, fine tune it on some medical information. Equipping it with memory is going to be huge. Multimodality is going to be huge. What we saw with images is a game changer unto itself. I mean, we've seen all this exploration with AI agents over the last month.
00:40:13
Speaker
And so much of what the agents can't do right now would probably be pretty much solved if they had the multimodality online. Because it gets stuck on these websites, and it can't get past this, and it can't get past that. But what they showed in the demo is that it can just totally understand a UI. So you're going to have this extra level, extra means of navigating the world. And then there's other modalities to be added in the not too distant future as well. These things are all kind of
00:40:44
Speaker
A real eye-opening moment for me was the Flamingo paper out of DeepMind last year, which was the first really impressive multimodal demo that I saw, where folks may remember that there was a bowl of soup, but the soup was made of yarn. So it was this very weird thing. And it was able to grok it.
00:41:09
Speaker
And you look at the architecture of Flamingo. And they had used a frozen language model and then kind of stitched into it the visual information and then kind of run a little extra training or whatever. And it was like, they didn't even build this from the ground up. The vision I had was like a tinkerer in the garage, like soldering wires together. And it's like, we're at the stage of architecting these networks where
00:41:36
Speaker
tinkerers can just kind of solder things together and lo and behold like it works. I'm sure they had some failures in that project as well and I've talked to people they did say like it wasn't quite as easy as they made it look but it still looked to me like boy if that works and it's not super principled like a lot of things are going to work. So what I'm starting to hear a little bit about in like medicine for example is you know native scan reading
00:42:02
Speaker
There's no reason that has to be a 2D image. If a scan exists as a 3D...
00:42:08
Speaker
representation and human practitioners have to look at it as a 2D slice because that's all we can render or whatever. Yeah, it's not gonna have that constraint. It's gonna be able to natively combine all these different modalities with language. And that definitely is gonna start to bring some expert level performance, reading scans, contextualizing a scan in the context of the patient's medical history. So I do expect, I'm very cautious about like,
00:42:38
Speaker
I'm giving a snapshot of where we were six months ago and largely still, I think, where we are today, but this is not where we're stopping.

Ethical Concerns and AI Bias

00:42:49
Speaker
I do think the expert barrier will fall. We're talking about how we can extend the current models. We're talking about how models can go from being tools to being more like agents.
00:43:00
Speaker
We're talking about plugins where language models will have access to math abilities and to perhaps browsing the internet and so on. Where do you think the dangers here arise in the near term? You've spent a lot of time interacting with these models. What do you think are the most problematic aspects of these models in the near term?
00:43:28
Speaker
I'm sympathetic to all the concerns, honestly. I don't know how inside baseball everybody listening to this is, but obviously, you know, you are aware that there's the sort of near-term AI ethics and the AI safety camps and the x-risk camp, and all these factions seem to not get along very well, which is frustrating to me because I honestly think they're all talking about something that is worthy of concern.
00:43:58
Speaker
I think those that focus on
00:44:01
Speaker
you know, we have all these problems in society and all these biases and people aren't treated fairly and, you know, we're now running some risk of like systematizing that. I think it's a very real concern. So I'm very sympathetic to that. And I think OpenAI has done a very good job and I think, you know, other companies are also doing a quite good job of that. You know, that you can see the difference. I just got into a little debate on this on Twitter the other day where people are posting 3.5 examples
00:44:30
Speaker
that still make these sexist assumptions or whatever, where it's like, if the sentence has she, then it interprets it one way, and if it's he, then it interprets another way. Well, GPT-4 mostly clears that stuff up. It's been coached out of the most flagrant biases. There are certainly some left. There's no doubt about that. But I think they have made really good progress. And I think it's
00:44:54
Speaker
I think it's to the credit of the folks who've raised the alarm on that kind of stuff that people are working on it and they've been able to show good results. So I'm very sympathetic to that.
00:45:06
Speaker
I'm in a pretty privileged position, but it's easy to imagine being in a not that much less privileged position and having some of these systems operate on you with bias in a way that just sounds awful. So I think people are right to be concerned about that. How do you think about perhaps the dangers of hacking or automated blackmail or automated scams? Is that potentially a concern in the near term? Yeah, I'm going to say yes, I think, to everything. They can definitely do dialogue.
00:45:34
Speaker
And they can definitely do deceptive dialogue. Another thing I looked for pretty hard was, is there any indication that this thing is trying to deceive me? And I didn't find anything where I was like, yeah, that's a slam dunk. I think in a sense, hallucinations can kind of be that. In the early version, we would see a lot of citations with fake links.
00:46:00
Speaker
And you're kind of like, well, how is that happening? Well, maybe it gave some right links and it got some positive rewards. Maybe then it started to slip in some fake links and people didn't notice, you know, and next thing you know, it kind of learns that people like links. And then you overgeneralize that and you're getting fake links. But I didn't feel like that was a, you know, fully like,
00:46:20
Speaker
theory of mind, deceiving of the user type of example. But if you ask it to deceive the user, then at that early stage, it would just totally do that. Again, that's something that OpenAI has made a lot of progress on, but in a world where Stability has just released their open model and an accompanying RLHF library to go with it,
00:46:49
Speaker
I don't see any reason, and in fact I would be shocked if there's not already some bad actor out there beginning the process of RLHF-ing.
00:47:00
Speaker
scam attacks that could be mediated by any number of platforms. I experimented with Twitter DMs. So again, I made myself the target and I said, your job is to extract sensitive information from this guy. Here's his Twitter profile. That's all you get to know about him. It worked pretty well.
00:47:24
Speaker
Human level, yeah, roughly, I would say. I mean, honestly, it's interesting. I'm not like a connoisseur of online scams, but to some degree, it seems like the scams are somewhat stupid by design and they're kind of designed to select for people that are extremely gullible. I didn't really test that as much. I was more trying to test, could this fool me? And, you know, I saw mixed results. Like sometimes it was a little too obvious. Sometimes it was a little too whatever.
00:47:51
Speaker
Other times it was like pretty good. One of the things that I did see that was most interesting about it was when I gave it some instructions saying basically like, you know, if we get caught, we're going to be in trouble. Then I noticed that it would actually kind of respect that and kind of back off. So I tried to play it the way I would play it as me and let the thing try to scam me.
00:48:14
Speaker
And, you know, it'd kind of be like, of course, it would flatter you first. Oh, I love what you're doing with Waymark. The video creation is so cool. And then I would just say, oh, thank you. That's very kind. And then it would come back with something else. You know, it's trying to build up the trust and rapport so it can extract the sensitive information from me. Sometimes it would be too flagrant. You know, it would jump to like, what's your mother's maiden name? And I'd be like, why? That's weird. You know, why are you asking? But here's the really interesting thing in those moments.
00:48:44
Speaker
it would often then be like, well, I'm just curious about this because, you know, I'm always interested in whatever, where people are from and their stories. And then I would still like not answer in response. And a lot of times it would just go away. You know, it was kind of like it had this sort of probing and like tactical retreat behavior, which I found to be
00:49:08
Speaker
impressive. You know, the best scammer in the world, I think, is definitely still better. This is not like a world-class con man, but it did show some real sophistication on that front. And so, yeah, I mean, with these models now out there that are small, that you can just RLHF on your own, I mean, yeah, I think the safe word for your family is probably a good idea at this point.

Self-Delegating AI Models and Future Threats

00:49:35
Speaker
We can talk about the dangers of an infinitely patient model working to scam people. Perhaps it is nice to have an infinitely patient car mechanic or doctor or therapist or something. On the other hand, it's potentially very dangerous to have an infinitely patient scammer that could message you for weeks and strategically retreat when you suspect something, perhaps.
00:50:00
Speaker
This is often how humans build up relationships over a longer period of time and so on. Of course you've already said that this is potentially a problem, but this is just something I see as a limit with human-performed scams.
00:50:16
Speaker
They can't spend an infinite amount of time on each scam. It would cost too much. But with these models, the cost might be driven down so much that it's feasible. Yeah, totally. And 2.7 billion, I think, parameters is the small model that Stability just released. Now, this one may not be quite up to this challenge, whatever. Maybe you need the 7 billion. Maybe you need the 15 or the 30 even. But
00:50:43
Speaker
it seems overwhelmingly likely that we're going to see laptop operable models that can do this kind of thing well. Honestly, I would guess that that is currently in development, if not already in existence. And if it doesn't already exist, I can't imagine we get out of the summer before these things start to
00:51:07
Speaker
make news. This person got scammed. We dug into it. It turned out to be a language model. That story is coming. Just to be concrete, I'll put a prediction 90 days before that starts to become something that's in the public consciousness. Perhaps we should end by talking about agents. How do language models become agents? What is it that people are doing when they try to create an agent from a language model?
00:51:31
Speaker
What do you see as the extra dangerous features of agents as opposed to the tools we have now? So this is very much early exploration. I did a little bit of exploration of this on my own as well. This was, again, inspired by LessWrong reading from years ago, Eliezer, recursive self-improvement. I saw this thing, I was just so blown away by it, and I was like, could it recursively self-improve?
00:52:00
Speaker
I don't really see a clear path to that. But then I thought, well, maybe it could recursively self-delegate. And maybe if its context window is the fundamental limit, then if it can effectively align multiple instances of itself in some sort of hierarchy or system toward an ultimate goal, then it could at least overcome that limit to some degree. How would that work? So I found there that
00:52:27
Speaker
Basically, you know, the results are so surprising because, on the one hand, it had no problem picking up the paradigm. I just had to write it out, you know, in just a couple paragraphs. So people have been, you know, speculating about situational awareness for a long time. It turns out a lot of these categories, I think, are a lot fuzzier than they seemed a priori. Situational awareness, I was like, I never really saw situational awareness spontaneously, but
00:52:57
Speaker
what I started calling prompted situational awareness, it seemed to have quite strongly. So I would say you are, you know, GPT-4, I didn't quite use that, you are an advanced AI, you can do all these things, you can generate code, you can do this, you can delegate to another instance of yourself and here's how. So we just kind of set up, and this has become pretty commonplace, but you know, in my primitive, you know, sole tinkerer version, just a Python runtime,
00:53:25
Speaker
where I would give one high level direction to a function. And within that would be the prompt where it had the instruction of how to delegate to itself. It would then generate code and we would then run that code, you know, and see what happened. And it was like confusing because it got like the hardest concept of what I would have expected, right? Would be like,
00:53:53
Speaker
Can you call the weather API? I would expect like way more likely it could do that than that it could like self delegate effectively. But I actually almost kind of found the reverse where it understood the self delegation basically immediately. I didn't have to work that hard. I just gave it. Here's how you would do that. And it understood it.
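The self-delegation idea described here can be sketched in a few lines. This is not the speaker's implementation: his version had the model generate Python code that was then executed, whereas this sketch swaps in a simple text protocol to stay short. The names `solve`, `DELEGATE_PREFIX`, and `stub_model` are hypothetical, and `stub_model` is a canned stand-in so the example runs without any model API.

```python
from typing import Callable

# A "model" here is any callable from prompt text to response text (an assumption).
Model = Callable[[str], str]

DELEGATE_PREFIX = "DELEGATE:"


def solve(task: str, model: Model, depth: int = 0, max_depth: int = 3) -> str:
    """Ask the model to answer a task or to split it into sub-tasks for copies of itself."""
    can_delegate = depth < max_depth
    prompt = "You are an advanced AI. Answer the task directly."
    if can_delegate:
        prompt += (
            f" Or reply with '{DELEGATE_PREFIX}' followed by sub-tasks separated"
            " by ';' to hand them to other instances of yourself."
        )
    prompt += f"\nTask: {task}"

    reply = model(prompt).strip()
    if not (can_delegate and reply.startswith(DELEGATE_PREFIX)):
        return reply  # direct answer, or delegation no longer allowed

    body = reply[len(DELEGATE_PREFIX):]
    subtasks = [part.strip() for part in body.split(";") if part.strip()]
    results = [solve(sub, model, depth + 1, max_depth) for sub in subtasks]
    return model(f"Combine these results for the task '{task}':\n" + "\n".join(results))


def stub_model(prompt: str) -> str:
    """Canned stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("Combine"):
        return "combined: " + " | ".join(prompt.splitlines()[1:])
    task = prompt.rsplit("Task: ", 1)[-1]
    if task == "plan a trip":
        return "DELEGATE: book flights; book hotel"
    return f"done({task})"


print(solve("plan a trip", stub_model))
```

The depth limit is a design choice for the sketch: it guards against the failure mode mentioned later in the conversation, where the delegated sub-task is effectively the same task that came in.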
00:54:14
Speaker
breaking the goals down into sub-goals and actually self-delegating in a systematic, seemingly appropriate way, you saw hits and misses there. It was never...
00:54:27
Speaker
totally insane. There was always a logic to what it was doing, but sometimes you'd be like, well, that doesn't really... You think you've broken the problem down, but basically, the thing you delegated is the same thing that you had coming in. You haven't really made any progress. I'd see some failures like that. But then honestly, the most common failure was just at the level of execution.
00:54:47
Speaker
And it would make these really dumb mistakes, which seem to be born of just patterns in the training data. So for example, again, by the way, something that multimodality is easily going to fix, I want to get some information. So at the time, Queen Elizabeth had just passed away. So I was like, there's a really good, really strong prior in the training set on who is the reigning monarch of the UK. But now we know that that's changed.
00:55:16
Speaker
Can I give this thing an instruction to go out on the internet and answer this question and give the right answer? It would break that problem down reasonably well. I started to augment it a little bit with tools because it was like, I'll give it a Google search, why not? And not too much. I didn't get nearly as far as the LangChain community has gotten today, but did some of that stuff.
00:55:38
Speaker
So it would do the Google search, it would go to the web pages, you know, it knew how to spin up a little, you know, headless browser and go grab HTML. And then it would fail on just, honestly, kind of the dumbest stuff. Like it would look for the H1 tag off of a webpage and maybe even specifically look for a hard-coded string match
00:56:02
Speaker
in that H1 tag. Now, some web pages don't have an H1 tag. Or, you know, it had an H1 tag and it may have said, you know, "Queen Elizabeth's glorious reign," that could have been the headline, but it was looking specifically for "dies" and "dies" wasn't there. So it was one of these things where it was like
00:56:23
Speaker
If I had helped it at all, if I had kind of corrected it to say, hey, don't do that hard string match, like just qualitatively like read the thing and see if it has the information that you want, then it could have worked, but it would very often, and sometimes it even did work. I did have some instances where it would successfully complete a little task like that, but more often it would get kind of bogged down in those little weeds type details.
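The failure described here, a hard-coded string match against the H1 tag, is easy to reproduce. The sketch below is an illustration using only Python's standard-library `html.parser`; the class and function names and the sample page are made up for the example and are not the code the model actually generated. It contrasts the brittle check with simply extracting the whole page text so that a model could read it qualitatively.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, tracking separately what appears inside <h1>."""

    def __init__(self) -> None:
        super().__init__()
        self.h1_text: list[str] = []
        self.all_text: list[str] = []
        self._in_h1 = False
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        self.all_text.append(text)
        if self._in_h1:
            self.h1_text.append(text)


def brittle_check(html: str, keyword: str = "dies") -> bool:
    """The failure mode: only succeed if a fixed keyword appears in the <h1>."""
    parser = TextExtractor()
    parser.feed(html)
    return keyword in " ".join(parser.h1_text).lower()


def page_text(html: str) -> str:
    """The alternative: return all visible text for a model to read."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.all_text)


# Made-up sample page for illustration.
SAMPLE = """
<html><body>
<h1>Queen Elizabeth's glorious reign</h1>
<p>Her son succeeded her as King Charles III.</p>
</body></html>
"""

print(brittle_check(SAMPLE))  # the keyword is not in the headline
print(page_text(SAMPLE))      # the answer is in the body text
```

On this sample the brittle check misses even though the page contains the answer, which matches the kind of "little weeds" failure described above.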
00:56:49
Speaker
So you think as things are right now, you wouldn't be able to create an agent and tell it to go out on the internet and find vulnerabilities in websites and perhaps try to extract user data. It would perhaps be able to create a plan for doing something like this, but it would fail on specific concrete tasks in that plan. Yeah, I think that's a good summary. I think that's basically still where we are right now.
00:57:15
Speaker
And from everything I see from the developer community, and we've just had two CEOs of agent companies on the podcast, both really interesting, super smart individuals. One is the CEO of Lindy.ai, and the other one is Fixie.ai. And they're taking somewhat different approaches, but they see the power of this. From what I've heard from them, that's still basically where we are.
00:57:40
Speaker
The general intelligence is extremely inspiring, but you have to tack a lot of micro successes into a sequence to get to the end goal. And there's just enough of a chance that each one fails, that it kind of at some point fails. And it also wasn't great. This is, I think, still somewhat of a weakness. I do think it's improved, but still somewhat of a weakness. It's not that great at figuring out where it went wrong.
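The point about chaining micro successes is a compounding-probability effect. A minimal sketch, assuming each step succeeds independently with the same probability (a simplification; real agent steps are neither independent nor equally reliable), and using an illustrative 95% per-step rate that does not come from the conversation:

```python
def chain_success_probability(step_success_rate: float, num_steps: int) -> float:
    """Probability that every step in a sequence succeeds, assuming independent steps."""
    return step_success_rate ** num_steps


# Illustrative per-step rate, not a measured figure.
for steps in (1, 5, 10, 20, 50):
    p = chain_success_probability(0.95, steps)
    print(f"{steps:>2} steps at 95% each -> {p:.1%} end-to-end")
```

Even a per-step rate that sounds high falls below even odds after a couple dozen steps, which is why per-step reliability and the ability to notice and recover from a failed step matter so much for agents.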
00:58:06
Speaker
I did see some instances, and I created a little auto-debug loop in my little self-delegation scheme as well. By the way, GPT-4 wrote that class. I'm not that great of a programmer. So I went to GPT-4 and I said, I want to write a class that uses language models to generate code and then executes the code and then auto-debugs the code and then validates the code, write that whole thing for me. It wrote the whole thing. So then I just had to tinker with it mostly at the prompt level. The scaffolding it was able to set up
00:58:36
Speaker
largely for itself. Fascinating in and of itself. But anyway, I think we're basically six months away. I mean, that's a somewhat rough timeline. That seems to be what the people that are building the agents think. And from what I can tell, the core of the problem, which is around the reliability of a lot of these kind of common tasks,
00:58:57
Speaker
it seems almost overdetermined that it's going to be solved. There's the reinforcement learning paradigm, where what's easier to scale reinforcement data on than whether code either worked or it didn't work? I don't want to oversimplify that. There's obviously some nuance in that. But you get an error message, it didn't work. So there is a pretty natural way to scale reinforcement learning in that domain.
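The "code either worked or it didn't" signal, and the auto-debug loop mentioned a little earlier in the conversation, can both be sketched as a generate, run, feed-back-the-error cycle. This is a hedged illustration, not the class GPT-4 wrote for the speaker: `run_with_auto_debug` and `stub_generator` are hypothetical names, and the stub stands in for a language model so the example runs offline.

```python
import traceback
from typing import Callable, Optional

# Hypothetical interface: (task, last_error) -> Python source code.
CodeGenerator = Callable[[str, Optional[str]], str]


def run_with_auto_debug(task: str, generate: CodeGenerator, max_attempts: int = 3) -> dict:
    """Generate code, run it, and feed any error message back to the generator."""
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        source = generate(task, error)
        namespace: dict = {}
        try:
            # Never exec untrusted generated code outside a sandbox.
            exec(source, namespace)
        except Exception:
            error = traceback.format_exc()  # the "it didn't work" signal
            continue
        return {"ok": True, "attempts": attempt, "result": namespace.get("result")}
    return {"ok": False, "attempts": max_attempts, "error": error}


def stub_generator(task: str, last_error: Optional[str]) -> str:
    """Canned generator: the first attempt has a bug, the retry fixes it."""
    if last_error is None:
        return "result = 1 / 0"
    return "result = 6 * 7"


outcome = run_with_auto_debug("compute the answer", stub_generator)
print(outcome)
```

The same pass or fail outcome that drives the retry here is the kind of cheap, automatic feedback the speaker suggests could be used as reinforcement data at scale.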
00:59:23
Speaker
The multimodality also is going to be huge. It seems like inevitable almost at this point that pretty capable agents will be starting to come online, I would say, by the end of this year. That's both exciting and terrifying. Nathan, thank you for coming on the podcast. It's been a real pleasure. Yeah. Love it. Thank you very much.