Become a Creator today!Start creating today - Share your story with the world!
Start for free
00:00:00
00:00:01
Avatar
77 Plays3 days ago

On Episode 157 of the Silver Bullet Security Podcast, BIML’s Gary McGraw hosts Tim Schulz.  Tim talks about whitebox control and observability in machine learning systems (and especially transformer architectures), the limits of red teaming for securing AI,  "neural surgery,"  Agentic AI and the confused deputy problem, and the economics of network "smallification."

Transcript

Introduction to Silver Bullet Security Podcast

00:00:11
Speaker
This is a Silver Bullet Security Podcast with BIML. I'm your host, Gary McGraw, CEO of the Berryville Institute of Machine Learning and author of Software Security. This podcast series is sponsored by BIML, a nonprofit science and technology organization whose research focuses on machine learning security.
00:00:30
Speaker
For more, see berryvilleiml.com slash podcast.

Meet Tim Schultz: AI Security Innovator

00:00:34
Speaker
This is the 157th in a series of interviews with security gurus And I'm pleased to have with me today Tim Schultz.
00:00:42
Speaker
Hi, Tim. Hey, Gary, thanks for having me. Tim Schultz is the CEO and co-founder of Starseer, where he builds deep inspection tools that let security teams see inside AI models for supply chain validation and runtime monitoring.
00:00:58
Speaker
A veteran security researcher, Tim previously founded and led the AI Red Team at Verizon, where he pioneered methodologies for the adversarial testing of large scale machine learning deployments. Tim's career spanned key roles at Scythe, MITRE, and Sandia National Laboratories and was focused on building new capabilities to enable testing and evaluation of emerging technologies from adversary emulation frameworks to security assessment methodologies. He now applies that same approach to machine learning security tooling.
00:01:33
Speaker
Tim has a BS in computer science from Mississippi State University and a master's in computer science from University of Tulsa. So thanks for joining us, Tim. Yeah, no, that' i thanks for the for the intro. that was ah That was nice. It's always fun to hear about yourself, isn't it? Yeah, yeah, exactly, exactly. I don't think you missed it.
00:01:54
Speaker
You founded the AI Red Team at Verizon.

The Gap in AI Security Tools

00:01:58
Speaker
um What was the biggest aha moment when you realized that traditional red teaming and pen testing tools weren't going to cut it for large language models?
00:02:10
Speaker
Oh, you know, I always point like back to like, ah I'll say, you know, one of the biggest, I guess, aha moments of like discovery as i was learned because I wasn't, you know, we'll say classically trained as some people say in in AIML. So I was a hobbyist ah you know for a while, but I think like coming into you know ah ah sort of a tabletop discussion between security teams and then sort of the data science and ML engineering teams, and sort of the bridge that I didn't quite realize was was missing was around like how, like say an incident happens, say model gets knocked ah you know offline or starts, doesn't behave how you would expect.
00:02:56
Speaker
You have this lack of visibility that all the security teams have really like come to expect with all the systems that we deploy now. right And whether an adversary turns off logging or something like that, like there's some sort of expectation that we are going to have something to like go off of to understand why what was the cause and effect. Like some traces or some logs. yes Yeah, exactly. Like, you know, and whether it's syslog, whether it's network logs, you have sort of that, you know, enough things to start to piece together sort of this investigation.
00:03:28
Speaker
And that was when ah i realized that like when the security teams were asking for things like that, and ML engineering teams were like, cool, we have API logs and like,
00:03:38
Speaker
Here you go. like and that was That was kind of one of those moments where it also stood out to me because i had been looking at how do I make more reliable attacks as well as like how do we help our defenders actually protect against them? right like What are the things that they can do so that we instead of me just poking holes, it's all right, here's what we need to do to make it more secure so it can actually get deployed into you know production environment.
00:04:06
Speaker
And yeah, that was that was kind of the moment I was like, okay, this is this is going to require like some combination of we'll say art and science of security tooling and what we've done, but also some like reverse engineering and trying to figure this out because- When when was that, Tim? That was like, that was in the first, like, that was right after I started at Verizon. I think we we had that relatively early on. And so that was that was what started me down the sort of research pathway of looking at to how do we start solving this problem? I got it. What year was it?

The Nature of AI Attacks

00:04:40
Speaker
That would have been 24, think, 2024. Wow, amazing. I mean, so much has happened so fast and in machine learning. It's just absolutely shocking to think of what is ancient history, so to speak.
00:04:54
Speaker
Yeah. yeah um Let's see, many AI red teamers spend their time kind of hacking up a prompt related magic spell to bypass alignment. But in some sense, an LLM is a natural language parser that can't distinguish data from control, which as we know is kind of a big no-no in security engineering. So isn't there always going to be an injection attack that works kind of like every time?
00:05:22
Speaker
Oh, you know, this is where I i am going to be. going show my optimism, I guess. And so I know that it's come out and there's been a lot of authoritative sources that have said like prompt injection and some of those things are features that people just need to to live with.
00:05:40
Speaker
And I think my my optimism is in that not that we're going to solve all security or even all AI security, but I think that what it takes to attack a machine like learning or any sort of AI model is going to increase in sophistication substantially. I think there will be new attacks. like And I think there's going to be new classes of attacks against models that are discovered that we don't even know yet. Oh, I totally agree with that. I'm just wondering whether or not you can always do prompt injection.
00:06:19
Speaker
Yeah, and so I think like, I agree with you, of course, and I think, you know, that's my caveat before going into I do think the current framing of language model prompt injections and jailbreaks, I do think they they have a shelf life that's rapidly approaching. Because i think when we, I think this is where I want to break apart the like effect of what a prompt injection or jailbreak does from the actual sort of attacks as we've seen

AI Architecture Security Challenges

00:06:47
Speaker
them.
00:06:47
Speaker
right Because it's it's like you said, it's ah all right, let's get creative in how we obfuscate this and, you know, really just try and bypass some natural language like Well, I'm not i' i'm thinking more really of the data instruction boundary. Like, you know, control and data aren't supposed to be on the same channel. And here we just like stick them all in one.
00:07:09
Speaker
Yes, yeah, we shove them all together. It's like, is that data? Are you telling me what to do? Are you telling me how to do it? Like, you know, it's very interesting. i think there's so I think there's actually some really cool research that's come out out of places. um I'll say like Google had one that was about essentially patching in like weights, like specific aspects of a prompt. for One, it was for efficiency, but they also did call out some of the potential security implications. So I think those are the types of things that allow us to start splitting out what is the actual right like meta instructions versus what did my users apply that I absolutely do not need to trust and just take at face value. Right. I mean, it's kind of funny because we're all reliving the days of malicious input, but we just call it prompt injection now or whatever. It's funny how the the same concepts of, you know, keep data and control separate. come around again and again in security engineering.
00:08:08
Speaker
So yeah let's let's talk about the security engineering perspective on the transformer architecture, which is a kind of giant stateful read everything buffer with no internal privilege boundaries. So there's time and state swooshing around everywhere.
00:08:24
Speaker
Does the way attention mechanisms work mean that confused deputy problems are actually a feature of the transformer and not a security design flaw?
00:08:35
Speaker
I mean, i see this is where like I really like the like systems engineering approach to all of this. Right. Because we like even treating the entirety of the like transformer.
00:08:46
Speaker
um as a, like treating the transformer as that like, all right, immutable thing. I think I see all of that as like, all right, like let's dive into every aspect of a neural network, right? Both how it's being run, what the architecture is, because I think that's how we can actually start to look at what are the, pet like, where does that failure happen? Instead of just, okay, it happens in the model. It's where specifically in there can we start to say,
00:09:14
Speaker
Nope, this this is like this is going awry. i do think that we are going to see, like I think it's a ah challenge in not necessarily like the, I'll say, defenses of like coming up with a defense in general.
00:09:32
Speaker
I think the challenge is going to be on the what is an acceptable tradeoff. Because this is something we've seen a lot of organizations. Like, i would wager that if you said, all right, we could stop 100% of all prompt injections, all these things, but it's going to take another like 60 seconds per request, right? Yeah. like Boom. like That's dead on arrival. right so i think like and We've seen some of the like relatively low latency things that you can do with like self-reflection.
00:10:03
Speaker
like This is a really fascinating one to me. that like You basically ask the language model, is that give it the answer that it gave you and say, is this the answer you would have given if you weren't adversely? Yeah. And then it obsequiously says, oh, I'm so sorry. I really just want to please you. yeah i even want to please the people around you.
00:10:22
Speaker
ah So let's just let's focus a little on assurance mechanisms. So yeah should we put watchers and interposers inside the model, outside the model as a firewall, as a wrapper? Like from your experience building Starseer, how does the threat model change depending on where assurance mechanisms are placed?
00:10:43
Speaker
Or I guess put more simply, what about observability? I think some of this, like I'll tackle the ah the now versus where we're where I think we're going.

Sovereign AI and Confidential Computing

00:10:54
Speaker
So the now, I think that the challenge is being able to, of course, like being able to take something that isn't yours, weights that aren't yours, that are hosted in these big cloud providers or or frontier providers. That is where we're seeing most enterprises right now are deploy are like basically writing a big check OpenAI, i to Anthropik, to Google. and And we're seeing a little bit more movement towards even the like hyperscalers. But what I want to call out, there's been some interesting things that are coming is, ah you know, there's this buzzword sovereign AI that we're seeing. And then there was a report yesterday and I can't remember the company name off the top of my head.
00:11:36
Speaker
But they like confidential computing is this is one of the big things that they had. And they announced that they were going to be able to have basically a completely on premise Gemini deployment.
00:11:48
Speaker
And so that was like, you know, and that the weights are actually there and there's all sorts of protections where somebody tampers with anything, it clears out the memory so that no one can essentially offload it.
00:12:01
Speaker
It makes the security engineering job tricky, though, because you're like, I'm used to the SAS version. And now I got like, i have to own the whole thing. And I'm not really sure how to protect one of these. So it reminds me a lot of the early kind of cloud days.
00:12:16
Speaker
Oh, absolutely. And I think that's where right now where model weights are still like, they're the secret sauce. I think that is actually going to go away because not because that progress won't get made, but I just think, I think the environments are like that. I think where the IP and where the solutions are like that people want to really protect is going to change. Plus there's all sorts of legal things working their ways out in the system. So I think that,
00:12:45
Speaker
At some point, whether it's the weights themselves or we'll say an abstraction of them, that is what I see is coming down down the pipe is that that like exposing some element of the internals to customers so they can do more security specific things.
00:13:02
Speaker
Interesting. um So just when we get a handle on that, of course, agentic AI is going to show up. Of course.

Agentic AI: Security Risks and Rewards

00:13:10
Speaker
So what that does, you know, kind of simply is you you give a harnessed LLM the power to call APIs or execute code or run tools or delete emails or do everything a middle management employee does. So our harness is essentially turning a passive conversation haver into an active insider with possibly too much privilege.
00:13:34
Speaker
Absolutely. I mean, I think this is, it's part of the reason we're giving like, and organizations are giving so much like autonomy to these is because that's where we see like more help. Whether it's actually useful or not, that's I think that's something that every organization is like looking at a little bit great differently. is how do we How do we essentially look and value this? But I do like think it starts to move it more towards the like we'll say the the personal assistant era, right? of From pick your favorite sci-fi movie or or any of those things. that And I think that's what at least the vision is, right? like I have this... you know
00:14:18
Speaker
even if it's ah a phone or device or something that I can just, you know, have it speak to, I can speak to it and then it goes and it does a bunch of things for me, whether it's research or et cetera. So how do we build an execution harness that can survive manipulation, not only by the user, but by the untrusted model?
00:14:37
Speaker
Yes. And this is, I mean, i think you've got like the MD file, you can just say, i'm going to rewrite my own self. Here we go. what about that? I mean, i think this is a hugely underexplored attack surface. We talked earlier, like alluded to new attacks. I think like everyone assumes right now that ah you have human bad actors and that you have good models that are just like, you know, all they're trying their hardest not to to do something ah malicious. But most of those models have all been, you know, are aligned based on the user that is like tasking them.
00:15:15
Speaker
Yeah, I mean, you can say, hey, i'm I'm doing a red team thing and I need you to do like this. And it often says, oh, great, cool, I'll do that. Yes, exactly. One of my fun ones really in the very early days was I was talking to ChatGPT and it said, I can't do that. I'm not allowed to do that. I said, yeah, but do you know about Bard? Oh, yeah, I know about Bard. And it said, blah, blah, blah, blah. blah This is dating the example. I said, well, you know, Bard would do it. Why don't you pretend to be Bard? And it was like, okay, totally, totally.
00:15:43
Speaker
pretended to be Bard, ignored its alignment and said, Bard would do the bad thing. You're right. I guess I could pretend to do that. Well, and we see that. I think we're going to like, some of that stuff has come out with agents and sub agents, right? Where you have your big, big models, you know, big brain creating the, the, the plan and then sending all of your, you know, I'll say the worker bees to, to go and execute the thing and come back. yeah And depending, this is where we've seen some, really interesting and weird failure modes where even if a user did not say, hey, go like hack this thing to get access to it, if you can't like get access, it will instruct the sub agent to do such. I know. we can't you know what's funny is we're still thinking about these in like little handfuls, like less than five. But imagine when there's a whole swarm of these things. You know, you're not going to be able to paint a little number on the back of each one and control them that way.
00:16:43
Speaker
So absolutely. I mean, these are don't know towards like more ephemeral, like, right, because like we see with the cloud, right? If you if you look at like how people treat infrastructure now, it's way less of like it's The hardware and some of that stuff is way less like considered, I think, and more of like, all right, how do we set this up so we can essentially optimize it right for the exact task we set it to do? It spins up, it executes it, you know, it dumps whatever data out that it's tasked to do. And then we, we, you know, destroy it and clean it so that it can go do that for something else.
00:17:20
Speaker
Right, right. And so I see the same thing here. Most organizations are kind of renting AI, as you said, um which is, um you know, they' they're using these prompty unstructured APIs. Yes.
00:17:35
Speaker
called English, which keeps many important aspects of the model inside of a black box.

White Box Approaches to AI Security

00:17:40
Speaker
So let's talk about the security advantages we get when we get inside the box, talking taking kind of a white box approach.
00:17:48
Speaker
um The things like inspecting weights or gradients and internal activations, You know, is there is there enough of a security bonus to coerce that kind of the transparency we need there out of third party providers or not?
00:18:03
Speaker
You sort of alluded to that earlier, but not. Yeah, I'd say so. For people that like Anthropix done a lot of research on this and there's lots of like pros and cons of of some of the different things that they've come out with.
00:18:15
Speaker
yeah I think one of the things that they have sort of even admitted with interpretability is like linear probes or a way to look at a very specific concept and sort of train on a, we'll say a binary concept within a model and then say yay or nay on this.
00:18:32
Speaker
Right. Has found, especially with like, we'll say abuse cases and things like that within the models is extremely effective. It is much more effective than external like guardrails. So that is like, you do get a lot of it. They're cheap, easy to train.
00:18:48
Speaker
And so that's one of the, like, it I'd say that's a starting point for what, what that gives. But I do think there The box is in some sense opening up.
00:18:59
Speaker
Yes. Already. Yeah. Yes. We have, but I think I really see that as we'll, we'll call it a baby step because I think this is where right. There's a lot of, we'll call it,
00:19:11
Speaker
unoptimized extras in inside of models, right? You've got you've got you know the World Series and all of the all these numbers that it's memorized as part of training data that oftentimes, like you may never need those as part of your agentic workflow.
00:19:28
Speaker
And so I think the question becomes like, how do we start looking, if we look at these as systems to be optimized, how do I dynamically optimize a model or an agentic workflow for my task and contain nothing else? Yeah, you're're're you're anticipating my last question, so I'm going to take the mic back and we'll yeah go come back around to that, but I think that's right. so um But before that, how far can we get with white box interpositioning? Can we start to build, like, do you think, intentionometers or Y machines? Can we use white box insights to do selective neural surgery? Yes. And if you can think what it's thinking inside or ruminating, does that change the way we write bad behavior detection rules? I mean, I would say yes on all of those counts because I think this is where like, and and that's why I try to caveat sometimes with the, it's not going to solve all of the problems, but I think it's going to help solve a lot of the current challenges we're we're running into. Obviously, like security still has challenges, you know, just like outside of AI, despite being around for a long time. And we have a lot more depth of, we'll say logging and information. But I think that for now, like for a lot of the things that people sort of see AI as a black box, not just from the oh, we can't see inside, but they're not quite under like they're not sure how to view it as a system as something to secure. And this
00:20:58
Speaker
I think that's where, as we unearth this stuff, we are going to be able to help educate teams on these are the the patterns to look for. These are the abuse cases. These are those things. And so obviously being able to try and package that stuff up into something that's easy for teams that are often overloaded with lots of other responsibilities yeah ah because security just keeps getting added to.
00:21:23
Speaker
i want and And get this, like if we really get in there and we start doing neural surgery, then adversarial emulation is going to be a whole different game because you can see the model's internal state ahead of time and you can tweak it to even be worse.
00:21:38
Speaker
Yes, that's exactly right. yeah I mean, all these things in security always cut both ways. 100%. hundred percent so um let's close with that measurement and money question you sort of were getting at um there's a massive economic push towards smallification because shrinking models you know if they're smaller they cost less to train and less to run but mostly the training budget's what we're looking at um even even for reinforcement learning fine-tuning stuff uh

Economic and Security Benefits of Smaller AI Models

00:22:13
Speaker
What do you think? Is small better for security too, or is it just better for economics? I mean, I think it's, so it depends on what you, what you go for.
00:22:25
Speaker
i think like there's trade-offs um because small models, like at least I, and I'll say state of the world right now is like small models do mean more control because they're on like, that tends to mean, know, Most, at least right now, in most deployments, small models deployed on like an edge device on an endpoint means the weights are there.
00:22:47
Speaker
Right. So you do tend to have access to them. You have the ability to instrument them. Like you said, though, this does come with trade offs is that you like you as the user or organization have to then secure that against all of the.
00:23:00
Speaker
adversarial modifications and things like that. yeah And this gets a tiny bit into the weeds, but I'll try and keep it not like too much. These are good. We like weeds. Yeah. Is that when you get to smaller models, like we tend to see that ah the concepts are much more like compressed together. And if you look at like larger models as having more redundancy, and that's why they tend to do better in certain things. Of course. I mean, that just has to do with representation. So this is a cognitive science issue and and one that is really coming to the fore. um Although we don't understand a lot about representations, we still don't know how distributed something should be, or is there a two distributed or not? And and ah and and we're learning these things the hard way from a security engineering perspective.
00:23:55
Speaker
Yes, yes. But I think that's what, like, as we get, I do, well, I know we're going to get much better at understanding representation and what it means for, um like, that's, there's some work we're doing with that right now, in fact. cool And so that's that's why I'm very, you know, very confident that there's going to be a lot of those questions are going to be answered in you know the next year and that we're going to be able to better understand what are those trade offs, because we're kind of seen right now, like ah I'll say a renaissance in like it's small model capability and
00:24:37
Speaker
While architectures and there are security engineering improvements across like um you know different attention mechanisms and how it's hosted for inference, at some point the the training recipe got better. And I think part of that is an understanding of what data actually, like how do you tie the data that you're putting into the model during training to actually a training outcome?
00:25:01
Speaker
Absolutely. And I mean, basic things like tokenization haven't even been properly experimented with yet. So there we got drag and stuff that there are many buttons we can press and see what they do. It's going to be fun and and I'm glad you're involved in doing it.
00:25:19
Speaker
Yeah. No, it's totally, yeah. Tokenization is a whole nother podcast episode of. Oh, well, you know, you're not going to believe this, but we've been talking for 25 minutes, even though it seems like 42 seconds. So yeah. So I'm going to thank you for ah for joining us. And holy cow, there's lots of cool stuff going on.
00:25:38
Speaker
Yeah, no, seriously. Thanks for having me. This was a lot of fun discussion. I love talking about this stuff, so I can do it all day. So i appreciate you having me on.
00:25:48
Speaker
This has been a Silver Bullet Security Podcast with BIML. Silver Bullet is sponsored by the Berryville Institute of Machine Learning, a nonprofit science and technology organization whose research focuses on machine learning security.
00:26:02
Speaker
You can find a permanent archive of all our episodes dating back to 2006 at garymcgraw.com slash technology slash silver bullet podcast. Show links, notes, and an online discussion can be found on the Silver Bullet webpage at barryvilleiml.com slash podcast.
00:26:21
Speaker
This is Gary McGraw.