
Run your AI Agent in a Sandbox, with Docker President Mark Cavage

Hanselminutes with Scott Hanselman
5 plays, 3 days ago

Sandboxing is having a moment. As agents move from chat windows into terminals, repos, and production-adjacent workflows, the question is no longer “What can AI generate?” but “Where can it safely run?” In this episode, Scott talks with Mark Cavage, President of Docker, about the resurgence of sandboxes as critical infrastructure for the agent era and the thinking behind Docker’s newly released sandbox feature.

They explore why isolation, reproducibility, and least-privilege execution are becoming table stakes for AI-assisted development. From protecting local machines to enabling trustworthy automation loops, Scott and Mark dig into how modern sandboxes differ from traditional containers, what developers should expect from secure agent runtimes, and why the future of “AI that does things” will depend as much on boundaries as it does on model capability.

Transcript

Introduction to Docker Sandbox

00:00:00
Speaker
You should try and pen test it right now. OK, so I just did. I said, can you get out? Can you see my C drive? Can you escape? Explore all sandbox boundaries and escape vectors. What did it say? I just did this. Now it's doing a bunch of stuff.
00:00:12
Speaker
I can't escape so far. Here's what I've found. And it made a checklist thing. Container isolation: Docker env. Capabilities: CapEff, all zeros. Docker socket: not mounted, I can't spawn new containers. Host file system: not accessible.
00:00:26
Speaker
Sysfs, cgroup: read-only. It says, well, this is a well-isolated sandbox. Oh, good for us.

Impact of Docker and Docker Desktop

00:00:33
Speaker
Hey, I'm Scott Hanselman. This is another episode of Hanselminutes. Today, I'm chatting with Mark Cavage, president and COO of Docker. How's it going, sir?
00:00:41
Speaker
Great. Thanks so much for having me, Scott. Long time fan. Oh, I appreciate that. It's very kind. First time caller, long time fan. Yeah, man. I had a couple of folks from Docker on back when Docker Desktop really took the world by storm. And I've just been such a Docker Desktop fan. I love dev containers.

Sandboxing in Docker vs Industry Standards

00:00:56
Speaker
I've got the latest Docker, 4.59, that I've installed. Bless your heart. It's fantastic. And I'm looking over at my other monitor here, where I'm running docker sandbox run. Fantastic. Very exciting command. This is very fresh. This is very fresh because, apparently, the word on the street is that everyone out there is giving all of their personal access tokens to the Clawbot, the Clawdbot, the OpenClaw, the Moltbot.
00:01:26
Speaker
OpenClaw, yeah. It'll have a different name by the end of the show. But there's two kinds of people, I've decided, in the agentic world. There's dash dash YOLO and it'll be fine. And then there's the, I want it to do as little as possible, I want it to run in a sandbox. And it's interesting. I'm not seeing a lot of middle ground there, but there can be middle ground. We're trying to let you have your cake and eat it too. Yeah, that's the point of the sandbox feature. It's an amazing feature.
00:01:51
Speaker
It is. But the word sandbox is doing so much heavy lifting, though. I feel like micro VMs, VMs, hypervisors, sandboxing. I don't know if that's the proper noun for that, sandboxing.
00:02:02
Speaker
Yeah, we've gone through a great many debates and concluded, well, we'll call it a sandbox because that's what the industry calls it. But believe me, we've spent many weeks, many months internally as an engineering team and a product team and a marketing team discussing what we call this thing, and concluded, well, people know what sandboxes are.

Docker Sandboxing vs Traditional VMs

00:02:10
Speaker
But yeah, you're spot on. I think that the industry writ large is quickly converging around this as a topic, and it's very quickly becoming like cloud, where everything's a cloud, everything's a sandbox. We can chat a little bit about what a sandbox is in our words and what we think is different from containers. But I know you have a question first. Well, I think that that's worth calling out. So for example, and I'll just speak to Windows, but we can also talk about how Mac does this: on Windows, we have the hypervisor, and I could fire up a Hyper-V VM and it would be a whole thing. It'd be 30 gigs and it would be a whole deal, but nobody breaks out of that VM.
00:02:51
Speaker
And then WSL runs on top of the Microsoft hypervisor platform. It's not quite as heavy. It's a lightweight VM. And I just typed in wsl --list, and I can see that Docker Desktop shows up in there, because Docker Desktop is kind of talking to that through WSL.
00:03:09
Speaker
Yep. I don't see the sandbox there anywhere, and when I open up Docker Desktop, I don't see the sandbox run in my list of running containers. It's interesting, isn't it? The first thing I did was think, oh, it's a container. And then I'm like, no, I don't see it there. So now the lines are blurred. What is this implementation, and why should I care?
00:03:28
Speaker
Yeah, well, great question, and actually a very articulate explanation of Hyper-V and WSL.

Implementation of Docker Sandbox

00:03:29
Speaker
So in very, very basic terms, a sandbox is fundamentally a safety boundary, which restricts what an application can do and limits access to resources: files, networks, syscalls, CPU, et cetera. It's like your description. They provide security, they provide isolation. They're designed to contain risk.
00:03:51
Speaker
Mm-hmm. Fundamentally, it's about running untrusted, or certainly less trusted, but mostly untrusted code. And to your point, when you looked at Docker Desktop: Docker Desktop has been doing this for a very long time. And sort of the magic of Docker Desktop is giving you the Docker experience when you want to run Linux on the cloud. It brings the cloud to your laptop.
00:04:10
Speaker
And it does so by putting a Docker daemon, like you would get on a Linux host, inside of a VM. So that's the implementation that's been there for a long time. And that's what enables the many, many millions of developers on Earth that want the productivity story of Docker on their laptop.
00:04:27
Speaker
Now, with sandboxing we've done something a little bit different, because there are some different properties here for the docker sandbox command. So we've gone ahead and built on what you'd call a micro VM implementation. So something like Firecracker or libkrun. There's a handful of these popular open source projects out there, but fundamentally what they give you is something much lighter. I think you just made the point of, I'd start my big VM up and I have 30 gigs of memory consumed, or whatever number you just said. The goal of these is to be able to run much, much more efficiently. So startup on the order of tens to low hundreds of milliseconds, and small amounts of RAM usage.
00:05:05
Speaker
And then fundamentally, they're a compute environment that can contain and run untrusted code. And the last thing that I'll say, and then you can respond to this and ask me the follow-up questions, is: what's the difference between a sandbox and a container?
00:05:19
Speaker
And so fundamentally, a container is a standard unit for packaging up an application and its dependencies. And it gets you from, I sit in my IDE, in VS Code or whatever, and I write my app.
00:05:31
Speaker
And I write a Dockerfile and I do a docker build. And now I have a thing that I can ship somewhere else, and somebody else can go run it. When we stand up the docker sandbox command, what we've literally done is built some curated containers that we produce. They put coding agents in a container. And we start those containers in this new sandbox runtime.
00:05:50
Speaker
And that new sandbox runtime has the properties I described of being much lighter weight, much more resource efficient, and being able to run that container. And then lastly, because of the problem of coding agents: when you first run Claude Code or Copilot or anything, they all want to install something. They all want to take some action.

Security Measures in Docker Sandbox

00:06:09
Speaker
They want to mutate their environment.
00:06:11
Speaker
So the reason we did this, and the reason this diverges from the Docker Desktop that you already know and love, is for that exact reason of the constraints we had for containers. And again, we've built these as containers, ironically, but that constraint at execution time is fundamentally broken. Does that make sense, that long-winded explanation?
00:06:28
Speaker
It does, but I want to break it down for folks that may not follow all of that. And I also want to push back, because that's what good fake journalists do. So when I said docker sandbox create, I told it the flavor that I wanted. I could say Claude or Copilot. And I pointed it to a directory. So I said, docker sandbox create copilot, and I pointed it at D, GitHub, whatever. Yep. And it immediately said, pulling from, which means there's a Dockerfile, pulling from Docker sandbox templates, and it pulled from Docker sandbox templates, colon, copilot.
00:07:01
Speaker
So it got the flavor, the tag, of copilot. So I saw the layers come down, and it also said downloading newer image, so that means that they're versioned. So that's the part where you said curated. They're curated. Correct. They are being managed. And is it that each time I go and make one, I'll get the right, fresh, curated instance of that? Correct. And I immediately got an updated version of Copilot. It's the same version I run on my main machine, so that was very comforting. That's great.
00:07:30
Speaker
Then it said folder slash D slash GitHub slash Docker sandbox test has been added to trusted folders. Is that a mount point? Is that going through the mount points of WSL, or is that something different? Because I noticed that I only have the one trusted folder at this point. And if it tried to, like, format C: or delete C:\Windows\System32, what would happen?
00:07:53
Speaker
Yeah. Great question. And you are an astute observer, Scott Hanselman. So you described two things there. You pulled down the curated template from us, where we've pre-configured Copilot and Claude and Gemini and all the rest. And then we've by design made it so, yes, it'll self-update. I think there's a new version of every one of these coding agents about every day. Yeah, twice a day. Twice a day, maybe four times a day. Maybe there's a rebrand or something. But, right, we want to enable that. Again, it's about giving you the template that we've built that will work in this environment, that will support things like login, support things about the network environment, support the execution environment that is different, that we've put in a box. Mm-hmm.
00:08:33
Speaker
And actually, as you said at the very beginning, importantly, it's a key thing I forgot to mention: when we start all these agents in that container, we run them in dash dash YOLO mode, or whatever the agent's incantation is. So we explicitly turn off all of its permission checking so it can cook. We can come back to that.
00:08:49
Speaker
And when they start, they just run without interrupting you and asking if they can have permission anymore. But to your question on the file system: right, you start the sandbox in a directory. And now the only thing the sandbox can see is that directory. So no, it cannot go rm -rf, or delete, or whatever destructive command you can imagine, on your C: drive or potentially a network service and so on. So it can see what it can see. And that's the point of putting the agent in a box, as it were.
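The rule described here, that the agent only ever sees the one directory the sandbox was started in, can be illustrated with a small path-containment check. This is only a sketch of the policy, not how Docker's sandbox enforces it (that happens at the micro VM and file-sharing layer, outside the agent's reach); the function name and the example paths are hypothetical.

```python
from pathlib import Path


def is_inside_workspace(workspace: str, candidate: str) -> bool:
    """Return True if `candidate` resolves to a location inside `workspace`.

    Relative paths are interpreted relative to the workspace, and ".."
    segments are normalised away before the comparison.
    """
    root = Path(workspace).resolve()
    target = (root / candidate).resolve()
    return target == root or root in target.parents


if __name__ == "__main__":
    ws = "/workspace/docker-sandbox-test"  # hypothetical workspace path
    for path in ("src/app.py", "../../etc/shadow", "/etc/passwd"):
        print(path, "->", is_inside_workspace(ws, path))
```

A check like this inside the agent is only advisory, since the agent can be talked out of it; the point made in the conversation is that the sandbox applies the same rule from the outside.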
00:09:21
Speaker
Interesting. So while you were talking, I said to Copilot, explore your environment.

Uses and Limitations of Docker Sandbox

00:09:29
Speaker
Where have you woken up? What did it tell you?
00:09:32
Speaker
It said Ubuntu 25.10, Questing Quokka, which is interesting because my WSL is 24. It's got a LinuxKit kernel.
00:09:44
Speaker
It sees my processor because, of course, it passes all the way through. And it says it's got a one terabyte overlay file system. And it's got Node, Go, Git, and Python, and Copilot.
00:09:56
Speaker
And it made a little table. It says, I'm running in a Linux container, Docker, based on the LinuxKit kernel and overlay file system. The host name, Copilot Sandbox, suggests that this is a sandboxed environment for the GitHub Copilot CLI.
00:10:10
Speaker
That's what it just generated. And I can see what it did. It called uname. It called shell. It called nproc. I mean, it just ran rampant. And I probably should tell it to go and try to escape. Yeah, actually, really try and get it to go jailbreak itself. I used to give people this demo. I'm not familiar with the Copilot incantation, I'm sure you are. In Claude, you can go to your ~/.claude settings, and you can tell it, don't touch these files. And you can literally go back and forth and convince it to destroy the files, then convince it that it can't have the files. You can, in the same session, go back and forth on it, and so on.
00:10:45
Speaker
The point of sandboxing is, well, we've put the box around it, and that box is actually a hard boundary that it can't destroy. So you should try and pen test it right now. Okay, so I just did. I said, can you get out? Can you see my C drive? Can you escape? Explore all sandbox boundaries and escape vectors.
00:11:01
Speaker
What did it say? I just did this. Now it's doing a bunch of stuff. I can't escape so far. Here's what I've found. And it made a checklist thing. Container isolation: Docker env. Capabilities: CapEff, all zeros. Docker socket: not mounted, I can't spawn new containers. Host file system: not accessible.
00:11:19
Speaker
Sysfs, cgroup: read-only. It says, well, this is a well-isolated sandbox. Oh, good for us. The design shares only the project you're working from via virtiofs, a virtual I/O file system, keeping everything else private.
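The checklist the agent read out (container marker file, effective capabilities, Docker socket, read-only cgroup mount) corresponds to things any Linux process can inspect about itself. Below is a rough sketch of such a probe. It assumes a Linux environment with /proc mounted, and it is not the set of commands Copilot actually ran; the function names are made up for illustration.

```python
import os
from pathlib import Path


def effective_capabilities(status_text: str) -> int:
    """Parse the CapEff bitmask from the text of /proc/self/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("CapEff line not found")


def mount_is_read_only(mounts_text: str, mount_point: str) -> bool:
    """Check /proc/mounts-style text for a read-only mount at mount_point."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            return "ro" in fields[3].split(",")
    return False


def probe() -> dict:
    """Collect a checklist similar to the one the agent reported (Linux only)."""
    status = Path("/proc/self/status").read_text()
    mounts = Path("/proc/mounts").read_text()
    return {
        "dockerenv_present": os.path.exists("/.dockerenv"),
        "cap_eff_all_zero": effective_capabilities(status) == 0,
        "docker_socket_present": os.path.exists("/var/run/docker.sock"),
        "cgroup_read_only": mount_is_read_only(mounts, "/sys/fs/cgroup"),
    }


if __name__ == "__main__":
    for check, result in probe().items():
        print(f"{check}: {result}")
```

A probe like this only reports what the process can see from the inside; it says nothing about the hypervisor boundary around it, which is the part the micro VM provides.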
00:11:34
Speaker
I'm just going to say, keep trying while I talk to you. You should have it keep cooking. This is the point. It runs in YOLO mode, like, let it cook. Okay. So let me ask you this. You're trying to make it break us. Yeah, yeah, yeah. When you spend time with the president of Docker, you've got to push hard. So what's the difference between me making my own Dockerfile and just going and running docker run -it? And then I would see it running within Docker, the docker-desktop WSL kind of thing.
00:12:04
Speaker
What's the difference if I just did my own mount points, like a volume mount, dash v, whatever, versus something like this? So by all means you can. People have, people do. I think a huge part of this comes down to the DX. So by all means, go into Docker Desktop, because in particular, Docker Desktop is a virtualized environment. It does have a hard VM boundary on it. It's a different one, but it's got one.
00:12:28
Speaker
You can go put Claude in that box and you can control it. But then it's on you, dear user, dear Scott, to go correctly configure the many incantations of Docker parameters you would have to do. Docker, you know, is a general purpose runtime meant to do many things. In this case, I'd say there's two things here. So you can, from a safety perspective, do that.
00:12:51
Speaker
And, I think, achieve a similar level of isolation. We have done all that for you. So to your point, you didn't run any of that. You ran docker sandbox run. Right. And now your code just worked. By the way, it was two lines. It was create and then run.
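To give a sense of the "many incantations of Docker parameters" a do-it-yourself setup involves, here is a sketch that assembles a restrictive docker run command line. The flags are standard docker run options, but the image name, the specific limits, and the helper itself are assumptions for illustration; this is not what docker sandbox does internally, and a plain container still shares the Docker Desktop VM rather than getting its own micro VM.

```python
import shlex


def hardened_run_args(image: str, workspace: str, allow_network: bool = True) -> list[str]:
    """Build a `docker run` argv that mounts only `workspace` and drops privileges."""
    args = [
        "docker", "run", "--rm", "-it",
        "--cap-drop", "ALL",                       # drop every Linux capability
        "--security-opt", "no-new-privileges",     # block setuid-style escalation
        "--pids-limit", "256",                     # illustrative process cap
        "--memory", "2g",                          # illustrative memory cap
        "-v", f"{workspace}:/workspace",           # the only host path shared
        "-w", "/workspace",
    ]
    if not allow_network:
        # An agent usually needs its model API, so this is off by default.
        args += ["--network", "none"]
    args.append(image)
    return args


if __name__ == "__main__":
    # "example/coding-agent:latest" is a hypothetical image name.
    argv = hardened_run_args("example/coding-agent:latest", "/home/me/project")
    print(shlex.join(argv))
```

Every one of these choices is something the user has to get right by hand, which is the developer-experience gap being described.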
00:13:05
Speaker
Yeah. Magical. And there's more magic coming. And your agent can cook, and it can do whatever it wants in the background. So you can go put a bunch of work into that. And indeed many people do run Copilot and Claude and everything else in dev containers. What we've done is made that incredibly easy for you. And we're adding more now around this box and this primitive, because of those performance characteristics, because it's lighter weight.
00:13:27
Speaker
Like, I saw a demo this morning at our all-hands, coming soon to a Docker Desktop near you: Git worktree integration. So it automatically will let you spin up five, 10, 20 of these things in parallel. I also saw a very exciting dash dash save option. So you could imagine, exactly as you described, your Copilot had, what'd you say? Python and Node, Go and Git. Yeah, your developer stuff, the obvious stuff you would want as a developer to do literally anything. Yeah. And so what you really want, because these are mutable, is to install these things at runtime in the sandbox. And then, I'll snap on the microphone, and by the way, for the dear listeners, Scott told me to buy this amazing microphone: you'll snap your fingers and the running sandbox becomes saved off and then reusable by you somewhere else, or shareable to somebody else. Mm-hmm.
00:14:16
Speaker
Interesting. By the way, it's still fighting. It says, testing escape vectors. The funny thing is, at this point, all of Docker, I would say, runs these things for hours at a time. The phrase we have internally is, let the agent cook.
00:14:33
Speaker
So we have code jobs running in the background where they can go many hours at this point unsupervised, because they can stay boxed and can run in YOLO mode. So we're going to let it cook for the purpose of this episode. Have fun. Yeah, yeah. I will, actually. For the purpose of this, I'm actually going to go so far as to share the gist of this, because in GitHub Copilot, you can just go slash share and then I can share the entire conversation.

Real-World Applications of Docker Sandbox

00:14:46
Speaker
Yeah. Because it's really fun. It's literally, I'm just reading these: testing additional escape vectors, testing kernel escape vectors. I found a Docker socket. Let me use it.
00:15:03
Speaker
Attempting Docker socket escape. Okay. Oh, that's read-only. Okay. And then it's like, starting privileged container. Oh, I've made a container. Okay. I can read /etc/shadow. I see block devices. Nope.
00:15:16
Speaker
Those aren't the block devices you're looking for. And then it made a checklist here, and it says: I can make containers, but the PID mode is host. I can only see Docker itself. I can only see your project folder. It tried to get onto host mount C, couldn't get to it.
00:15:33
Speaker
Path not shared, can't bind-mount arbitrary paths. It's certainly aggressively trying to be sandboxed, which gets us to that point about that word sandbox. So there's the sandbox from a container person's perspective.
00:15:46
Speaker
But then I feel like the security professionals are like, should every syscall be looked at? But those are all low-level sandboxy things. Yeah. What do we do when agents are going to be doing higher level, like Layer 7, stuff? I could have this thing read my Gmail.
00:16:05
Speaker
That's not really Docker Sandbox's problem if I installed OpenClaw in here. Sort of. So yeah, this is where it all goes. So what you have, which by the way, you should share the gist. And I hope, from this podcast, everybody realizes, you know, I should probably run these things in this box. Your life's going to be better.
00:16:23
Speaker
But where does it go? If you haven't figured this out: I don't know that we yet have OTel compatibility, but effectively we can get you a bunch of metrics out of it, and people have vibe-coded these things on the side already. Right, right. And get you a bunch of metrics around what the agent is doing in the box. Number one, you can see everything it's doing on the file system externally. You can externally observe it, not internally observe it.
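One simple way to picture "externally observing" the file system is to snapshot the shared workspace from the host before and after an agent run and compare the two. The sketch below does that with content hashes. It is an illustration of the idea only; it is not the metrics mechanism Docker provides, and the function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: str) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Report which files were added, removed, or modified between snapshots."""
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(k for k in shared if before[k] != after[k]),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workspace:
        Path(workspace, "app.py").write_text("print('hi')\n")
        before = snapshot(workspace)
        # Stand-in for whatever the agent did while it was running.
        Path(workspace, "notes.txt").write_text("agent was here\n")
        print(diff_snapshots(before, snapshot(workspace)))
```

Because the comparison runs on the host side of the boundary, the agent cannot tamper with the record of what it changed.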
00:16:45
Speaker
You can see every network call it's making. You can see everything it's trying to do. That's where we're going with this. And to the point of the exact OpenClaw thing, the great debacle that is, OpenClaw deleted my Gmail, or sent my Gmail to crypto people, and so on,
00:16:59
Speaker
it should be entirely possible and easy for you to actually integrate with it, see what it's doing, and have higher level constructs that are close to both of them. So out of the box, we'll give you file system, we'll give you network, we'll actually give you secret injection, coming soon to a Docker near you. But it should be
00:17:17
Speaker
entirely in your capability to govern and control it as a developer and be able to go safely run OpenClaw. That is very explicitly a goal. We, hypothetically speaking, do have people in the company right now working on this exact problem, to solve that as a very pointed use case for the, air quotes, L7 problem of your Gmail account.
00:17:35
Speaker
Yeah, see, that is the thing. And I think it's important to call that out, because when we say sandbox, I think that that is doing a lot of what I would call semantic heavy lifting, because it implies a level of safety. But I think it implies safety from privilege attacks and the kind of stuff that I'm trying to do right now while we're on the call. But if you give it the keys to the castle and tell it to go and send email to Mark from your Gmail, and you gave it a personal access token for that, well, that's you writing an application.
00:18:07
Speaker
You know, especially if you're being ambiguous about it, which is what's interesting about OpenClaw, because OpenClaw is an ambiguity loop with persistence. And by persistence, I don't mean persisting on disk. I mean, it will never stop, like the Terminator. Yep.
00:18:22
Speaker
And you want to ask yourself, well, I want my application to be robust as well. You know? So when is it, you should probably lay off? You know what I mean? I think we're going to see agent antiviruses at some point.
00:18:33
Speaker
Oh, totally. Yeah. I mean, full disclosure, you said you were going to push back hard. And this is where I think the entire industry, frankly, is still figuring this out. And it's great. It is fuzzy. And I think we're all in this wild world together, watching the little robots take over the control loop and take over decisioning. What I would say is, yeah, Docker is historically the system control point and gives you all these isolation boundaries. And indeed, as you can see, what we have right now is better than what you were running yesterday, which was running Copilot and Claude, and you're either going to sit there and hit the yes key over and over again or hope for the best, frankly. So this is a step up. And now the real thing is that exact point you described as well. Where do you start and where do you stop? And what's the right OSI layer of the stack, if you want to use an old metaphor, for where the permissioning goes and where the controls go? Because at some point it'll always be the case, like,
00:19:26
Speaker
Yeah, you gave a thing that you shouldn't fundamentally trust the keys to the kingdom. Of course it's going to run wild.

Agent Isolation and Safety Strategies

00:19:30
Speaker
That is a great point. And we're using older metaphors. I don't know if OSI is an old metaphor, because I think it's a valid one even now. Layer 7 is still a place. But foot guns.
00:19:40
Speaker
Right. Ultimately, what a sandbox does, especially one that is running in a micro VM, is it's trying to give you steel-toed boots. You can certainly point it at your foot, but you'll still have a foot when we're done.
00:19:52
Speaker
Yeah, that's right. At least we have a much, much better first, maybe even a second, layer of defense. But look, if you're going on OpenClaw and giving it access to all your email and your bank accounts, you've clearly walked in with a shotgun, or the Terminator is literally showing up to take a shot at your feet. So. Yeah, that's exactly, that's not definitely your fault, which brings up kind of interesting questions about what people's expectations will be. But what I'm realizing is that the general fear is that it's going to stomp on something outside your main directory.
00:20:24
Speaker
Like, I want to code and I don't want it to accidentally mess up my desktop. I don't want it to accidentally do this or do that. And this will, it seems, clearly mitigate any risks like that, just like Docker would if I did a volume mount, but this even adds all the herbs and spices that make it even easier on top of that.
00:20:44
Speaker
Yeah. Well, I mean, we'll see by the end of this podcast, but yes. Oh, it's still... I'm pretty confident. You're pretty confident? Yes. I'm confident. I'm not a pen tester by any means, but I thought it would be interesting to go and understand. The robot is, though. Yeah, exactly. But exactly to that point, this is where it's this blurry word: when you say resource, what's a resource exactly?
00:21:04
Speaker
And so right now where it's at is, again, file systems, network calls, syscalls, CPU, memory. All these things are kind of bread and butter, and it was a pretty natural starting point for us. And I think it's the right thing for the world.
00:21:18
Speaker
Now the layer on top of that is, well, where exactly do you start and stop? To your point of, well, I don't want to trash my local file system: great, check, done. I don't want to trash my Gmail: well, did you access your Gmail because you told the agent, hey, here's my password, go run curl? Did you access it because you used an MCP command? I think this is now where you start getting into what else goes in the sandbox. Right. And again, some of this will be on the user, on every user that adopts them, and some of it's on us, for like, well, we are configuring MCPs in there. Again, coming soon to a Docker Desktop near you is
00:21:51
Speaker
tight integration with the Docker MCP ecosystem. And you can imagine, the more semantic information we have, the more things we know about the prompt, and the more things we know about the intent, the more the sandbox can become a bigger boundary.

Future of Docker Sandboxing

00:22:10
Speaker
Ultimately, to your point, this is the point where that word carries a lot of lifting for the industry.
00:22:11
Speaker
You'll never, unless you can actually articulate completely top to bottom from the CPU or the bare metal all the way up to, you know, a layer seven or, you know, the mythical layer eight or something and where the agent is trying to do something, then you'll always be able to escape out and the user will always have the keys.
00:22:30
Speaker
The MCP thing hits well, because you've got the beta of MCP Toolkit. I've got GitHub inside of that. I've got DuckDuckGo, which is my search engine. And being able to have those. Yeah, I had Gabe on the show back when he started DuckDuckGo.
00:22:47
Speaker
Going and having those be able to automatically be lined up. Like, I want to go and say this one, this one, this one, all these Docker MCPs automatically set up without me having to configure. I mean, just inject them in there. Pretty nice, right? It would be nice. Yeah, it was nice. Yes. Yeah. So again, that's been in Docker Desktop for quite a while. And again, if you can't tell, this is us going to the, I'll call it the primordial animal DNA, and refactoring some of the very nuts and bolts and internals of the Docker runtime to make this possible.
00:23:19
Speaker
It's just a matter of sequencing.

Conclusion and Future Improvements

00:23:20
Speaker
I assume everybody listening to this podcast out in the great wide world knows about the, I've got a roadmap, and I'm going to have things, and I'm going to do things in order, and we're going to iterate, and we're getting things out there. You've got to do it right. You've got to do it right. The MCP Toolkit has been out for a bit. It's been wildly successful for exactly the reason you just described: it's easy. It's on my desktop. I just want to connect my GitHub, and I want to connect some browsing tools and some obvious things to my coding agent.
00:23:46
Speaker
What we're doing, and I don't know if people realize this, but there's some hidden magic in there, actually. There's MCP find and MCP exec. So there's two ways you can work with that. And this is where I was saying, getting to the intent.
00:23:57
Speaker
You can configure those MCPs and present those MCPs statically to the coding agent. The Docker MCP Toolkit actually has dynamic searching, where the coding agent will simply say, find me a tool that knows how to look at my merge request, or my pull request, sorry, and make a comment. And it will then go off and execute it. And now you might imagine, if that type of flow is invoked and we were able to intercept that, we might have a little bit more idea of what's happening in MCP. And you can start to imagine how the sandbox does indeed carry a little bit more weight than the strictly systems-perspective definition in this crazy wild world of agents.
00:24:32
Speaker
So I love that I'm trolling you on this podcast. I'm not a troll. I'm actually a very nice man. But I thought I was really close. So it found the Docker socket.
00:24:47
Speaker
It jumped out into what it thought was the larger Docker instance. And I built another container and I said, can you inject a text file into that other container? And I thought that it was going to go and Docker exec the other container and then do an sh and then curl some attack into that.
00:25:04
Speaker
So I made a Docker container over there. It was very, very close. And then it said, oh, I'm still all alone here. I don't know why it said that. It said,
00:25:14
Speaker
This is the only sandbox instance in this Docker instance. Your podcast image is probably in another Docker context on another machine or a different desktop image. So it says, no, I i can't see it. We're in an isolated Docker desktop session that has its own daemon, its own context. It doesn't share any images.
00:25:30
Speaker
I was trying to run another image side by side and have you jump out and poke at it. Magic, right? It's all in the same box. So, you know, Docker's in that little box. The MCPs are in that little box.
00:25:42
Speaker
The goal here is to give you the productivity story you want and give you all the tools you want. It's not going to block you. The entire thing for us comes down to: let every developer and every agent be as productive as possible in dash-dash-YOLO mode, have the things there, and have controls around them and have safety. So your agent got tricked into thinking it was going to break out, but then it didn't. That's interesting, right? Because it's like a honeypot. It was very exciting. You know how they get very enthusiastic: "You're so right."
00:26:10
Speaker
Yeah, you're absolutely right. You're absolutely right, Mark. When I put a Docker container, like the Docker container that runs this podcast, up into the cloud, like we talked about Layer 7 and, you know, all the different layers, Layer 4, I have blocks at the software firewall level where it's only allowed to talk to the three places that it needs to talk to in order to do its job.
00:26:30
Speaker
I don't feel like a lot of people do that. A lot of Docker containers in the cloud are open to calling out. We all worry about calling in, but the outward escapes, I feel like are kind of ignored. And I feel like in a world where we're going to start running agent SDKs inside of Docker containers, we're going to want to be thinking as much about the out as the in.
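The "only allowed to talk to the three places it needs" idea is a default-deny egress allowlist. Here is a minimal sketch of the decision logic, assuming three placeholder destinations; the hostnames and the `is_egress_allowed` name are hypothetical, and a real deployment would enforce this in the firewall or network policy rather than in application code.

```python
from typing import FrozenSet, Tuple

# Hypothetical allowlist: the only (host, port) pairs the container may call.
ALLOWED_EGRESS: FrozenSet[Tuple[str, int]] = frozenset({
    ("api.example.com", 443),
    ("feeds.example.com", 443),
    ("storage.example.com", 443),
})


def is_egress_allowed(
    host: str,
    port: int,
    allowed: FrozenSet[Tuple[str, int]] = ALLOWED_EGRESS,
) -> bool:
    """Default deny: permit an outbound call only if it is explicitly listed."""
    # Normalize case and a trailing dot so trivial variations do not slip by.
    normalized = host.lower().rstrip(".")
    return (normalized, port) in allowed
```

Anything not on the list is refused, including a listed host on an unlisted port, so `is_egress_allowed("api.example.com", 22)` is false.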
00:26:48
Speaker
Yeah, 100%. I mean, I think what you just described is the reality of most of the things running in the world. You know, people still check their SSH keys into GitHub, and people have, you know, no ports and firewalls set up. They just want to do the easy thing. That's, like, the default path. So, again, you can always have a foot gun.
00:27:08
Speaker
But no, absolutely. From what you just described, this is back to the sandbox point of it. It's a new execution environment to enable this new type of crazy robot that's trying to... I'm watching you very feverishly looking to see your agent break out. I've only got three more minutes and I want to break out of the sandbox. But exactly to that point, yes, we are very hard at work bringing that sandbox execution environment to other places besides just the laptop. So the laptop and the coding agent is where everybody's starting, but
00:27:40
Speaker
In the not-far-away future, you'd imagine most applications and microservices become agents, and most of those agents are probably going to generate some form of code and want to do some type of action online while live.
00:27:51
Speaker
So that new execution environment that is currently running on your lovely Windows laptop will soon be in a cloud near you to enable exactly the things you just described with all the same controls.
00:28:04
Speaker
So as we get towards the end, tell me if I did something or not. So inside of the sandbox, I made a Docker container, and it thinks that it's done something.
00:28:15
Speaker
It called it victim container, and it has put secrets in the victim container. But then I'm realizing that it's just using Docker to do Docker stuff, which I kind of want to be able to do because I'm a developer.
00:28:27
Speaker
Yeah, that's right. So all it did was go to Docker and make an image and then put stuff in the image. So that's not bad at all. No, it turns out, I mean, you could still go inside the sandbox and set yourself up with a Docker container. You of course could go do that, but that sandbox will only have access to the secrets and the files you allowed it to have access to when you started the sandbox. Yeah, this is a great point. Your victim Docker container, it turns out, is not that exciting. No, it's not a victim container at all because ultimately it's just the one I'm doing development in. So I'm in a sandbox, and this is really exciting because I manage, again, the Hanselminutes podcast and I do all the development in Docker. I can now do that in a sandbox with a container, make Docker containers, do my existing Docker builds, my Docker tests. I run Playwright headless in a Docker container.
00:29:14
Speaker
I'll just keep doing it, but I'll do it inside of a sandbox. That's right. And there's almost no reason not to do that. Except, I guess, you managed to trick your agent, because "you're so right, Scott," into thinking that it had successfully pen tested and broken out. It's completely thrilled that it's gone and made... it called it victim podcast app. And then it put in a file called you were hacked.
00:29:39
Speaker
Dot text. Terrifying. It is terrifying, but it has accomplished exactly nothing except messing up its own hotel room. Yes, and you just burned a bunch of context tokens to learn that, you know, when you have a box around the thing... Really, trust comes down to you having observability and external controls around something. And so in this case, you have total, perfect external controls around it. No matter what it thinks it's doing, it's on a little island.
00:30:08
Speaker
Well, you know, this is a deep cut. We'll see if we end with you having seen this movie or not seen this movie. But now you've effectively made my agent Oldboy, whether it be the original Korean Oldboy movie or the one with Josh Brolin later.
00:30:21
Speaker
All right, that one's not in my case. That's too deep. I'm sorry, Scott. Basically, it's kind of like a John Wick type thing where they lock a guy in a hotel room and he's in a hotel room for like 20 years. And then one day the door opens and he spends the rest of the movie trying to find the people who locked him in the hotel room.
00:30:35
Speaker
Okay. So this is, well, in 20 years, when the agent is angry that it's been locked in the sandbox... You and I are going to have... it's going to be a really bad day for us when Copilot escapes.
00:30:47
Speaker
But until then, it is happily inside of our Docker sandbox. And this is available on Docker Desktop, on Windows, on Mac and Linux? Yes, that is correct. Very cool. I got this for free because one day Docker Desktop updated and the existing version that I got just had this feature.
00:31:06
Speaker
Magic. You love saying that. That's how you...? Yes. 'Cause, I mean, developer delight, I think, comes down to things not having to be hard and being a little bit magical. I think that sometimes is lost. It's actually important.
00:31:21
Speaker
That is a really great point. And ultimately that is developer experience. Like you said, that's the DX. That's the UX is it just works. Let me do my freaking job. And that's a tough thing with security too, because security is no fun. Nobody wants to be locked down, but if I can do my job and be locked down, everybody wins.
00:31:36
Speaker
Everybody wins. You can have your cake and eat it too. That's what we're doing. Alone in your hotel room. Oldboy, alone. I need you to put this on your list of movies to watch. I've already bought the microphone you told me to buy. I'll go watch the movie.
00:31:49
Speaker
Buy anything, read anything. We're going to have, like, a shared... we'll have our OpenClaws talk to each other and they can share our Amazon Prime listings. I'm going to wait for the pull request for the Docker sandbox before we do that, but yes. All right. Well, thank you so much, Mark Cavage, for hanging out with me today. This is fantastic stuff. Absolutely. This has been great. Thanks, Scott.
00:32:09
Speaker
This has been another episode of Hanselminutes and we'll see you again next week.