
Generative AI: The New Frontier in Kubernetes Problem-Solving

S3 E14 · Kubernetes Bytes

In this episode of Kubernetes Bytes, Ryan and Bhavin talk to Kyle Forster, CEO at RunWhen, a platform for troubleshooting Kubernetes with Generative AI assistance. The discussion starts with LLMs and GPTs, and why everyone is building a chatbot powered by AI. They then talk about how Generative AI can act as a co-pilot for SREs working with Kubernetes and help them troubleshoot things faster. Watch this episode to learn how AI can help you become the 10x engineer/SRE instead of replacing you at your company.

Join the Kubernetes Bytes slack using: https://bit.ly/k8sbytes

Ready to shop better hydration? Use my special link https://zen.ai/apaSnaIFOuee5jScqZ28a03tKKvQiqkyz8mtm9wipoE to save 20% off anything you order.

  • 00:30 Introduction
  • 04:44 Cloud Native News
  • 18:10 Interview with Kyle
  • 58:43 Takeaways

Cloud Native News:  

  1. https://www.crunchydata.com/blog/announcing-crunchy-postgres-for-kubernetes-5-4
  2. https://www.itpro.com/cloud/cloud-security/kubernetes-on-aws-targeted-by-hackers-abusing-legitimate-pentesting-tools
  3. https://www.scmagazine.com/news/cloud-security/scarleteel-targets-aws-fargate-launches-ddos-as-a-service-campaigns
  4. https://www.techtarget.com/searchitoperations/feature/Practice-troubleshooting-Kubernetes-clusters-using-log-data
  5. https://www.hashicorp.com/state-of-the-cloud
  6. https://techcrunch.com/2023/07/12/istio-graduates/
  7. https://www.infoq.com/news/2023/07/instacart-flink-kubernetes/
  8. https://thenewstack.io/how-to-secure-kubernetes-with-kubelinter/

Show Links: 

  1. RunWhen - https://www.runwhen.com/
  2. RunWhen Local - https://docs.runwhen.com/public/runwhen-local/getting-started-with-runwhen-local
  3. Kyle's LinkedIn - https://www.linkedin.com/in/kyforster/
  4. Ugly Command Party - https://www.linkedin.com/feed/update/urn:li:activity:7087375534606712833/
Transcript

Introduction and Podcast Overview

00:00:03
Speaker
You are listening to Kubernetes Bytes, a podcast bringing you the latest from the world of cloud native data management. My name is Ryan Wallner and I'm joined by Bhavin Shah, coming to you from Boston, Massachusetts.

Cloud Native News and Fun Anecdotes

00:00:14
Speaker
We'll be sharing our thoughts on recent cloud native news and talking to industry experts about their experiences and challenges managing the wealth of data in today's cloud native ecosystem.
00:00:30
Speaker
Good morning, good afternoon, and good evening wherever you are. We're coming to you from Boston, Massachusetts. Today is July 13th, 2023. I hope everyone is doing well and staying safe. Let's dive into it. Speaking of evenings, we're doing this in the evening. It doesn't happen that often, I feel like. I know. Yeah, this is only the second episode. I have a nightcap in my hand, which is always a good thing. I know you don't trust yourself to drink liquids anymore. I'm going to laugh.
00:00:56
Speaker
I don't know if you caught that, but Bhavin just spilled water all over his shirt. So take a look at that. That's fun. We had fun. Yeah. Go check the YouTube video of our previous episode. That's good. What are you even up to?

Travel Plans and Vermont Flooding

00:01:10
Speaker
Uh, I don't know. I'm just keeping busy. Uh, we have AWS summit in New York city. So like going to attend that. Uh, but we're also doing a workshop the day before. So just busy building out the workshop. Uh, yeah, a lot of things just keeping busy with work. I'm excited that in three weeks I have some PTO coming.
00:01:30
Speaker
So after the summit next week, I'll be in Banff, Canada. So is this a national park again? Yeah, yeah. Finally, you know, you have a running count of how many you've been to. I had it at some point, but I obviously don't remember it. Here's an idea for the background on your wall: you put up the national parks. Yeah, that's a good idea, man. Thank you.
00:01:54
Speaker
That was your story. This is my first trip to Canada. So I'm excited about Banff National Park. I have had all the bookings and everything done for a while now. But just today, I realized that, OK, I need a reservation for a shuttle that takes us to one of those points, which I thought wasn't needed. So I need to do some more research. So I'm not as prepared as I usually am. But I'm excited about it being more of an adventure that way. Yeah, true. How about you, Ryan?
00:02:26
Speaker
Yeah, speaking of traveling, I have some PTO coming up next week, so I'm pumped about that. It was going to be one of the long moto camping trips, although with all the crazy torrential downpours... This was a whole Northeast trip, so it's New York through...
00:02:42
Speaker
Vermont, New Hampshire, Maine up to the Canadian border. And they had to shut down the entire Vermont section because so many roads, so many homes, so many towns were flooded. It's really just an insane scenario there. So we might, we're basically planning on doing like the first three or four days and then seeing if we could just like go lend a hand to some of the communities and members we know up there.
00:03:05
Speaker
That's basically shoveling sludge and stuff out of their homes, which is just nuts. So yeah, we'll see. Some of those videos were terrifying. Yeah. Yeah. And you know, where I grew up in the Hudson Valley in New York, they also got
00:03:21
Speaker
a ton of rain, which took out roads and all sorts of stuff, which I thought was bad. And that was day one. And then the next day, everything just went nuts in Vermont. And they said, this was more rain than they got in 2011 with Hurricane Irene, which did cause crazy massive flooding up there, but it was worse than that.
00:03:41
Speaker
Just because we'd gotten a ton of rain, everything was already swollen, right? And then, if you look at the radar, the time-lapse of it, it's just this giant system that didn't move. It just sat there. I think some towns got like 10 or 12 inches of rain in one day. I know, it's insane. Hopefully people are safe. They knew this was coming, so they were able to prepare somewhat, but again, you can't do anything about the houses or homes. Right.
00:04:06
Speaker
Yeah, I mean, rebuild, right? At that point, you just evaluate and, you know, figure it out. Not fun. No, not fun. So hopefully we can help out and we'll see if we have time to shoot up there.
00:04:17
Speaker
and if they want us up there, that kind of thing. It's been a week and a half or something since they shut those roads down. So it throws a loop in our trip, but that's nothing compared to what they're dealing with. So we have to take it with a grain of salt and just take what we get. But yeah, anyway, moving on from that, we do have an awesome topic today, and an awesome guest.

Cloud Native News: Hashicorp and CNCF Updates

00:04:37
Speaker
But before we dive into who that guest is, what the topic is, let's dive into our cloud native news. Why don't you kick us off?
00:04:43
Speaker
Yeah, sure. I just have a few quick things, right? HashiCorp, everybody's favorite, I don't know, infrastructure-as-code company. That's what I love them for, like for Terraform. They published a state of the cloud survey that they did with Forrester. The reason I want to talk about it, and linked it in our show notes, was
00:05:05
Speaker
they are actually talking about multicloud and the advantages of using it. I know there is a lot of noise right now in the ecosystem, depending on who you are talking to, right? About like, oh, cloud repatriation. Again, that's happening. But the scale with which people are talking about it, I don't think it's happening as much. And this report kind of reinforces that. Like there are benefits to multicloud. Again, last week when we spoke to Surya, right? He's like, even startups are moving clouds because of cloud credit. So a good report to check out talks about
00:05:35
Speaker
the things that you absolutely need to figure out. So if you don't have things like reliability and security and cost management figured out, you will have a bad experience. But then if you are taking care of those things, you can take advantage of all the abstraction that's provided by these infrastructure vendors. Yeah, and the report has some interesting things in there like multicloud isn't just about enabling sort of technology, it's also keeping and retaining talent, right?
00:06:00
Speaker
being able to be familiar with one cloud versus the other, and being able to enable those developers to use those clouds, things like that. It has some really good numbers, and the spend numbers talk about what's happening even in these sort of economic times. I guess not surprising given what we have seen, but it's kind of enlightening.
00:06:18
Speaker
Next thing was just a quick note about Istio. I know Istio used to be a non-CNCF thing, which was then donated to the CNCF by Google Cloud and IBM. Now it has officially matured, or graduated, from incubating status to a graduated project. So it joins the likes of Kubernetes and Flux and Argo CD and those kinds of projects that have a lot of backing from the community. So Istio just joins that club, which
00:06:46
Speaker
Again, we knew it had support, but now it's an official CNCF graduated project. Cool, very cool.
00:06:52
Speaker
A couple of new things, Instacart,

Instacart's Tech Transition and Kube Linter

00:06:55
Speaker
right? Like, I'm sure everybody might have ordered Instacart at some point during COVID. But you didn't? Oh, no, I love it. I still use it. Oh, okay. You love it. Okay. Well, I don't have time to get down there, but yeah. So they actually run some of their infrastructure on Kubernetes. They have a new blog which talks about how they moved from an AWS managed service, EMR, to running Apache Flink on Kubernetes, and they talk about the
00:07:20
Speaker
benefits that they saw in terms of operational efficiency, so their ops people are not spending as much time managing and building those things. Developer productivity also shot up. So there were a few benefits. Again, it's at a different scale, right? Like if you're a five-person team,
00:07:37
Speaker
I don't know if running things on your own on Kubernetes makes sense, but for a company that has more people with the required skill set, this might make sense. That was a good read. Yeah. In the background of that whole company, the amount of scale that they had so quickly because of the pandemic, that's an interesting story in and of itself.
00:07:57
Speaker
I have a funny story as well about Instacart because I used it a lot during the pandemic. I have a small, steep driveway. One of the Instacart drivers used my garage area to back out, instead of just backing down the driveway, which a lot of people do, and wound up running into my house,
00:08:20
Speaker
just taking out part of, like, the two white floors on the side. I was like, what? And the driver just left. I had a camera, so I called Instacart. I was like, what do I do about this? Is this a you problem or a me problem? And basically they said it's just like any accident, you can report it and go through insurance. And I was like, oh. So basically I couldn't do anything.
00:08:42
Speaker
Yeah. Like, would you sue Instacart for the insurance money, or the driver? No, I think it just goes against the driver; they have the driver's information. I didn't bother with it because, at the end of the day, all I wish is that they had just stopped and knocked on my door and told me they did this thing. But nope, a hit and run, I guess, on my property.
00:09:06
Speaker
And the last thing was just talking about an open source project called KubeLinter. I know it is relevant to what we are going to discuss today. I haven't spoken about what the topic is yet, but it helps you analyze the YAML files and Helm charts that you might be creating. It checks them for security vulnerabilities, whether they're written correctly, whether there are any issues inside those YAML files.
00:09:29
Speaker
It identifies all of that for you and can be customized as well. So a good project to use. It's still in its alpha phase, but something that I wanted to highlight given the topic that we have for today. Yeah, we all know YAML is a nightmare when you're dealing with a lot of it. I remember you did a talk at KubeCon, right?
00:09:46
Speaker
Oh, I did a debate, YAML versus JSON or something. I forget which side you were on. There was no winning side. Okay. No, no, no. I think I had to play for the YAML side. Okay. Okay. We were at, you know, KubeCon, so a lot of people had scars still fresh. A tough one, but yeah. Okay.
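Circling back to KubeLinter for a moment, here's a rough illustration of the kind of check it runs. The manifest below is made up, and the exact check names and behavior may differ by KubeLinter version; treat this as a sketch, not its documented output.

```sh
# Hypothetical Deployment manifest with common lint findings:
# no resource requests/limits and no securityContext.
cat > deploy.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels: {app: demo}
  template:
    metadata:
      labels: {app: demo}
    spec:
      containers:
        - name: demo
          image: nginx:1.25
EOF

# Lint it; kube-linter exits non-zero when checks fail, which makes
# it easy to wire into CI.
kube-linter lint deploy.yaml
```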
00:10:08
Speaker
Yeah, but that's it for me. Cool. Um, a couple of things for me. Crunchy Postgres for Kubernetes. So, databases on Kubernetes, a near and dear topic to this podcast for us. They released version 5.4.
00:10:22
Speaker
I've always been a fan of what they've been doing. They had an operator early on, all sorts of stuff, but 5.4 has support for ARM, which I think is pretty cool. And a lot of performance stuff that I don't fully understand, so I'm not gonna go into it, but things like huge pages and table spaces and all that stuff. You're into that. And I like that in this document, sorry, in this release, they say documentation enhancements. Just like, oh, that's good. It feels good in my heart there. Just top level in the release.
00:10:50
Speaker
Um, the, the next one is, is all about sort of, uh, AWS being targeted by sort of new hacking techniques, um, and malware and

Security Concerns and New Hacking Techniques

00:10:59
Speaker
things like that. So, I might not say this correctly, but SCARLETEEL, or it could be scarlet teal, I don't know. I'm not sure. I like scarlet eel; it sounds way cooler and I'm going to stick with it. It sounds like sushi you might get at a noodle shop.
00:11:20
Speaker
This whole sort of approach was identified in I think February, 2023, but it's evolved. So this whole sort of attack has evolved to now target, again, not a new thing for us, misconfigured clusters and AWS policies. And there's a lot of specifics around this new approach being very specific to AWS
00:11:43
Speaker
Fargate, so it can kind of get into containers, identify if it's in a Fargate container, branch out from there, and get more credentials. And they're actually using some well-known exploitation tools that are available to just about anybody, right? Pacu is one of them, among some others. So they're kind of taking a modernized approach to this.
00:12:06
Speaker
And again, this is all fueled by a lot of the complexity that's in this space, which kind of ties back to our previous episode, if you want to go listen to that. So I just thought it was really interesting that it's specifically targeting AWS and Fargate and things like that. So go spend some time on it.
00:12:26
Speaker
Or at least, if you are looking for your next job and thinking, okay, what might I do with all this AI buzz, go beef up on security skills. I think I'm learning that as part of my day job as well. I was messing around with SCC policies in OpenShift today. Security is frustrating in and of itself, and that's the simple side of things.
00:12:50
Speaker
Yeah, if you're new to this entire community, Kubernetes in general, go dive in there.

Kubernetes Troubleshooting and AI Tools Introduction

00:12:55
Speaker
I think it's a huge part of it, which leads me to my next news item, which is about learning how to troubleshoot Kubernetes. So we've talked about the CKA exam on this show. This article actually gives you a really in-depth view of how to troubleshoot things, how to understand things like CrashLoopBackOffs. And it talks about the fact that
00:13:14
Speaker
30% of that exam is troubleshooting, right? You have to know what's going on. And I think it just, again, ties to, A, this episode that we're going to talk about with the complexities and how to troubleshoot and a lot of people and a lot of SREs and a lot of DevOps engineers spending time in that zone of
00:13:32
Speaker
banging their, I should say fingers, but probably heads, on the keyboard troubleshooting Kubernetes. So a really interesting article. If you haven't taken the CKA and are interested, or are just into that topic in general, go take a look at it. It gives you some actual examples and things like that.
00:13:47
Speaker
Yes, that is the news. We do have a couple of things we want to mention from this week's KubeWeekly. They have a GitOps microsurvey. You know, just call it a survey. Microservices? A microsurvey for GitOps? I don't know. It's containerized. It's a survey.
00:14:05
Speaker
We've gone down a rat hole there. But anyway, the GitOps microsurvey: if you have touched these tools, and we've talked a lot about them on the show lately, go take that survey. Just kind of help out the CNCF to get a good idea of what people are up to and what they're poking at, things like that. And I'll turn it over to you for the last bit.
00:14:24
Speaker
Yeah, I think the last thing was around Kubernetes Bytes... sorry, you said you'd turn it over to me for the last bit. This is what happens when we record this show after hours.
00:14:38
Speaker
Maybe it's more fun that way. Yeah, it is. Are you saying the morning and afternoon episodes are not fun? Come on, Ryan. So we realized after the last episode when we shared the news that we have a new Slack channel or Slack workspace for everybody to join and collaborate. We realized the link that we had wasn't really an invite link. So we fixed that. Yeah. Again, part of the journey, right? We're learning in public here.
00:15:03
Speaker
But now if you go to bit.ly slash k8sbytes.
00:15:10
Speaker
that's Kubernetes Bytes with the shorter k8s version of the name, you should be able to join our channel. We'll include it in our show notes. I'm sure we'll post it through our social channels as well. We have a QR code for it, so it's easy to sign up. But yeah, come join us. Give us episode ideas. Tell us what you like, don't like. Everything is on there. Although we'll feel bad if you say something that you didn't like. So be careful.
00:15:37
Speaker
We want to learn from what you think. That is absolutely something we want to do. Anyway, go check that out. We'll put it in the show notes as well. Let's move on to today's topic. Today's topic is all about the new AI tooling and approaches that we see commonly with things like ChatGPT, and
00:15:57
Speaker
using that to basically help troubleshoot, identify, and fix problems in Kubernetes. Really cool topic. Kyle Forster, the founder of RunWhen, is the CEO there. He'll talk a little bit about the company, but more generally: he was a senior director at Google, on Kubernetes specifically, and a founder of Big Switch Networks. So definitely a great guest, and we're excited to have him on to talk about this topic.
00:16:27
Speaker
We've been itching to do some AI topics. We have been doing the ChatGPT question for a while, but I was really excited for, and I am actually, not was, I'm really excited for this episode, where we talk about how AI is actually impacting the Kubernetes ecosystem. So I'm excited. Yeah, very, very cool stuff. And we are going to forego this week's ChatGPT question for our guest, because the entire episode is about AI, and you can only take so much AI.
00:16:53
Speaker
We will talk about it a little bit after we do the interview. Why don't we get Kyle on the show? We'll be right back after this short break. As long-time listeners of the Kubernetes Bytes podcast know, I like to visit different national parks and go on day hikes. As part of these hikes, it's always necessary to hydrate during and after.
00:17:16
Speaker
This is where our next sponsor comes in, LiquidIV. I've been using LiquidIV since last year on all of my national park trips because it's really easy to carry and I don't have to worry about buying and carrying Gatorade bottles with me. A single stick of LiquidIV in 16 ounces of water hydrates two times faster than water and has more electrolytes than ever.
00:17:40
Speaker
The best part is I can choose my own flavor. Personally, I like passion fruit, but they have 12 different options available. If you want to change the way you hydrate when you're outside, you can get 20% off when you go to liquidiv.com and use code KubernetesBytes at checkout. That's 20% off anything you order when you shop better hydration today using promo code KubernetesBytes at liquidiv.com. And we are back.

AI Tools in Kubernetes: Kyle Forster's Insights

00:18:10
Speaker
All right. Welcome to Kubernetes Bytes, Kyle. It's a pleasure to have you here. Why don't you give our listeners and watchers a little introduction of who you are and what you're up to. Sure. Well, thank you so much for having me on. I've been a follower of the show for a while, so I appreciate it. My name is Kyle Forster. I left the Kubernetes team at Google a couple of years ago to start a company called RunWhen.
00:18:33
Speaker
We build tools for small teams and understaffed teams that are currently using Kubernetes. We do this using a combination of AI and an expert open source community. Okay, so you just said the magic word, AI.
00:18:49
Speaker
How does RunWhen use artificial intelligence? And before we talk about how RunWhen uses it, can we also talk about, and I'm assuming you have been getting a lot of these questions: what are LLMs? What are large language models? How do they work? And why are people suddenly so excited about AI and this whole new wave of GPTs? Well, I could give a short answer or a long answer.
00:19:19
Speaker
Medium. Yeah, no, anything works. Just go for it. Yeah. You know, for all the what-are-LLMs questions, there are a lot of papers now on what these things are doing under the covers. But I do like to think of them as an API endpoint: you send some text and it sends a bunch of text back. Yeah.
00:19:40
Speaker
Yeah, I can talk about how we're using them specifically, but I'm a believer that we'll see kind of three waves. Because at first, if you look at ChatGPT, which was kind of the big popularization of LLMs, and the first real open endpoint for a large one of these, it feels like magic.
00:19:58
Speaker
I mean, it really, really does. I'm assuming that almost all of your listeners have used ChatGPT at least a few times in the last few minutes. I think, you know, look, the cost of building a good chatbot interface with these has now gone down, I mean, gone down by about a factor of a hundred. OpenAI opened up their API endpoints. I mean, the jump from 2022 to this year is amazing.
00:20:26
Speaker
I'm just a believer that chatbots will be a very short period in history. I was on with a service chatbot for my ISP a couple of days ago, trying to cancel my subscription. This was like for the second time and I'm ripping my hair out. Even if this thing could apologize more eloquently, the fact that I can't cancel my subscription and it keeps sending me to a webpage that is down doesn't help anybody.
00:20:55
Speaker
Yes. I know you were answering a question, but just a tangent. There's also a website called pi.ai. Pi is another one of these chatbots. I was talking to it, and I was like, I need some landscapers. And it was giving me answers like ABC Landscaping, XYZ Landscaping,
00:21:14
Speaker
near my area. I was like, do they really exist? Or is it just making things up? So I was like, can you give me a link to their website? And it just described what a website might look like for a landscaping company. That's it; it didn't give me any links. So yes, I agree, we need these to be better. I mean, the incompetent chatbot that speaks perfect English is not moving the industry that far forward. But unfortunately, I think, frankly, we're going to see a ton of those.
00:21:38
Speaker
But in the DevOps and SRE area, we're going to see every vendor and their friend put out a chatbot interface. And I think we'll see a huge number of releases over the course of the rest of this year. And then we'll see a big backlash next year, and people will say, oh, I'm sorry, because of all these bad chatbots.
00:21:57
Speaker
I hope that this will be a short-lived chapter one of the industry. The really, really interesting thing is: the cost of building a chatbot has gone down by a factor of about 100, and the cost of building a really good search engine has gone down by a factor of about 10,000. I think that will be the big deal. LLMs now handle the really, really expensive part of building a search engine.
00:22:23
Speaker
If you want to build really, really good search, you don't take the text that somebody sent you in a query and try to match it against the text in all of the target web pages that you're trying to search. You take as much information about the user as you can, their current location, their language, and so on, and you try to build up a search intent.
00:22:43
Speaker
And you take all the targets, whether they're web pages, because everybody's familiar with consumer web search, or, if you're Amazon, products, or database rows, or other more interesting things that we can talk about, and you try to build up an intent for each of those. And you don't search text to text. You're really trying to match intent to intent.
00:23:03
Speaker
And that's where LLMs are pretty amazing. Dump in a bunch of information about the user, including their query, and say, what do you think is this user's intent? Grab a bunch of text about a database row, about a web page, in our particular case about a troubleshooting task. Dump it to ChatGPT and say, what do you think that intent is? And then you do an intent-to-intent match, and you have phenomenally good search.
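For listeners who want to picture the intent-to-intent matching Kyle describes, here is a minimal sketch in Python. The describe_intent() and embed() functions are stand-ins for real LLM and embedding API calls; nothing here is RunWhen's actual implementation.

```python
# Sketch of intent-to-intent search: summarize each side into an
# "intent" string with an LLM, embed both, and rank by similarity.
from math import sqrt

def describe_intent(text: str) -> str:
    # Stand-in: in practice, prompt an LLM with
    # "What is this user's/document's intent?" and return its answer.
    return text.lower()

def embed(text: str) -> dict[str, float]:
    # Stand-in: a real system would call an embedding model.
    # A bag-of-words vector keeps the sketch runnable.
    vec: dict[str, float] = {}
    for word in text.split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, user_context: str, targets: list[str]) -> list[str]:
    # Build the user's intent from context plus query, not query alone.
    query_intent = embed(describe_intent(user_context + " " + query))
    scored = [(cosine(query_intent, embed(describe_intent(t))), t) for t in targets]
    return [t for _, t in sorted(scored, reverse=True)]

tasks = [
    "check redis pod restarts and memory pressure",
    "inspect nginx ingress error logs",
    "summarize recent kubernetes warning events",
]
print(search("redis is down", "cluster: prod, namespace: cache", tasks))
```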
00:23:27
Speaker
Do you think anybody's doing that today, or is that a vision that you have for the future? Well, I just spent 20 hours last week on UI designs and algorithms, so I can tell you about at least one. We're right now tracking 11 different vendor projects using LLMs,
00:23:44
Speaker
most of which are just chatbots, actually 8 of 11. There are three that I think are interesting on the vendor side, and then we're tracking four different LLM SRE and DevOps projects amongst the FAANG companies and FAANG teams. And every single one of those is doing search.
00:24:02
Speaker
Got it. They're all folks that know search, know the capabilities, know the expensive parts, know the hard parts very, very well. They have a huge data set, being at a FAANG company, but with all of them, you barely even need the data set. It's fantastic. Just ask the LLM to describe intent and it'll take care of the rest for you. Yeah. And I'm wondering, you know, I think a big part of this is that search and chatbots are familiar to people. So it's an easy way for
00:24:27
Speaker
someone who really doesn't understand what's going on behind the scenes to use it, right? And I know that what you're up to at RunWhen is sort of reimagining this: maybe we don't need chatbots, or maybe there are different ways to use them. And I'd love to hear more about the genesis story, why you saw this problem and are starting to solve it with AI, and specifically with Kubernetes at the moment.
00:24:57
Speaker
You know, look, the genesis story for RunWhen was just that in my last role, a huge percentage of my users were stuck.

Challenges in Kubernetes Complexity and AI Limitations

00:25:07
Speaker
As you guys know, Kubernetes is hard, but we did study after study and I don't know how many, I mean, like in the hundreds of customer interviews specifically on this topic that I became very, very, very convinced the hard part is not Kubernetes itself. There are 800 open source packages, give or take that represent around 70% of the Kubernetes workloads in the world.
00:25:28
Speaker
across any individual cluster, there's an average of somewhere around 40 to 60 for a small cluster and many hundreds for a large cluster. And so if you look at like the problem that somebody has managing Kubernetes, it's not only Kubernetes, but they need to be an expert in troubleshooting the 40 different open source packages that their companies are running.
00:25:47
Speaker
Yeah. And it's not like when you go from one company to the next, it's the exact same set of 40. Actually, the set of 40 is fairly diverse. So to be an expert, you'd need to know how to troubleshoot like 800 different open source packages. But that's not a human capability. Yeah, that's valid. It's not like open source is going to slow down. So, you know, there are like 12,000 total, and it's not like we're seeing that much collapse into, oh, the top 30 are really popular. The top 30 represent about 20%, give or take, of the worldwide workloads.
00:26:16
Speaker
So you still need to know a huge number. And most teams can't afford to have experts that have that much coverage. You could have an expert in a couple areas, but you really need some fast access to a real troubleshooting expert, a real, you know, somebody to really help you set up monitoring, alerting, somebody to respond to alerts, figure out if this is a temporary issue, figure out performance optimization. And you need that expert to kind of pop in, like not even for days or weeks, you need that expert to pop in for like 10 minutes at a time.
00:26:46
Speaker
Yeah, that's fair. I feel like, you know, part of this we've done to ourselves as an industry. Every time we advance, we put another abstraction on something and it goes deeper and deeper. I feel like the movie Inception: by the end of the movie, you're like, where am I in this movie? You know, I'm kind of confused towards the end of it. I feel like if you're deep in Kubernetes, you're like, how did I get here? I started off with just my pod not running, and now I'm debugging all this other stuff besides the point. So I mean,
00:27:13
Speaker
DevOps engineers and SREs have to be sort of unicorns to know all this stuff. They have many challenges, but what specifically are they struggling with that you've seen in the past, and what made you create this company? What can be solved, or helped, with AI? Is it a matter of putting information at their fingertips to make them faster, or really having something with knowledge of what's going on inside of Kubernetes?
00:27:42
Speaker
I came to a conclusion, at least for this, that there were a couple of different areas. First, I mean, talking about, like, observability, I mean, we make a really, really big deal of observability, right? Very, very important. Clearly, there's been a huge amount, you know, like four and a half billion dollars spent on observability tools. That's just the top four last year. Yeah, I mean, big deal. But of all the Kubernetes teams that I interviewed, when I asked the question, okay, so how many of your troubleshooting sessions end at like monitoring and alerting? And they're like, well, like 20%.
00:28:13
Speaker
Okay. So 80% of the time you have somebody popping back on the CLI. Sounds about right. Yeah. Like you think like there must be some thing we could do here. That kind of became the starting point. Like let's help people the moment that they pop back on the CLI. And then how much can we help real experts who really, really, really, really know their stack troubleshoot themselves some, but I think that's actually fairly limited.
00:28:41
Speaker
How much can we help real experts that they need to communicate with other people that they're mentoring, that they're helping? Like, okay, that we can actually make decently more efficient. And how much can we help people that are fairly new to it, where their expert happens to be in a different time zone, is not online yet, is on vacation, is busy in another meeting, has a massive engineering deadline themselves coming up in two days. Sure. That I think is something that we can really, really, really, really help with.
00:29:12
Speaker
Okay, so you're talking a lot about troubleshooting and helping engineers focus on other things, not spending the majority of their time looking at logs or going to the CLI, right? A lot of the early low-hanging-fruit examples that we have seen, at least on the interwebs, are people using these chatbots to generate YAML files for a StatefulSet object or to come up with some scripts. Is that
00:29:38
Speaker
a use case that you see? Because from my perspective, I can't fully trust the YAML file that it's giving me; in my personal experience, I've seen ChatGPT just make up steps for things that don't exist. And then obviously, I don't want to push or copy-paste anything from that into my production environment, or even my test environment. How do we solve that? And in addition to troubleshooting, do you see
00:30:07
Speaker
AI helping SREs in any other areas. We tried very hard to get LLMs to create troubleshooting commands. None of it worked. Not at all.
00:30:23
Speaker
Interestingly enough, all of us actually now pretty actively use LLMs when we're doing any work in JavaScript, any work in Go, and any work in Python. Those are the three big languages that we're building the back end in. And I kind of came to the conclusion that for those who can kind of skim the code that an LLM generates, even with a copilot, you can kind of have some confidence that it's sane. Now, when it comes to generating CLI commands,
00:30:51
Speaker
every single one we've just found needed human QA. And as soon as you get done like recreating the setup where that troubleshooting command is useful, you really may as well have just written the command yourself.
00:31:04
Speaker
That's true. Also, you want to do some hinting, to say, hey, if this command presents this output, here are some hints: all right, this is totally healthy, we don't need to chase this; or hey, look, this is weird, and this needs either more AI attention or needs to be escalated to a human. With no direction, they're not good at all. But if a human gave them some hints, then they were fantastically good at interpreting the results.
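To make the hinting idea concrete, here's a minimal sketch: an expert attaches hints to a troubleshooting command, and a model classifies the output as healthy or worth escalating. The command, hints, and classify() stub are all illustrative; this is not RunWhen's actual format.

```python
# Sketch of expert "hints" attached to a troubleshooting command.
# classify() stands in for an LLM call that would receive the raw
# output plus the hints; the keyword check just keeps it runnable.

HINTS: dict[str, list[str]] = {
    "kubectl get pods -n cache": [
        "All pods Running with low restart counts is healthy.",
        "CrashLoopBackOff, or restarts over 5, should be escalated.",
    ],
}

def classify(command: str, output: str) -> str:
    hints = HINTS.get(command, [])
    # Stand-in for: llm(f"Given these hints {hints}, is this output
    # healthy or does it need escalation?\n{output}")
    if "CrashLoopBackOff" in output or "Error" in output:
        return "escalate"
    return "healthy"

print(classify("kubectl get pods -n cache",
               "redis-0   1/1   Running   0   3d"))            # healthy
print(classify("kubectl get pods -n cache",
               "redis-0   0/1   CrashLoopBackOff   12   3d"))  # escalate
```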
00:31:34
Speaker
Gotcha. So what I took from that statement was: LLMs and AI will be good at being a co-pilot, not necessarily an autopilot. You obviously can't trust it to do everything. But if you are an expert, if you know what you're doing, but you just need a head start on troubleshooting something or figuring something out, I think this is a perfect way to use it.
00:31:55
Speaker
I know for my personal use, sometimes I'll ask it a question, maybe it gets me 50% there, 60% there, and then I can take that information and do my own thing. I think that's what people should be using AI and LLMs for, at least in 2023.

AI in Troubleshooting: Automation and SRE Trust

00:32:13
Speaker
Maybe things will change in 2025, but for now... I think the organization thing isn't going to change. I became convinced that getting these things to write really good command lines is not going to work.
00:32:23
Speaker
I don't think that's going to change with GPT-5.
00:32:27
Speaker
I'm really curious to see if you could have a bake-off, a race between someone who's just really good at Googling and someone using ChatGPT to solve the same problem. Because I feel like half the time, I want ChatGPT or an LLM to give me an actual result for it to be useful. I don't need it to be a more efficient Googler than I am. That doesn't really solve the problem for me. That'd be an interesting experiment. And I'm really good at Googling.
00:32:58
Speaker
Hey, if you already have a set corpus, so in our particular case, all of our troubleshooting scripts are actually written by experts. And we pay them. When a customer pays us, we actually pay them a royalty in return. The search engine through those is extremely good. So a user says, hey, Redis is down.
00:33:22
Speaker
Great, here are 30 scripts that I think are relevant to Redis being down. Some of them are about Redis, some of them are not about Redis at all; they're about troubleshooting Kubernetes deployments, or just about checking warning messages in logs. But it's smart enough to be able to say, hey, given I'm trying to troubleshoot Redis and given the data that we have underneath, here's a bunch of stuff that I think is very relevant. And there's another area where they're insanely good, beyond that first instance of search:
00:33:51
Speaker
they're insanely good at taking the output of one troubleshooting script, turning that into a search query, searching, and finding another relevant one. Nice. They're really good at chaining. They're awful at writing the troubleshooting scripts themselves, but we found LLMs were really good at chaining these things together into really, really interesting root-cause sessions. You're like, huh, okay, I get it: it went down this path thinking maybe this was the problem, popped back, and then went down this path.
00:34:21
Speaker
And now suddenly, these things can run like hundreds of troubleshooting commands in a five minute period. And then you get this nice summary of, well, here's the steps that are taken. Here's what it thought the root cause was. Gotcha. So you're saying that everything, all of these different steps are kind of automated. It runs all of these get commands for you, troubleshooting steps for you, and gives you a good summary.
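Here's a minimal sketch of the chaining loop being described: run the most relevant script, let an LLM turn its output into the next search query, and collect everything into a session summary. The three callables are stand-ins for RunWhen-style search, script execution, and an LLM call; none of this is their actual API.

```python
# Sketch of an LLM-driven troubleshooting chain.
from typing import Callable

def troubleshoot(
    initial_query: str,
    search: Callable[[str], list[str]],   # query -> ranked script names
    run_script: Callable[[str], str],     # script name -> output
    next_query: Callable[[str], str],     # LLM: output -> next query
    max_steps: int = 5,
) -> list[tuple[str, str]]:
    session = []
    query = initial_query
    seen: set[str] = set()
    for _ in range(max_steps):
        candidates = [s for s in search(query) if s not in seen]
        if not candidates:
            break
        script = candidates[0]
        seen.add(script)
        output = run_script(script)
        session.append((script, output))
        query = next_query(output)
    return session

# Toy demo wiring, just to show the shape of the loop.
scripts = {
    "check-redis-pods": "pod redis-0 in CrashLoopBackOff",
    "fetch-redis-logs": "OOMKilled: memory limit exceeded",
}
session = troubleshoot(
    "redis is down",
    search=lambda q: [s for s in scripts if q.split()[0] in s or "redis" in s],
    run_script=lambda s: scripts[s],
    next_query=lambda out: "fetch logs" if "CrashLoop" in out else "done",
)
for step, (script, output) in enumerate(session, 1):
    print(f"step {step}: {script} -> {output}")
```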
00:34:44
Speaker
Do you think at any point, and again, we can talk about RunWhen as well, right, and the concept of these digital assistants... Yeah, and we think the patterns that we're discovering, a lot of other people will follow.
00:34:57
Speaker
Yep. Makes sense, right? Like, so is there any form of reinforcement learning? Like, are you teaching the chatbot at certain points during the day? Like, okay, this is the right way to do it. Like, no, the fourth command that you had doesn't really make sense. Don't do it the next time around. Is that happening? Or
00:35:14
Speaker
is it a vendor like RunWhen that has to make those changes, while customers are just trying to use those chatbots or digital assistants? I mean, I think that's where the role of a vendor gets really interesting here, because we're going through UI decisions to say, all right, when somebody gives a training tip, say, oh, wow, yeah, given that issue, here was really the right thing to run, when does somebody want to keep that private, and when are they willing to contribute that troubleshooting back to the community itself?
00:35:44
Speaker
which is where I think vendors and communities can play a really interesting role. Because say you're a really, really small Kubernetes shop and you're facing some issue for the first time. Can you get to, wow, I now have the knowledge, not only of these couple hundred experts' commands, but all the knowledge of, like, a Shopify coming in? Yeah, that would be awesome, because we always hear about, like,
00:36:07
Speaker
The case studies that we heard at things like KubeCon are for large-scale environments, like big enterprise shops that are doing this at scale, but the best practices don't usually apply to a two-person team running two applications. I think just having access to the expertise that has been built inside that Shopify, I think will help smaller organizations adopt Kubernetes faster and be comfortable with it, I guess.
00:36:32
Speaker
That was 100% my thesis. In my last role, if I could have just taken some of the troubleshooting documentation that some of my larger clients had already written, and just given that to my medium clients... I actually tried with one, and they agreed, but it just turned out to be too much work to scrub out host names, IP addresses, et cetera. If we could just take this and make it public, wouldn't that change the game for so many smaller Kubernetes shops?
00:37:01
Speaker
I do have a concern there. We have heard case studies about how product managers have basically used ChatGPT and maybe given OpenAI access to data that shouldn't be shared. We also see a lot of bigger enterprises asking their employees not to use these LLMs. How do we find that balance? I like the idea of learning from the best and then helping everybody.
00:37:27
Speaker
Are people comfortable sharing their troubleshooting scripts back to the ecosystem? Do you think this will be more successful because it's done inside the CNCF landscape where everybody is more comfortable being open? Or how do we handle that? I think sharing the troubleshooting scripts themselves
00:37:45
Speaker
99% of the time, these are generic. I mean, having reviewed so many of my clients troubleshooting scripts, like, okay, you're troubleshooting Nginx, you're troubleshooting Kafka, you're troubleshooting Postgres. And so many of these, like, well, 1% applies to your own code. You get such incredibly good telemetry into your own code by doing all of the troubleshooting for all of the systems around it, most of which are open source.
00:38:07
Speaker
So I think a huge amount of just open source troubleshooting covers like 90, 95% of somebody's needs and can get somebody to the root cause really, really quickly. Yeah. I mean, having a community of people developing these scripts is definitely above and beyond what any single person could do, right? A single SRE who's, you know, the knight in shining armor for your problems at a company. Right. Um, it kind of brings me to, you know,
00:38:37
Speaker
how can these AI personas sort of
00:38:43
Speaker
be as good as that SRE? And how does someone trust them, right? Because I think people are getting used to hearing about things like misinformation, and the term hallucinations I've heard with LLMs, although I don't fully understand it. But how do they go about trusting this AI persona as they would that trusted person, the SRE they can constantly go back to for help?
00:39:12
Speaker
I mean, I think this is where the troubleshooting scripts themselves have to be written by experts. Sure. They just have to be. We tried so hard doing it the other way, and we were sitting here in the middle of it with a heck of a lot of incentive to trust it, and I couldn't get there. So I think that you have to, A, have your troubleshooting scripts written by experts, and B, you have to make those experts famous.
00:39:38
Speaker
I think that's really important. Those are real people; that work shouldn't just be copilot fodder.
00:39:44
Speaker
So we make a lot of decisions around making the name of the author of a script show up in large letters, making their photo show up in large letters, making these people famous. We make sure that they get paid for this work. And as a user, you're keenly aware that you're using troubleshooting scripts that were contributed by 35 people, including these three people who have spent five years troubleshooting nginx, and these two people who have spent five years troubleshooting Postgres.
00:40:12
Speaker
I think the LLM part of it, our digital assistants, they're more like river guides, like helping you say, hey, there's a great script over here. There's a great script over here from this person. There's a great script over here from this person. Given what you're asking for and everything else that I'm seeing, here are three great scripts that I recommend. And so you put them in the role of recommending humans work.
00:40:36
Speaker
which is where they're really good. A guided trip through troubleshooting. I like that. I mean, seeing how willing people are to go to Stack Overflow and just try something that who-knows-who put on there, when they probably know nothing about that person or why they put it there or if it works, right? I imagine this is a big step up. That's the way I see it. I mean, this looks good to me; we can improve from here. Yeah, exactly.
00:41:05
Speaker
Um, I think the next step for that is, you know, even these are humans, and humans make mistakes. Sometimes they won't work. How do you go about improving the quality? What does that process look like for these scripts? And I think the first thing, and we found this in our own use, I was surprised, frankly: I thought that a lot of our own scripts would end up being remediation, but the vast majority, like 90 to 95% of ours, are information gathering.
00:41:33
Speaker
That's not to say that it's free. If you gather a ton of information that a human has to look at, you didn't really improve the situation all that much. So there's a lot of information gathering and then summary and hey, that's why the hinting is really important. This is an issue that's worthy of attention or this command just generated like a 4,000 line output, none of which you need to read. That has a lot of value. Yeah.
00:41:57
Speaker
And for the remediation ones, we actually set our system up, and this is just a UI thing, but I think it's interesting, with multiple digital assistants and multiple personas. For our own internal use, we have a set: Eager Edgar and Cautious Kathy and then Admin Allie. Yeah, love those names. And Admin Allie actually has materially different RBAC permissions than Cautious Kathy or Eager Edgar. Ooh. All of the remediation steps live with Admin Allie, and the idea being,
00:42:25
Speaker
if you're asking it to do things, be careful. Yeah.
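To picture what persona-scoped permissions could look like, here's a sketch in plain Kubernetes RBAC. The names mirror the personas in the conversation; this is not RunWhen's actual mechanism, just the standard Kubernetes building blocks it's being compared to.

```yaml
# Hypothetical: a read-only ClusterRole for a cautious assistant
# persona. An admin persona would get a separate, more privileged
# role, bound to a different service account that few people can use.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cautious-kathy-readonly
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "pods/log", "events", "deployments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cautious-kathy-readonly
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cautious-kathy-readonly
subjects:
  - kind: ServiceAccount
    name: cautious-kathy
    namespace: assistants
```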
00:42:42
Speaker
Gotcha. So do you ever see, right now you said it's a data-gathering phase, do you ever see us going to, like, does Admin Allie, and sorry if I messed up the name, do you ever see them taking over and fixing things for you, not just giving you a summary after running 30 scripts?
00:43:03
Speaker
Just a second part of the question, right? I think the idea of having those community backed or user backed scripts will definitely help because if I'm running a production environment, if I'm responsible for one.
00:43:13
Speaker
I will not trust AI, because that could cause an outage and I could lose my job. I can trust a chatbot to give me suggestions for a two-day weekend trip, but not for messing with my day job and my production environment. That's a great idea, right? So first question is, do you ever see them taking over, actually doing things for you? And then, how do the community handwritten scripts help with that approach?
00:43:39
Speaker
And I always say, will we see fully automated remediation? Bit by bit by bit. I mean, let's start with this: all of Kubernetes is predicated on the idea of the reconciliation loop, which is itself doing automated remediation in a very material form compared to its predecessors. So I do think that we'll see some of that bit by bit, mostly the really practical things. Restarting stuff. Yeah. They made a whole TV series about it. So yes.
00:44:09
Speaker
These bots can restart things. They can help, hey, wow, our scheduler somehow got itself into a really, really bad state. OK, add a node, add a node for an hour, let the scheduler re-figure itself out, and then delete nodes or scale back down.
00:44:23
Speaker
We're big believers in GitOps. So there's an awful lot of, okay, reconcile, re-reconcile, re-reconcile. Oh, all right, rollback. Fair. Okay, we must have a bad PR somewhere. Let's trust that a digital assistant could actually do a rollback to a prior PR, when we know it's just going to get us back to where we were an
00:44:48
Speaker
hour ago. Okay. Things like fixing configuration, though, to me, feel pretty dicey. Could I picture any of our AI assistants actually checking fundamentally new stuff into any of our infra repos?
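The GitOps-style remediation being described might look something like this with Flux; the kustomization name and commit are placeholders, and this is a generic sketch rather than anything RunWhen-specific.

```sh
# Re-reconcile first: pull the latest source and re-apply it.
flux reconcile kustomization apps --with-source

# If the cluster is still unhealthy, roll back by reverting the
# suspect PR's merge commit and letting Flux apply the revert.
git revert -m 1 <merge-commit-of-bad-pr>
git push
```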
00:45:05
Speaker
Yeah. Sounds like a stretch. It sounds like these personas have sort of a context they're allowed to work in, a set of laws, call them Kubernetes Turing laws or something like that, where, like, shall not edit YAML versus harming a human or something like that. But how do those things get set, in the sense that Cautious Kathy is cautious and the admin persona does more things?
00:45:35
Speaker
Um, I mean, between those two examples, we just set it up with RBAC rules the exact same way. Frankly, my colleague Shea doesn't particularly like me having any write access in repos, period. It's like, ah, you know, you do other things and don't touch these; I break the dev environment any time I touch it. Um,
00:45:56
Speaker
so we view the personas as a really easy mental model to pick up, and we put the RBAC down on top of the personas as well. Like, hey, a lot of people on the team can use Cautious Kathy, so let's give Cautious Kathy RBAC that we feel pretty comfortable with. Admin Allie might be very limited; only a couple of people have access. So it's kind of clean: let's take the human mental model we've got and apply it to the digital. And
00:46:25
Speaker
beyond that, frankly, tiny tweaks to the way that these things search lead to pretty different personalities. I mean, we gave these names after fiddling around with a few configuration files and a little bit of training. They just developed their own little personalities, and they're fun to watch.

AI Learning and Adapting to Microservices

00:46:49
Speaker
So users can name their own.
00:46:53
Speaker
That could get weird, I feel like. So once we have, let's say, access to Edgar and Kathy inside my organization, how does it learn? You already spoke about different RBAC levels, right? So I can have different teams inside my organization, different people with different levels of access to these digital assistants in the first place. But is there a way for them to learn from what's going on inside my environment, or
00:47:22
Speaker
does RunWhen do all the training and learning and evolving of these models, and I'm just an end user? No, I mean, now we're getting deep into the vendor stuff, but this is where the vendor side, A, gets hard, and B, is where we spent a lot of time. Yeah. For us, our view is they learn from a couple of different sources. They can actually learn from the person who authored the troubleshooting commands themselves.
00:47:45
Speaker
So if you see conditions like this, think of it as almost like SEO. Like I want to boost my troubleshooting command when I see situations like this. I think that's actually good. Like SEO is good. So that's one source of learning. One source of learning is from us, like just changing the base models.
00:48:01
Speaker
Sure. Which I think is how we add value to our hundredth customer, then our thousandth, our ten-thousandth, et cetera. Another source is where users actually say, hey, I'm okay submitting this training tip back to the community. Okay. And then the last is, hey, there's a training tip, but for one reason or another, I want it to be private.
00:48:23
Speaker
And we're still playing with the public training tips. They're so much more valuable. Do we charge more for private training tips or for people that want to keep everything private and not give anything back? I don't know. We're still playing around with a whole bunch of different ideas there. But the four sources of learning are really important. Okay. No, I think it would be really cool if
00:48:46
Speaker
the chatbot or the digital assistant could see me troubleshooting something for the first time, and then it comes and does it for the next hundred times. If I hit the same issue, I don't want to do the same thing over and over again. That's why we have adopted principles like infrastructure as code and GitOps and all of those automation things. But learn from me and then just do it for me. And that way I, being the 10x engineer, can scale, and the rest of my team, the rest of my organization, can be unblocked.
00:49:14
Speaker
The nifty thing is, I'm having a couple of conversations with some folks. Folks have been doing this in our industry for a long time. I think it changes the way we think about automation, because if you think about, let's take the exact case that you said with today's automation tools, we would write this in a script to do that. And let's say we refactor our microservices. It's probably broken.
00:49:40
Speaker
We refactor our microservices a lot. And before we were actually using our own product, we were breaking our troubleshooting and regression scripts all the time. It was real pain in the neck. Now, if you assume instead that like, hey, each little step in the script, maybe you authored it, maybe somebody in the community authored it, but the script was fundamentally orchestrated by a digital assistant that took a whole bunch of training tips.
00:50:09
Speaker
Suddenly, you'd refactor your microservices, and you don't change a thing. It'll just kind of ignore the broken commands, and it'll probably pick up a whole bunch of commands that are relevant to the new architecture without you ever actually changing anything. And that's what we found that's really cool, especially for things like: I want to tie this in so that every time an alert comes in, Eager Edgar goes off and works on it for 10 minutes.
00:50:36
Speaker
Yeah, that's key. We don't change Eager Edgar's config anymore. Eager Edgar kind of just figures out the new set of services and does it, takes into account the training tips that were relevant in the old stuff and are still relevant, and forgets the training tips that are no longer relevant because of the refactor. And then if somebody does go in and give a couple of training tips, fantastic. Put another way, Eager Edgar could be the first name on your PagerDuty list, hopefully not, before handing it off to another human.
00:51:07
Speaker
We don't have the time to fine tune all of our alerts. It would be really nice. We don't have that kind of time. It's just nice when we can say, all right, did Eager Edgar think there was an issue with this alert? No, it's just a preempt event. Who cares?
00:51:23
Speaker
I think there's this point in which a technology like this gets added to a company's common usage and SRE team or something like that. There's a point in which I feel like when someone who is part of the support group or SREs goes back and said, oh, who fixes that?
00:51:45
Speaker
And they just say Edgar did, Edgar did, and they don't know it's actually not a human. I think that's where you can define sort of a "we did it," right? It gets to the level of, you know, becoming just another person in some form. Do you think that looks like success? I think that looks like success, but I mean, having just
00:52:06
Speaker
come out of a meeting last week, make a bunch of UI decisions on this, to me the big success is, oh, I asked Edgar to do that. Edgar works for, like, our assistants do things on behalf of people. And I think when we can put them in that light, hey, we set up the assistants for success, but we also set up the people for a ton of success. And there's a lot of subtle UI tweaks that we do around this, of like, okay, so who asked Edgar to do this? Who asked Kathy, cautious Kathy to do that?
00:52:35
Speaker
And you can really start to say, hey, it's not only 10x engineers; it's also people who aren't 10x engineers who can suddenly be, wow, a really, really effective manager of three digital assistants that are doing a ton of work. And you give that person, who is not the 10x engineer,
00:52:53
Speaker
the ability to have amazing impact in their organization. Yep. I'm excited about that. Especially having started my career as a very mediocre programmer, the fact that I didn't have to manage my own memory with Java, like I did with C, was like, yes.

RunWhen's Evolution and Community Engagement

00:53:09
Speaker
That's a win. Do it again.
00:53:15
Speaker
Gotcha. Okay. So Kyle, the next question I want to ask is around the troubleshooting scenarios, right? I know you used NGINX and Redis as the two examples. Can you talk about some really low-hanging fruit or common troubleshooting scenarios that you see right now when you're working with your early design partners and design customers? We're still in the early stages. What are we fixing now, and then what are the next steps?
00:53:42
Speaker
And I mean, the really basic ones that we did a year ago now are all of the basic kubectl gets across the stack. And then you start to go and say, all right, instead of kubectl gets, let's start troubleshooting Helm charts and let's start troubleshooting things like Flux Kustomizations. So now we're starting to get into some of the meta areas that actually do impact lots of other deployments.
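For flavor, a minimal sketch of what those two layers might look like, assuming the Helm and Flux CLIs are installed; the first block is the generic baseline, the second is the meta tooling that manages many other deployments:

```bash
# Baseline layer: generic health checks that apply to almost any cluster.
kubectl get nodes -o wide
kubectl get pods -A --field-selector=status.phase!=Running
kubectl get events -A --sort-by=.lastTimestamp | tail -20

# Meta layer: the tooling that deploys everything else.
helm list -A                   # which charts are installed, and their status
flux get kustomizations -A     # are Flux Kustomizations reconciling cleanly?
```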
00:54:14
Speaker
Memcached and the like, NGINX, Kong, etc. So I think that was kind of the sequence that we took. And we clearly know that after this, we're going to go beyond Kubernetes. I mean, that's kind of the direction things are headed. But right now, let me use this as the opportunity to give a short plug: my colleague Shay just started an ugly command party.
00:54:40
Speaker
I saw that. I need to attend the next one. We'll put that in the show notes too. Some of those commands are like a paragraph long. You're like, oh wow, that actually is the exact output that tells us exactly what we want. So I think that in addition to going from the base defaults to the meta to the very rare workload-specific stuff, we're also going from really, really simple commands to really useful ones that, if you don't have copy and paste, nobody's ever going to type. Yeah.
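A sketch of the sort of copy-paste-only one-liner he means, using jq (which comes up next); the pipeline itself is illustrative, not one of RunWhen's actual commands:

```bash
# List every container image running in the cluster, with usage counts.
# Useful, but nobody types this from memory.
kubectl get pods -A -o json \
  | jq -r '.items[].spec.containers[].image' \
  | sort | uniq -c | sort -rn
```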
00:55:10
Speaker
Yeah. I mean, the amount of jq hell I've been in when I was working with AWS APIs or something like that, right? If it was just there for me, I'd be the happiest camper. So I get the value there. Now, if I could use that, let me give a small plug. Yeah. To promote our authors, we also put together a little open source tool
00:55:36
Speaker
that scans people's kube clusters. And it's not a paid product. It's not paid at all. It just scans kube clusters, pulls out all of the CLI commands that an assistant could run, and puts them in a copy-paste cheat sheet. Cool. Yeah. This is RunWhen Local, right? This is RunWhen Local. Yeah. And in a big way, we did it to promote our authors, so the authors' names show up in big, big letters. I think it's really cool. So even if somebody doesn't buy our product and we can't pay the author royalties,
00:56:06
Speaker
if somebody at least uses this, we can make our authors a little famous. Yeah. Yeah, that's great. And I believe that's open source as well, right?
00:56:14
Speaker
Well, okay. That's perfect. I mean, that's probably a good place to ask more about where people can get involved, right? So there's the open source RunWhen Local; if someone wants to become part of the community, writing these scripts or earning royalties or something like that, where do they get involved with that? docs.runwhen.com has all of our technical documentation, and that has the information on joining the authors program.
00:56:42
Speaker
Gotcha. And what about people who want to try this out? Because this is exciting. Yeah. So right now, pretty much everybody starts with RunWhen Local. It's the fastest way because you don't even need to create an account with us. And you can say, all right, for my cluster, here's what Eager Edgar would do.
00:57:04
Speaker
And then after that, it's a more commercial conversation to basically set up a POC with us, but we're going to open it up for GA a little later this year. Awesome. But right now it's sort of private POCs. Gotcha. And anywhere else, you know, is there a place for folks to communicate besides the website, like
00:57:23
Speaker
I don't know, Twitter or GitHub or anything like that you would like to share? Absolutely. So in our tech docs, we put in our Slack channel, where we hang out. We put in our Discord channel, where the people on our team who hate Slack hang out. We have too many repos at this point under runwhen-contrib; that's where all of our discussions go back and forth. Gotcha. It feels like we have a lot of these for a small company.
00:57:49
Speaker
You're just covering all the bases. Do all the things, right? Get them all.
00:58:03
Speaker
Well, that's great. I feel like we could talk about this for hours and hours more. Maybe we'll have to have you back on the show when Edgar and Allie and the others are in their tweens or something like that, growing up. We can check up on how they're doing. But yeah, I feel like this has been super insightful, a really interesting topic for folks interested in AI and what we're using it for in a more general sense, and then obviously,
00:58:31
Speaker
those who are in the weeds troubleshooting Kubernetes, I think this is super valuable. I want to thank you for coming on the show and hopefully we'll have you again soon. Thank you so much for having me. I really appreciate the opportunity.
00:58:43
Speaker
All right, Bhavin, I think that was probably not enough time to cover that entire topic with Kyle. I think I could have talked to him for another two hours on this show. It would be a longer episode. But, you know, I think he's one we could easily have back on to talk about this again. But what do you think? What were your takeaways from that? No, I really liked the episode, right?
00:59:05
Speaker
The way Kyle came in and didn't hype up AI and LLMs as the next best thing that everybody should be using. He's like, no, there is a place and a reason for every solution that people are using. So I like the fact that he understands that; even though he's the CEO and founder of an AI-based company, he's not gung-ho about it. He's like, no, we still need to support SREs by having human-written scripts.
00:59:33
Speaker
AI can be the way to surface those scripts and give the user an easy troubleshooting workflow. It can enable them. But again, when anything is touching the production environment, it shouldn't be,
00:59:46
Speaker
at least in July of 2023, completely automated and AI-written. It has to be something that a human has written and AI is just helping you with. It's like it does a Stack Overflow search, figures out the best solution for you, and then gives you that snippet: go and run this, and it can help you solve a specific Redis issue.
01:00:06
Speaker
Yeah, we might get to an Ex Machina moment someday, but right now that human touch, that trust element, I think is going to be key. And if you are someone working on Kubernetes troubleshooting, it's an opportunity for you to give back to the community and also kind of
01:00:25
Speaker
make some royalties along the way. Sure. That is cool, giving some value back to those authors. Yeah, I think that was a really cool part of it. For me, a lot of this comes out of real-life examples of complexity: SREs spending
01:00:42
Speaker
you know, so much time at the keyboard, on the terminal. Like, I've been there, I get it. And it definitely comes down to the fact that there are only a few people who are going to know everything in that stack and know how to troubleshoot everything. So having a tool that can help you get somewhere faster, you know, as humankind, we're all about that, right? So I think that fits into that sort of human element really well.
01:01:09
Speaker
We didn't do the ChatGPT question, but we asked it: why should an SRE trust artificial intelligence to help them with their Kubernetes clusters? There was a whole long answer, and I'm not going to go through it, but part of it was that it should be used as a complementary tool rather than a replacement for human expertise.
01:01:27
Speaker
I really liked that line. I don't know where it pulled it from or anything like that. And obviously we're asking it itself why we should trust it, so take it with a grain of salt, but I really did like that part of it. And I do feel like that's a big, big piece of it. The other piece I think Kyle talked about was, you know, people don't just need more chatbots. That's how we've been exposed to a lot of this, and there's a lot of frustration there.
01:01:49
Speaker
Right. I mean, he gave the example of being on the phone with some automated system. And I did that recently, just a few days ago; I'm like, yeah, no one wants this until it gets way, way better. And so that sort of persona angle they have going for it, backed by the human element, the written pieces, is really, really cool.
01:02:09
Speaker
I know. And I think the experience will improve overall. Like, yes, chatbots are easy for mass consumption; everybody can just talk to a chatbot. Yeah. But it doesn't have the expertise right now. I think it will improve. I still remember the early days of voice assistants, right? I had an iPhone 4S, and Siri hasn't improved that much, but there are other voice assistants. Oh, she's so frustrating.
01:02:35
Speaker
But there are other voice assistants that are better. Even though everybody loves ChatGPT, it's still early days. The fact that we're frustrated with it also tells us that we're itching for more; we're itching for those tools to do more. Although I did find a stat that ChatGPT usage has gone down recently. A few things are attributed to it:
01:02:57
Speaker
the hype has cooled a bit, and schools are out, so maybe kids are not using it for their assignments. I was like, they shouldn't be using it for their assignments, but that's one of the reasons being cited. Or maybe the workforce taking vacations is not using it. But it's not just supposed to be used for work. I use it for planning my trips. Instead of watching 15 videos over a weekend to figure out the top five spots I should
01:03:23
Speaker
touch on every trip, I just ask ChatGPT: okay, I need to plan a trip to XYZ for four days, can you give me a list? And it gives me a list, which is like, okay, this is a good start. A couple of years dated, but yeah. Yeah. Again, national parks, right? The hikes don't really change that much. Not that drastically. Not drastically.
01:03:44
Speaker
It's a good starting point. Saves me a day's worth of watching videos from influencers. Until that one time where a volcano took out an entire trail and you show up, you're like, I can't. If that happens, I'll definitely tweet about it.
01:03:59
Speaker
That's my level of rage: I'll tweet about it. Nice. Cool. So as we wind this down, we want to tell our listeners that we are taking a summer break ourselves. We talked about the trips we're going on, so the next episode will be at least four weeks out, skipping one; we usually do them every two weeks. Hopefully you won't feel lost without us.
01:04:19
Speaker
you know, enjoy your own summer, I guess I'll say. And we'll be back with a lot of new episodes ready to go down the road. And just a reminder, please come and join our Slack. Yeah, if you miss us too much, there's always Slack, you know? Chat with us there. We have it on our phones and stuff like that. And yeah, we'd love to hear from you there. So this brings us to the end of today's episode. I'm Ryan. I'm Bhavin. Thanks for joining another episode of Kubernetes Bites.
01:04:49
Speaker
Thank you for listening to the Kubernetes Bites Podcast.