Podcast Introduction and Hosts
00:00:03
Speaker
You are listening to Kubernetes Bites, a podcast bringing you the latest from the world of cloud native data management. My name is Ryan Walner and I'm joined by Bhavin Shah, coming to you from Boston, Massachusetts. We'll be sharing our thoughts on recent cloud native news and talking to industry experts about their experiences and challenges managing the wealth of data in today's cloud native ecosystem.
Year Review and Travel Plans
00:00:30
Speaker
Good morning, good afternoon, and good evening wherever you are. We're coming to you from Boston, Massachusetts. Today is December 7th, 2023. I hope everyone is doing well and staying safe. Bhavin, it's almost the end of 2023. How are you doing? I don't know. I'm not okay with that fact.
00:00:50
Speaker
I'm happy that it's almost holiday season, but yeah, still can't get over the fact that it's December already and the year just flew by. We had a lot of fun. Yeah. I mean, this may or may not be the last show of the season. We'll see. But, you know, we're getting towards the end of this, this full year, which has been quite a, quite a ride, I would say. I know. It has been. How many, how many national parks did you visit in 2023? You got one more, right? Yeah, one more. Oh, two more. Two more. In 2023?
00:01:20
Speaker
There are multiple national parks, like, yeah. So, two more remaining. I think I covered more national parks last year, or maybe the same number, but I don't know. I'm happy, like, with the way I traveled. Most of my study is good. Yeah. And you leave next week for Patagonia, right? I leave tomorrow. Oh, jeez, that's right. Yeah. Sorry. I had to get this recording in, yeah. And then, then I can enjoy my vacation, you know. I needed to do this.
00:01:47
Speaker
Nice, nice. So Patagonia, and you said there's two. Yeah. So no, Patagonia is an actual region, right? And then there's a national park called Torres del Paine on the Chilean side. And then I think Los Glaciares is the national park on the Argentinian side. And yeah, I'm going to do both of those. So the plan is to rent a car in Argentina and drive over to Chile.
00:02:16
Speaker
do the National Park there, do a long day hike, like a 13-mile hike. And then a couple of days later, drive to Argentina, do a similar hike, like 13 miles, 3,000 feet of elevation gain on the Argentinian side. And then if I still have some energy remaining in my legs, I have a couple of smaller hikes in the same area. So let's see. I'm excited and nervous.
00:02:42
Speaker
Clearly, you have never seen me speak Spanish, so I'm going to completely rely on Google Translate and see all the tools. I mean, I took Spanish in middle school and I could tell you I still wouldn't want to speak Spanish. I'm pretty terrible at it. I could get by, you know what I'm saying?
00:03:00
Speaker
Today's technology. I think you're in better hands. And I think the thing that made me feel better about myself was I had a colleague at my previous job who was very fluent in Spanish. He grew up in Arizona. And then when he went to Patagonia a couple of years back, he's like, they speak so fast that even if you understand it, you can't really have a conversation. So even he ended up showing things on the Google Translate app like, I want to go to the ATM.
00:03:29
Speaker
And then Uber driver basically took him. So yeah, I think if, if, even if I knew Spanish, it wouldn't have helped that. That's my, that's such a powerful tool though today. You go to any country and pretty much, as long as you have service, you could translate. Oh, you can actually download these offline as well. So you can download specific languages offline and so nice. Yeah.
00:03:51
Speaker
It's kind of that feeling where, you know, we grew up with GPS and things like that. And if you tell someone to go somewhere without Google Maps, I feel like that's a very similar feeling. They're like, what do I do? You know? I know. I just follow the signs.
00:04:15
Speaker
We don't have great signs, but we have people everywhere. I feel like you need to experience it. You can always stop at a crossroad and ask somebody. They'll not give you good directions, but they'll at least get you to the next person who can maybe help you better than they can. All right, you'll get there eventually, right? That's an adventure.
00:04:39
Speaker
It's not quite choose your own. It's more like, let someone else choose your own adventure. It's a new category. Anyway, it sounds like a blast.
Understanding AI and Machine Learning
00:04:50
Speaker
Today's topic, super fun. We have a one-on-one topic again. It's you and I talking about stuff. And today we're going to talk about AIML. Yeah.
00:05:01
Speaker
Exactly. AI, ML, LLMs, all the terms, what they actually mean at a very high level. So that'll be really exciting. But even if you get just the full forms for those terms, I think we call this a win. If you don't just mix them together like I just did, yeah.
00:05:22
Speaker
I like this format. Like, not the 101 format, right, but when we do one-on-ones, I like the evening setting when it's calm. You don't have to worry about Slack notifications and you can have a conversation. It's chill, it's evening, we're good to go. Yeah, yeah, this is a good setting for a one-on-one exploratory topic for sure. Yep.
00:05:44
Speaker
I want to set that baseline right now. Again, not experts. Just here to make sure that we are learning together as a community. We want to share what we learned. And then if you have comments, send them to Ryan, and then we'll maybe call a different expert in the field who has been doing AI ML for longer than two, three months to talk more about it.
00:06:07
Speaker
That's the plan. After every one-on-one, we have a whole bunch of guests on who know a lot more than we do about this type of thing. But if you were at KubeCon, like Bhavin and I just were, or even if you weren't and watched some of the fallout from it, or any conference going on right now, you know that AI and ML are pretty much the topic that's covered first.
00:06:34
Speaker
you know, it's top of mind for everyone, every company, every organization, every team. I would say, so that's why we're doing this. And we're going to dig in and hopefully get some really cool use cases and sort of, you know, things like that after the fact. But, you know, we have to cover the basics one way or another. Yeah. So, yeah, for sure.
00:06:53
Speaker
Cool. Well, let's start with some news though. Yeah, exactly. Let's get some news going. How about you start off? Okay. So it's the week after KubeCon, right? So none of the vendors are announcing anything, but that also means AWS re:Invent was last week. And as you said, Ryan, it felt like an AI, ML conference, not a cloud computing conference. Because I wasn't there. Thankfully.
00:07:16
Speaker
Thankfully, this year, I wasn't able to go, and I was okay with that. That makes sense. Me neither. I was watching the keynotes live, but yeah, the first keynote where they are supposed to announce everything in AWS, they started doing that for the first 15 minutes, and then it just went into AI, and then they had a dedicated AI or Gen AI keynote on day two, which again, spent like two hours talking about AI announcements. Nice.
00:07:44
Speaker
So a lot of AI content, but that's not what I have for news. A couple of things that are more relevant to EKS and cloud native infrastructure. Amazon EKS announced something called EKS pod identity, which helps simplify the way you pass IAM permissions to your Amazon EKS clusters or your pods running on that EKS cluster.
00:08:05
Speaker
Before this, the way you did it was using something called IAM roles for service accounts, or IRSA. You created an IAM role, and you then tied that IAM role to a service account. Now you can create an identity that you can actually attach to a specific pod. You create an IAM role with the required permissions and then you specify a specific service principal called pods.eks.amazonaws.com.
00:08:34
Speaker
And then when you are deploying this, you map it to a service account directly from the Amazon EKS console or APIs or the AWS CLI. And that's how your pod actually gets those permissions, which you can manage from outside the Kubernetes cluster. So even though Kubernetes has its own RBAC, if you want your pod to have access to AWS services, so it can maybe spin up an S3 bucket on your behalf, you can control all of those permissions and policies from AWS IAM and attach them using this new Pod Identity feature that they have.
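To make that flow concrete, here is a rough Python sketch using boto3. The role name, policy, cluster, namespace, and service account names are made up for illustration, and the exact parameters should be checked against the EKS Pod Identity docs; treat this as a sketch rather than a recipe.

```python
import json
import boto3

iam = boto3.client("iam")
eks = boto3.client("eks")

# Trust policy that lets the EKS Pod Identity service principal assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "pods.eks.amazonaws.com"},
        "Action": ["sts:AssumeRole", "sts:TagSession"],
    }],
}

# Hypothetical role and cluster names, purely for illustration.
role = iam.create_role(
    RoleName="demo-s3-access",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.attach_role_policy(
    RoleName="demo-s3-access",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)

# Map the role to a Kubernetes service account; pods that use that service
# account pick up the IAM permissions without any IRSA annotations.
eks.create_pod_identity_association(
    clusterName="demo-cluster",
    namespace="default",
    serviceAccount="s3-writer",
    roleArn=role["Role"]["Arn"],
)
```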
AWS Announcements and AI/ML Updates
00:09:04
Speaker
Got it. So this is like service-to-service identity, so to speak. Yeah, it's about the pod and the access to different AWS services that it has. Yep. And then I did a quick search, but I want to say this helps in like the zero trust world. It doesn't come up anywhere, but I feel like it does. Absolutely.
00:09:24
Speaker
And then I think the second service, I just found it cool was like AWS Fault Injection Service. And that just brings like chaos engineering to the forefront. So now they have different scenarios that you can like create templates and actually play against your applications running in your AWS account where you can simulate things like availability zone, power interruption, cross region connectivity issues.
00:09:49
Speaker
EKS stress from a CPU perspective, from a disk perspective. Introduce or delete random pods inside your EKS cluster. They have a scenario library for all of these weird events. Even though people can use it for chaos engineering, I think I'm going to start using these for my official work demos because this just looks cool, dude. This is showing me the best way to fail things.
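For anyone curious what driving that looks like programmatically, here is a minimal boto3 sketch. It assumes you have already created an experiment template in your account (for example from one of those scenario library templates); the template ID below is a placeholder.

```python
import boto3

fis = boto3.client("fis")

# List the experiment templates already defined in this account/region.
for template in fis.list_experiment_templates()["experimentTemplates"]:
    print(template["id"], template.get("description", ""))

# Kick off an experiment from a template (placeholder ID), for example one
# that stresses CPU on nodes or deletes random pods in an EKS cluster.
response = fis.start_experiment(experimentTemplateId="EXT1a2b3c4d5e6f7")
print(response["experiment"]["state"])
```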
00:10:14
Speaker
And at the very least you can say like, you know, what happens when I press this button? Is my application ready? Nope, it wasn't. Let's try again. I mean, the EC2 stress for disk, right? This is one that we've done in the past sort of like manually. And I think that's the benefit is people are doing this stuff anyway, so why not make it?
00:10:34
Speaker
something you can just, you know, access from this library. But yeah, increase the disk utilization and see what happens. Memory, all that stuff, right? That makes a ton of sense. This is very cool. I like this. Yeah. So those were the only two items. I know AWS re:Invent didn't just have two announcements, but there are a lot more if you guys just want to do a basic Google search. Absolutely. That's it for news. Yeah. Well, let's not waste any time then. We're going to dive into our
00:11:01
Speaker
AI and ML 101 episode. I did it fine that time. I know. It's awesome. I didn't mix the terms together. Speaking of terms. Thank you. Speaking of terms.
00:11:11
Speaker
Let's start with the basic question, right? What is artificial intelligence, when someone asks or uses that word or sees that word? What is machine learning? And what's the difference with deep learning? Okay. Yeah. So I like this question because it talks about the pre-Gen AI, pre-LLM thing. Okay, not pre exactly, because those terms are technically a form of deep learning, right? But we'll get into that.
00:11:41
Speaker
Yeah. So like starting with AI, right? Like AI is the actual discipline. So everything you just listed is part of artificial intelligence. So like machine learning, it's part of AI, like machine learning is actually a subfield of artificial intelligence. But if you're talking about any of these things, you can talk about it as AI. And then before we before we move on, though, I feel like
00:12:02
Speaker
I need to pose the question for every one of our listeners. Yeah. Please comment either anywhere you can comment or on our slack. What is your favorite artificial intelligence movie? Oh, need to know. Mine's Ex Machina, by the way. Okay.
00:12:18
Speaker
I think I knew that, or I could have deduced that. I've definitely talked about it before. Or shared gifs from it, yeah. Anyway, anyway, continue. So machine learning, that's a field of artificial intelligence, right? If we take a little history lesson, it was originally defined in the 1950s as the field of study that gives computers the ability to learn without being explicitly programmed.
00:12:44
Speaker
That's the crux of it. I love that definition because it's simple. It tells you exactly what it is. You don't have to tell the computer what to do.
00:12:55
Speaker
You can give it or make sure that it has the ability to do certain things without explicitly training it or writing a specific piece of code that tells it how to do certain tasks. So machine learning is where you give the machine a lot of data, a ton of data, and let it learn or let it train on its own. And then you can expect it to imitate intelligent human behavior.
00:13:21
Speaker
So this is where like, okay, you're training a model, and then you're using a model to predict things or generate things, but trying to mimic what a human would do in a specific situation. I mean, you bring up the whole training aspect. I think there's a point which I think we should stop and distinguish between the two sort of
00:13:42
Speaker
modes of how this works, right? There's kind of two steps in this process. It's like everything that happens beforehand, and then the thing, the artifact, that is actually doing the inferencing, right? So maybe we explain the difference between those two. Yeah, for sure. So like,
00:14:00
Speaker
When you're talking about machine learning, if you're talking about the before, that's the training stage. The training stage is not one simple step. It involves a few different steps where you are actually getting access to a lot of raw data, and then you are processing that data or prepping that data, and then you're building a dataset on which your model can actually be trained.
00:14:25
Speaker
This is where machine learning itself can be divided into multiple subcategories based on the type of data. So if you are using labeled datasets, that becomes supervised learning. So if you are going to train a model, the simplest example, whether something is a cat or a dog, you are giving the machine the data in which you have specific images of dogs and cats, but they are labeled as such. So it's learning that, okay,
00:14:52
Speaker
if it's a dog, if the label is dog, this is how it should look like. And then it's building that intelligence in it. So like when you are actually going to the second phase, which is the inferencing, you are using that model. And when you upload a specific random image of a dog or a cat, it should ideally, if it's trained properly, tell you whether it's a dog or a cat. And that's a very simplistic example, but you can think about it in so many different ways, right? Let's say,
00:15:18
Speaker
Like in your email, you get so many spam emails. But now Gmail and maybe even Outlook have become so good in terms of identifying which are the spam emails versus which are legit emails. Because underneath it, there was supervised learning where they trained the model on a lot of spam, on what spam emails look like and what legit emails look like. And now it automatically sorts that and makes a decision for you.
00:15:46
Speaker
mostly correctly, in most cases, pretty accurate. You can also use this for skin cancer detection. If you're training your model on labeled datasets, it's not that skin cancer was something that was recently discovered and people have been doing that. There have been so many
00:16:03
Speaker
Instances of that, we have a lot of data. If you go to Kaggle, you'll see an actual dataset of things being properly labeled as something that shows a cancerous cell versus a non-cancerous or like a regular cell. Then you train your model on it and the model can actually help you infer whether something, a picture that you upload is cancerous or not. That's how supervised learning works. Usually, Ryan, to answer your question, when you have a dataset and you want to do supervised learning,
00:16:33
Speaker
The best practice is to split it 80-20. You train your model on 80% of that dataset, and then you want to validate or test whether the model is actually performing in the right way using that remaining 20%
Supervised Learning Examples
00:16:47
Speaker
of the data before you actually release that model in the wild and do inference. 80-20 is a good rule based on just my research. But yeah, supervised learning implies the data is already labeled.
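As a toy illustration of that 80/20 split on a labeled dataset, here is roughly what it looks like with scikit-learn; the dataset and model choice are just examples.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A small labeled dataset (features X, labels y).
X, y = load_breast_cancer(return_X_y=True)

# Hold back 20% of the labeled data for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)            # train on the 80%

predictions = model.predict(X_test)    # "inference" on the held-out 20%
print("accuracy:", accuracy_score(y_test, predictions))
```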
00:16:59
Speaker
Sorry, go ahead. No, I was just going to say, a lot of people have probably heard of ChatGPT, especially people who are not necessarily in this technology world.
00:17:14
Speaker
It kind of brought AI, ML into how people can interact with it, as an example, right? The big sets of data that it learns on are things you've already heard of: text from books, Wikipedia, news articles, scientific journals, right? It's just consuming tons and tons of data. And that's, I think, the version three, right, that officially kind of came out and made a big difference. But like,
00:17:40
Speaker
That's what Bhavin's talking about, right? It's consuming tons of past examples and finding these sort of patterns, essentially, and being able to compare what a dog pattern looks like to another dog. So it may be biased, right? And that's probably something we'll talk about a little bit later. But anyway, I wanted to give that example since it's probably one that everybody's thinking about.
00:18:07
Speaker
Yeah, no, makes sense. And then I think the second, like, we'll quickly cover the remaining two subcategories of traditional machine learning, like unsupervised learning, where you don't have labeled data, it's unlabeled data, and you just throw your machine towards it and ask it to find patterns or trends.
00:18:23
Speaker
that may be even difficult for humans to figure out. So this is where people are building these recommendation systems where if you are shopping for something on Amazon, it knows what to show you next. Or, the simple example, if you go to a grocery store, they know how to put diapers alongside beer. Humans would have never made that connection, right? But the system did, looking at all of that data that they had.
00:18:52
Speaker
But yeah, that's where unsupervised learning can come into the picture. Financial fraud detection. So unsupervised learning looks at your data and then it splits it into different clusters. So if something is outside that specific cluster, that means it's not part of that group. And that can really help in financial fraud detection. If something is outside that cluster of previous transactions or outside your usual pattern of spending money,
00:19:19
Speaker
It will generate an alert. It will let you know that, oh, this is not your usual behavior. So unsupervised learning can obviously help in all of those scenarios. And then the third thing is reinforcement learning. So reinforcement learning is where you don't actually have data. You don't feed data to your model. You just tell it that, OK, this is like you have the machine learn specific things based on trial and error.
00:19:48
Speaker
incentivize it to take the best action by establishing a reward system. So examples where AI and machine learning models have learned to play chess or go and beat humans at it, not because they were trained that, okay, this is the best move after this. They learned through series of trials and errors to figure out that, okay, if they did this, they'll lose the game. And they knew that
00:20:10
Speaker
the humans that trained these models shared the rule books with them. So the machine knew that, okay, if I do this, I lose. Okay, that's not the right way to do things. Let me figure out something else. And that's why, that's how they learn through this process of reinforcement learning and they figure out their best way to win the game. So like those are like three main subcategories when we're talking about machine learning.
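Circling back to the fraud-detection idea, here is a tiny unsupervised sketch: no labels, just transaction amounts, and the model flags the point that falls outside the usual cluster. The numbers are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Unlabeled "transaction amounts" -- mostly routine spending, plus one outlier.
transactions = np.array(
    [[12.5], [8.0], [15.2], [9.9], [11.3], [14.1], [10.7], [950.0]]
)

# The model learns what "usual" looks like without any labels.
detector = IsolationForest(contamination=0.1, random_state=0)
labels = detector.fit_predict(transactions)   # -1 = outlier, 1 = inlier

for amount, label in zip(transactions.ravel(), labels):
    if label == -1:
        print(f"flagged as unusual: {amount}")
```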
00:20:35
Speaker
Got it. Got it. So we mentioned a little bit about ChatGPT already, so we might as well ask the question. Yeah. So where does generative AI fit into these categories, and where does the term LLM come into play? Yeah, for sure. So generative AI and LLMs, I think you said this earlier as well, they are subsets of deep learning.
00:20:57
Speaker
And deep learning can be considered a subset of machine learning. So I'm just trying to create smaller circles inside bigger circles. But yeah, that's where Gen AI fits. Gen AI and large language models, or LLMs, are a subset of deep learning. The difference here is instead of it being predictive or discriminative, so instead of identifying whether something is a dog or a cat, it is generative. So instead of
00:21:23
Speaker
identifying whether it's a dog or a cat, you tell it, give me a picture of a dog, and it will generate a new picture of a dog just by looking at the text that you gave it. And as you said, Ryan, this involves a huge amount of training, massive. GPT-1, I think, came out back in 2016.
00:21:47
Speaker
It only got popular when GPT-3 came out last year. I think we passed its one year anniversary last month. GPT-3 came out because it was trained on a huge data set that it can actually predict or that it can actually generate things that look like human behavior for that artificial intelligence definition. That's when it gained traction. Generative AI or LLMs are just means to generate new things
00:22:16
Speaker
As you said, it's trained on those scientific journals or on those books. So if you give a generative AI LLM model 50 books about different Kubernetes concepts, and after that, when you ask it a question about Kubernetes storage, or let's say CSI, it's not going to find the right sentence in one of those 50 books. It's going to use all of the knowledge that it was trained on and generate an answer for you. So it's a weird thing where it's not actually
00:22:43
Speaker
searching for the right answer, it's trying to generate the right answer. I think generate is like the key word. That's why I like to like, I like to find the big questions. Which is why it also can be slightly misleading. Yeah. Yeah. Yeah. And like we'll talk about hallucinations in the next section as well. But yeah, as you said, if it's not trained on enough data or enough source content,
00:23:07
Speaker
it doesn't know enough. So it will just try to fake things because it understands how English works. And then it has some knowledge of everything else. I think one way someone explained it to me was, you know, deep learning as a concept, right? When something is deep, you have a lot of something.
00:23:23
Speaker
Yeah, as you mentioned, it has to be trained on a lot of data, and in this case, the one we're familiar with, like ChatGPT, it's so much, it's a deep amount of data that is trying to mimic the way the human brain learns.
00:23:42
Speaker
So it's trying to mimic who we are, you know, what we're doing, who we are, how we respond. So it needs enough data from the internet, basically, to try to mimic us. And sometimes it's good at it. Sometimes it's terrible at it, right?
00:23:59
Speaker
I have not seen this in person because, again, I don't have any kids right now, but the question for you, right: when your daughter was growing up, when you asked her something that she doesn't know about, she'll still be confident enough and just give you random answers, right? Just to show that she can talk, have a conversation with you. But yeah. Or sometimes she just uses words she doesn't know, right? Yeah. But doesn't know what they mean. She's just like, yeah, you know that. Yeah. So
00:24:26
Speaker
And once you teach her about a subject, like not a subject as in a person, but like the concept, then she'll be able to have a proper discussion. So I think that's how it is with LLMs.
00:24:38
Speaker
Like if the LLM is not trained on enough data, it's like talking to a kid, but it'll still say things. This gives me an idea. You remember that show, Are You Smarter Than a Fifth Grader, right? You remember that show? Nope. No. This was a show where basically they would take a fifth grade education and compare it to a contestant, anyone who could come on there, with are-you-smarter-than-a-fifth-grader questions. I would love to see this
00:25:05
Speaker
against an AI system that is trained on online education. Anyway, total sidetrack, but I like that. I would watch that show, at least for a season, and then it depends on how they produce it. So speaking of LLMs,
00:25:30
Speaker
what does model training look like? And how does the inferencing differ between traditional ML and Gen AI?
Model Training vs. Inferencing
00:25:39
Speaker
Yeah, for sure. So like with the traditional machine learning, right? We spoke about like the data prep stage, the model training stage, and then inference. In that lifecycle or in that pipeline, you needed a lot of compute resources or GPU resources to do the actual training.
00:25:58
Speaker
Like so it was more accessible to every organization because maybe you can train a computer vision model for your manufacturing facility by just having access to like four or eight GPUs and like you train the model on that and when you wanted to run that model.
00:26:14
Speaker
You maybe don't even need a GPU, or maybe a CPU works for you. So you can train those computer vision models inside your data center and then you can deploy them at each manufacturing facility without the need for GPUs. So the resource-intensive part was the training; the inference side was lighter.
00:26:34
Speaker
With LLMs and GenAI, we keep hearing the amount of resources and amount of money being pumped into these startups like OpenAI, like Anthropic, like the larger companies like Google Cloud and Meta. Because again, only those guys can actually afford to train these large language models because they are really resource intensive. You can't even imagine the number of nodes.
00:27:01
Speaker
I still remember the OpenAI article, which I'll refer to later in the episode as well, when they spoke about how they were training their model on 2500 Kubernetes nodes and there were a lot of GPUs involved in that. And this was a KubeCon Europe talk back from 2017.
00:27:19
Speaker
And just imagine, at that point, their GPT models were not even good enough. So they published a blog in 2021 where they were now running their training on 7,500 nodes. So the amount of compute needed is just crazy. But the one thing that I wanted to highlight is with Gen AI or with LLM, right?
00:27:41
Speaker
Forget about training, because only a few companies can actually do that. Even for inferencing, these models that are generated, like Llama 2 70B, that's the Llama open source model from Meta that has 70 billion parameters,
00:27:57
Speaker
it actually needs about 140 gigs of GPU memory to run inference on. So forget about running these things on a CPU, because it will take days, even if it gets deployed, to get you a simple answer. You can't even do it on your laptop, which maybe has a GPU in it. You need a proper
00:28:17
Speaker
compute environment to actually perform inferencing as well. So I think that's the difference I wanted to highlight. Even with these newer models, even inferencing has become so much more resource intensive than in the previous generation.
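A rough back-of-the-envelope calculation shows where a figure like 140 GB comes from: at 16-bit precision every parameter takes two bytes, before you even count the extra memory for the KV cache and activations.

```python
# Rough memory estimate for serving an LLM (weights only).
params_billion = 70          # e.g. a 70B-parameter model
bytes_per_param = 2          # fp16 / bf16 precision

weights_gb = params_billion * 1e9 * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of GPU memory just for the weights")
# ~140 GB -- quantizing to 8-bit or 4-bit roughly halves or quarters that.
```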
00:28:30
Speaker
Yeah, not to mention the power consumption per GPU that some of these can take too. So it's not the most green effort and probably could use some more thought. Yeah, that's a whole nother podcast. Yeah, it is. But I was listening to a different podcast and they had a sponsor, which is a GPU cloud provider. All they do is provide you GPU instances. And the way they have done this is instead of relying on traditional electric
00:28:59
Speaker
systems, power grids inside countries, they have started building these near
00:29:05
Speaker
like outside in the middle of nowhere where they can stretch network cables, but they can have their own nuclear plant or they can have solar farms. They can have like hydroelectricity being generated. So like they're trying to do some of these things by looking at other sources of energies rather than just using traditional electricity. But yeah, as you said, like all of these things require so much, so much compute and so much power.
00:29:33
Speaker
Yeah, it's crazy. But one good thing that I like is, hey, I was never a big fan of crypto. So instead of using all of the same amount of power to do anything in crypto, I think I like the AI thing better. Let's use this. We're just going to ask AI to do it for us next. They're still going to do crypto mining. They're just having AI do it for us.
00:29:54
Speaker
So it's going to be, you know, turtles all the way down still. Yeah. I know. And Nvidia stock just keeps going up. Exactly. Exactly. All right. So let's bring it back to reality, right? Let's talk about how people are starting to use LLMs, besides our podcast coming up with kind of silly questions to ask our guests. How are companies actually using them, and how are they kind of fine-tuning those?
00:30:23
Speaker
Yeah, I have a few examples, Ryan, so feel free to share anecdotal things that you have heard about in the industry as well. For me, I've seen vendors in the ecosystem build chatbots. That's one of the simplest use cases. People know how to use them, right? That's the real audience.
00:30:40
Speaker
So it made it accessible. It has the ability to summarize your capabilities. It has the ability to, if you point it to like your internal repository of documents or policies, it can go and look at all of those things and then give you
00:30:55
Speaker
Accurate answers, almost. So I think that's important. I like the idea that you had. I was just talking to somebody, a friend of mine who works for a consultancy. They get all of these infrastructure documents from their clients,
00:31:12
Speaker
like 40 PDFs. And they have to like summarize everything and then make sure they have all that information ready. They have started using like the AWS bedrock service, uploading all of those documents there. And then all they need to do is just ask that chatbot a question like, okay, what was this about? And then it just based on the data it got from the 40 PDFs, it gives you the right answer. So it is saving some time and some effort for people. So I like that.
00:31:40
Speaker
But it's, I mean, it's still definitely relies on the individual, right? To interpret the response, right? So that like, that's the whole thing here is that.
00:31:51
Speaker
you know, false information is a real problem, right? Yeah. We, we assume that this chat bot is smarter than us and whatever it spits back to us. Like people believe anything on the internet. You already know this. Most people already know this. Unfortunately. People follow our podcast. Like I'm sure we can, we can say Kubernetes is not great. And then I'm, I'm hoping at least one person believes it. Come on, man.
00:32:18
Speaker
So how does this occur in LLMs? I think you mentioned it before, hallucinations actually happen, right? What does that actually mean? And how can we kind of combat this? Yeah, for sure. I think I just thought about a different use case. So I'll talk about that before moving to hallucinations.
00:32:39
Speaker
This was an interesting thing, dude. It blew my mind. We have been talking about text-to-text LLMs, but there are also image generation LLMs. If you go to things like stable diffusion, just go and look up Microsoft Designer. There are so many things out there that just by giving it text, it will generate an image for you.
00:32:59
Speaker
Like in the gaming industry, whenever developers are working on games, they actually need designers on staff to design how different chairs or how different bar stools or how different buildings would look. And I think the example was, they have to make sure that couches in all the different rooms inside the game have to be different, and they paid a lot of money to designers. Now all they need to do is just... nobody notices anyway, like, yeah.
00:33:24
Speaker
Nobody notices, right? But if you see the same thing over and over again, it will matter. So like to create that good user experience for that gamer, they were hiring designers before this. Now, it's like they're just generating new images based on a model and it's giving them a hundred different types of couches, which solves and accelerates so many things. I don't know. When I heard about it, I was like, scary too, though, because like, I mean, in that perfect example, right?
00:33:50
Speaker
I think which many people might be afraid of is like, is AI going to take people's jobs? The reality is, yes. But at the same time, AI will also produce some jobs and how to design and architect these things. I'm not predicting the future, but there's some balance that will happen. But yes, the reality is it will take the job of some people.
00:34:17
Speaker
Yeah, I think that's the upskilling that's needed, right? And hopefully we can help people and be part of their journey as they're learning more about things, similar to how we are learning more about things in real time. So hopefully we can keep doing this podcast for longer and not just have AI replace us too. Okay, now we'll just feed it all of our episodes and tell it to come up with new ones.
00:34:39
Speaker
Yeah, I think somebody did an experiment that's worth trying once. No, for sure. Not for the Kubernetes Bites podcast, but the All-In podcast, it's a popular, like top five podcast in all the ratings, not a technology podcast. But people took all of their library for the past two years, fed it to a model, and had it generate
00:35:03
Speaker
an actual podcast episode, and it was good. Someone already did it. Yeah. Was it good? It was. It faked things, and that can tie us back to hallucinations, but it had the ability to... I think we should produce a Kubernetes Bites episode generated by AI. I feel like it needs to happen. Yeah, let's figure it out. If we can facilitate that, let us know, because I'm so interested.
00:35:27
Speaker
That's true. We can take a vacation, like we can go sit on the beach, and the podcast just keeps going; see what it comes up with. I'd love to hear the interpretation of Bhavin and Ryan, you know, from the perspective of that. Or if it just creates a guest, like, what would it do? I don't know. I would like that. Yeah, it just makes someone up.
00:35:49
Speaker
Like if you look at our historical information, most of our guests are from vendors. So like maybe it creates a new startup idea, dude, like on its own. Yeah. I'm just, I'm still, I'm very interested. Okay. Coming back to hallucination. So, uh, so lack, as we discussed, like lack of information or lack of enough data set that it's been trained on leads to hallucination. So it's trying to like generate text based on what it has been trained on. But if it wasn't trained on the right amount of data, it's kind of useless.
00:36:18
Speaker
A good example of this that I've heard too is that documentation is key for anything. Really, really anything. Any job profile, any demographic, whatever it may be. It doesn't even have to do with technology.
00:36:35
Speaker
Documenting that thing is key. But we've all heard the term, especially in technology, tribal knowledge. There is tribal knowledge within all aspects of life, technically. To a certain extent, a model can't learn
00:36:53
Speaker
Any tribal knowledge, right? Because that's something that only someone else knows; it's not out there in a book or on a website. And these are the gaps that it kind of fills in. And you're like, someone might know the actual reality of that. But no, the AI's job is to just fill in what it thinks might be there. Yeah. Although, right. So like,
00:37:13
Speaker
It's a statistical model. It's just trying to figure out what's the next word that would make sense in the sentence, and it can just make things up. I remember when ChatGPT came out and we got free access to it. Okay, let me try it out. I just asked it something like, oh, can Portworx do this? And obviously it wasn't something that it could do,
00:37:38
Speaker
but it generated like an entire doc page like, Oh, this, these are the steps click here on this button. Like those things. Okay. No, but yeah, there is a way to avoid hallucinations and that can like,
00:37:55
Speaker
Obviously, the people that are building these models can keep training it on more and more data. I think GPT-5, whenever it comes out, it will be trained on a trillion parameters and that's a whole lot of data. But right now, you can use your existing things and if you're creating these chatbots or if you're creating your own custom LLMs based on open source LLMs, you can use something called as vector databases and use that as the source of current information.
00:38:25
Speaker
So, let's say, for example, if the Llama 2 model or the Claude model from Anthropic has been trained on everything, but it understands how English works or English semantics works, it understands English grammar, but it doesn't really know anything about Massachusetts state law.
Vector Databases and LLM Accuracy
00:38:44
Speaker
So, if you ask it a question like, oh, can I wear two earbuds while driving?
00:38:48
Speaker
It will try to come up with something which would make sense; like, yeah, it doesn't care. But this is something I've learned from Instagram, that if you're a Massachusetts resident, you can't drive your car with two earbuds in. Somebody can penalize you for that. You can get a ticket for it. So, Massachusetts... Instagram might not be the best source of truth either, though. That's true. Come on, Ryan. Don't judge.
00:39:15
Speaker
No, agreed. But Massachusetts can take all of its state laws, put it in a vector database, and then when somebody asks that LLM more information about this specific law, it can give... Yeah, like a state-based model that someone, they can use. Yeah, absolutely. So it's like...
00:39:34
Speaker
creating these custom models or fine-tuning these models or creating this architecture, something called retrieval augmented generation. So it's trying to retrieve data from your vector database and then it's using that retrieval to augment the generation that it's doing, so RAG models. So in RAG architectures, you'll always have a vector database that has the current information or that has the relevant information,
00:39:57
Speaker
and then the LLM, before it responds to something, queries the database and gives you a better answer instead of hallucinating in the first place. So a vector database, that's a new type of... I don't want to say it's a new type of database. Like, it is, okay, but you can use
00:40:17
Speaker
semantic search capabilities or vector search capabilities on even PostgreSQL or MongoDB or Cassandra, for example, things that we are very familiar with. So it gives that database the ability to specifically store something called as vectors. And vectors are just mathematical representations of objects in a multidimensional space. So I think one easy way that I visualize this, or it helped me visualizing this, if you are visualizing like
00:40:46
Speaker
type of clothes. If you have a shirt and if you have a pant and a skirt, you will have the pant and skirt that's closer to each other because both of those are on your lower body and then the shirt is in a different space. So if you are looking for a pant suggestion, it would go and look at the space near where the other pants are.
00:41:07
Speaker
But the pants are not stored as pants, right? It's stored as a mathematical or like a numerical representation that is not human readable. It just stores it in that vector database. And then when you ask it for something, it tries to find that information using something called vector search and looks at that information for you. Yeah. And like a lot of people are familiar with sort of XYZ in terms of dimensional space. Yeah.
00:41:29
Speaker
That's a very consumable type of information when you think about vectors. And I think they commonly use the term embeddings, right, vector embeddings. Think about that kind of expanded to many other dimensions as well, right? So you have this kind of
00:41:50
Speaker
mathematical representation that is beyond XYZ and can represent many things. And the reality of it is, it's very hard to represent something in a form that humans understand. So yeah, doing it in some other machine-understandable, mathematical format is what a vector is kind of doing for you.
00:42:14
Speaker
That's probably one way to think about it that made it make sense a little bit more for me, just because I'm used to those. And I think there's also a second difference, right? In traditional or relational databases, you're trying to find a specific row in your database based on a private key or primary key. Sorry, not a private key.
00:42:38
Speaker
Vector databases, you're not trying to find the exact answer. You're trying to find the nearest neighbor or something that's closest to it. And then there are different search algorithms like K nearest neighbor or just nearest neighbor algorithms. And these can be fast if given enough resources or slow if you want a really specific answer, close to exact answer as it can. So the way you search for these vectors can completely differ.
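A stripped-down version of that nearest-neighbor idea, using the clothes example with made-up three-dimensional embeddings (real embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

# Toy 3-D "embeddings"; real vector databases store hundreds of dimensions.
catalog = {
    "pants": np.array([0.9, 0.1, 0.0]),
    "skirt": np.array([0.8, 0.2, 0.1]),
    "shirt": np.array([0.1, 0.9, 0.3]),
}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A query vector for "something like pants" -- we want the nearest neighbors,
# not an exact primary-key match.
query = np.array([0.85, 0.15, 0.05])

ranked = sorted(
    catalog.items(),
    key=lambda item: cosine_similarity(query, item[1]),
    reverse=True,
)
for name, _ in ranked:
    print(name)   # pants and skirt rank above shirt
```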
00:43:06
Speaker
Again, as Ryan mentioned, XYZ is still imaginable right now, but it can be any number of dimensions, and these are just stored as numerical values that you can look up information against. You have to convert your text data into vector embeddings and then search your database for it, or something that's close to it, and that's how you find relevant information. One example, one simplified example that I've seen over time, is where
00:43:36
Speaker
Maybe you've been used to using HTML or computers at a basic level. You might have heard of RGB, the system that allows you to describe a color via numbers. Well, RGB is technically a vector system. The color green can be represented as 6, 205, and 0.
00:43:56
Speaker
Technically. What does that mean to a person? Absolutely nothing. That is a vector example, a simplified one because there's not that many dimensions, but it is one version of how that works. We see a lot of vector databases getting popular. Last week at AWS reInvent, for three of your
00:44:20
Speaker
database services, they announced vector support, so you can actually use them as vector databases. You can use Cassandra in a vector mode. I know DataStax is...
00:44:30
Speaker
I don't know, doing a lot of work around enabling the community around vector databases. So go and check out their hands-on trainings or webinars. And there's the data type, right? Yeah. So other databases can adopt this data type. Yep. Yeah. It's just about how they are storing things, and whether they can actually look up that information, so the semantic search capabilities as well. MongoDB has a solution that can work.
00:44:55
Speaker
There are so many of these solutions. There are newer vendors like Pinecone and Weaviate as well, but that's where we start now slowly pivoting to Kubernetes. You can actually run these models for inferencing and you can run these vector databases on Kubernetes and build your own retrieval augmented generation models for your own use case. And that's how, I think after 45 minutes, we lead into Kubernetes.
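Before moving on to Kubernetes, here is the shape of that retrieval-augmented flow in one function. The embed, vector_search, and llm callables are placeholders for whatever embedding model, vector database, and LLM you actually use; this only shows the request path, not a specific product's API.

```python
def answer_with_rag(question, embed, vector_search, llm, top_k=3):
    """Minimal retrieval-augmented generation request path (sketch only)."""
    # 1. Turn the question into an embedding vector.
    query_vector = embed(question)

    # 2. Pull the closest chunks of current/relevant text from the vector DB.
    chunks = vector_search(query_vector, k=top_k)

    # 3. Augment the prompt with the retrieved context before generating.
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)
```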
00:45:21
Speaker
Yeah, exactly. So I will try to find a link to a talk from someone at VMware, whose name I forget, who did a great talk on vector databases at Data on Kubernetes Day. And with that, we'll lead into: how does this all relate to Kubernetes, right? What are we talking about here?
00:45:39
Speaker
Since Kubernetes is so great, it's so great, but in all seriousness, Kubernetes solves for so many things already that it makes sense to start from a well-defined starting point for running artificial intelligence workloads. You don't have to go back to running things on bare metal and figuring out your own way to orchestrate your
00:46:02
Speaker
models or orchestrate your applications that are using AI; you can still use Kubernetes for it. And of all the benefits that Kubernetes provides, a few things really stand out when we're talking about AI or ML workloads. One is shared infrastructure access. OpenAI, for example, built
00:46:23
Speaker
this huge cluster with like 7,500 nodes. So it wasn't like multiple different clusters. One big cluster for all of their data scientists and all of their AI researchers to use. And it can share infrastructure very efficiently. So Kubernetes can handle the orchestration of a specific request coming in from the data scientist on a node inside your cluster that already has certain resources. So the shared infrastructure component is really cool.
00:46:50
Speaker
Nvidia is doing so many good things with the device plugins that they have built for Kubernetes, or built as add-ons to Kubernetes, where you can actually share your GPUs concurrently. There are different ways. I know we have covered this in the news a few months back.
00:47:08
Speaker
They have a way where you can time-slice your GPU, so it's the entire GPU, but it's sliced, so everybody gets access to it for some time. You can actually run a multi-instance GPU where you can split that entire A100 or H100. Those are really hard to come by, but you can actually break one down into smaller chunks and then everybody, every scientist, can get
00:47:29
Speaker
a dedicated instance of their GPU that they can perform their training and their inference on, or whatever they want to do with it. I think the shared infrastructure access with these Nvidia enhancements is really cool. Portability, right? It's similar to the concepts around containers in general, like cgroups, right? Being able to split up a CPU for various processes and adopting those kinds of techniques for ML and AI and stuff like that. So, okay. So Kubernetes.
00:47:58
Speaker
You're able to run these types of workloads
Kubeflow and ML Workflows on Kubernetes
00:48:02
Speaker
on. What are the types of projects that we're talking about, right? I know Kubeflow is an example. It's been around for a long time. Maybe we can talk about that. And there are some new ones like KubeRay, MLflow, and some others. Let's dig in there. So Kubeflow, as you said, has been around for a while. They just came out with their 1.8 release a couple of months back. But it's an open source project that's aimed to make
00:48:24
Speaker
or enable users to build their machine learning pipelines or workflows on Kubernetes and make them simple, portable, and scalable. They have a lot of components, but I'll focus on two today. Like notebooks, right? So Jupyter Notebooks: if you are doing anything around machine learning, you need an IDE, and Jupyter Notebooks have become the de facto standard. That's what everybody uses. That's what all the data scientists use to train their models or to do data preparation.
00:48:51
Speaker
in order to train their models, everything is done from inside Jupyter Notebooks. You can deploy your Jupyter Notebooks on your laptop or your MacBook, on your virtual machine, on a cloud instance, on a bare metal node, you can deploy it anywhere. But Kubeflow allows you to deploy it on a Kubernetes cluster. So let's say, Ryan, you are a data scientist, you come into the Kubeflow UI, you ask it for resources like, give me two CPUs, four gigs of RAM, and then give me two GPUs.
00:49:17
Speaker
and 100 gigs of persistent volume, or 100 gigs of my workspace storage. Jupyter or Kubeflow basically takes all of these requirements and then spins up a pod on one of the nodes in your Kubernetes cluster with these limits or with these requests set. And I like the move from AWS where they open sourced the Karpenter project, which used to be something that only worked with EKS. Now, if you have Karpenter configured on your underlying Kubernetes cluster,
00:49:44
Speaker
it can actually auto scale your cluster. So you don't have to have GPU-enabled nodes on your cluster all the time. Whenever your data scientist actually needs a GPU-enabled node, that's when Karpenter can add one to your cluster and deploy that pod so they can perform their training. And once it's done, Karpenter can get rid of that GPU-enabled worker node. So that's really cool. And then the second thing was the ability to run these machine learning pipelines using
00:50:08
Speaker
their module called Kubeflow Pipelines. It allows people to build these directed acyclic graphs, or DAGs, or a DAG architecture, which shows all the different phases in your machine learning pipeline. And then each of those stages, when it's actually running, gets deployed as a pod, and it does certain things and then it
00:50:25
Speaker
stores some data somewhere and then it goes on to the next stage. But the ability to run these machine learning pipelines allows for experimentation and hyperparameter tuning where data scientists, if they are going to run some experiments of their own on a specific model that they want, they can do that from the Kubeflow UI. They can share those models across different users or different data scientists on the same team. So like all of these features have made Kubeflow like a really important part of the ecosystem when we're talking about Kubernetes.
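Under the hood, that notebook request Bhavin described boils down to a pod spec with resource requests, something along these lines with the official Kubernetes Python client. The names, image, and sizes are illustrative, and Kubeflow generates the equivalent spec for you from its UI.

```python
from kubernetes import client, config

config.load_kube_config()

# Two CPUs, 4 GiB of RAM, and two GPUs, as in the example above.
resources = client.V1ResourceRequirements(
    requests={"cpu": "2", "memory": "4Gi", "nvidia.com/gpu": "2"},
    limits={"cpu": "2", "memory": "4Gi", "nvidia.com/gpu": "2"},
)

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="ryans-notebook", namespace="kubeflow-user"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="notebook",
                image="jupyter/scipy-notebook",  # placeholder image
                resources=resources,
            )
        ]
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="kubeflow-user", body=pod)
```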
00:50:54
Speaker
Yeah, and pipelines, the way that I think we think of them in CI-CD in general, it's similar in the Kubeflow. It's doing a particular thing, right? In AIML kind of training, we're kind of learning, there might be steps which might transform a piece of data to be more understandable for another step, right? Yeah. So there's all these type of things that you might put into sort of steps, and you want a structured way to understand that, and that's really what the Kubeflow does.
00:51:23
Speaker
And it gives you a neat little UI where you can actually look at your logs that your pods are giving out. You can actually look at summarization tables, things like that. So it's super cool. One thing that OpenAI and Anthropic do that's weird. It's weird to me. They don't use PVCs and PVs.
00:51:41
Speaker
For the model training, they need a lot of data, but instead of storing inside Kubernetes, they just use S3. I don't want to worry about spin-up, spin-down times, attach, detach operations. I'll just use an S3 target as a checkpoint way. If you're training a model, you're running that single job for a long, long time, and you don't want it to fail two days in, you spent thousands of dollars on those GPU worker nodes, and you didn't get anything in return. At every stage of the pipeline,
00:52:08
Speaker
these components can actually checkpoint and store something in that S3 bucket. So if it fails after that, it can go back and resume from that checkpoint or from that milestone and continue working. So you're not losing all the money that you were spending as part of that experiment. Nice, yeah.
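The checkpointing pattern itself is simple; with boto3 it is basically uploading a checkpoint file after each stage and pulling the latest one back down when a job restarts. Bucket, prefix, and file names here are placeholders.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "training-checkpoints"          # placeholder bucket name

def save_checkpoint(local_path, step):
    # Called at the end of each pipeline stage / every N training steps.
    s3.upload_file(local_path, BUCKET, f"run-42/step-{step:08d}.ckpt")

def restore_latest_checkpoint(local_path):
    # On restart, find the newest checkpoint and resume from it.
    objects = s3.list_objects_v2(Bucket=BUCKET, Prefix="run-42/").get("Contents", [])
    if not objects:
        return None                      # nothing to resume from
    latest = max(objects, key=lambda o: o["Key"])   # zero-padded steps sort correctly
    s3.download_file(BUCKET, latest["Key"], local_path)
    return latest["Key"]
```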
00:52:24
Speaker
Gotcha. And then the next few tools that we have, MLflow, this is becoming a really popular open source tool as well. So it doesn't do everything that Kubeflow does, but it does some things really well. It helps you keep track of your machine learning experiments, and it records and compares different model parameters. So for hyperparameter tunings, hyperparameters are basically
00:52:45
Speaker
parameters that are external to your model itself that you can control from outside. So you're trying to figure out what parameters these should be and that's what the exercise is called. MLflow actually helps you keep track of these, evaluate the performance, it also helps you create like a model registry so it manages your artifacts for you.
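For a flavor of what that experiment tracking looks like, the MLflow Python API is roughly this simple; the experiment, parameter, and metric names are just examples.

```python
import mlflow

mlflow.set_experiment("churn-model")          # experiment name is an example

with mlflow.start_run():
    # Log the hyperparameters chosen for this run...
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 8)

    # ...train your model here...

    # ...then log how it performed, so runs can be compared side by side.
    mlflow.log_metric("accuracy", 0.93)
    mlflow.log_artifact("model.pkl")          # assumes the training step wrote this file
```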
00:53:04
Speaker
So if like Kubeflow gives you the ability to prep your data, build your models, run some experiments, MLflow can provide you a model repository, give you the ability to monitor your experiments, store all of your logs and compare your performance. So that's how both of these things can differ. You can actually use them together and build like an end-to-end solution. And then. Yeah. And I was going to say, we can't go very far without mentioning hugging face again.
Community and Collaboration with Hugging Face
00:53:33
Speaker
Yeah, once you have a model that you're ready to share with the rest of your team, yes, go use Hugging Face. If not for the name alone, just go use it, right? I know. Yeah, create that sense of inclusiveness. That's what we miss. Anyway, go check it out. It's definitely a way to be part of the AI community and learn about other people's models and share them as well.
00:53:58
Speaker
Okay, two more things and then we can start wrapping this up. KubeRay. So, KubeRay is the Kubernetes way to deploy Ray clusters. I was looking at multiple talks, like at Uber, right? They have been building their whole Michelangelo AI platform for years now, and initially, when they started pre-2017 or pre-2018, it was all based on Apache Spark and that's what they were using.
00:54:21
Speaker
But then they figured out Ray, the open source, unified framework for scaling AI and Python applications. They liked that better than Spark for their machine learning workloads. So KubeRay is just taking Ray functionality and bringing it inside Kubernetes. That's all it is. So instead of having long-running Ray clusters that are maybe not utilized 100% of the time, because that's a real challenge when you're dealing with AI ML workloads,
00:54:47
Speaker
KubeRay allows you to have a CRD on your Kubernetes cluster. And then whenever a data scientist needs a Ray cluster to perform certain tasks for them, that's when they'll actually spin up a new custom resource inside your Kubernetes cluster with the head node and the worker nodes of your Ray cluster. So if you are a data scientist, that functionality isn't changing for you. You're still getting a Ray cluster, complete access to it, to do the same things that you were doing. But doing it on Kubernetes makes it more resource-efficient, because you don't
00:55:16
Speaker
always need to have that running continuously. So it's bringing the best of both worlds, right? For Kubernetes admins, nothing new; they can start supporting new teams now. And then data scientists or AI engineers can still continue using the Ray framework or the Ray models or Ray
00:55:35
Speaker
frameworks to run their AI and Python applications. So I think that's another project that's super cool right now.
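If you have never seen Ray, the core of it is decorating Python functions so they run as distributed tasks; with KubeRay, the only difference for the data scientist is that ray.init() points at a RayCluster running on Kubernetes. The address in the comment is a placeholder.

```python
import ray

# Locally this starts an in-process Ray runtime; against a KubeRay-managed
# cluster you would pass its head service address instead, e.g.
# ray.init(address="ray://raycluster-head-svc:10001")  # placeholder address
ray.init()

@ray.remote
def score(batch_id):
    # Pretend this is an expensive model-scoring task.
    return batch_id * batch_id

# Fan the work out across the cluster and gather the results.
futures = [score.remote(i) for i in range(8)]
print(ray.get(futures))
```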
Kueue Project for Batch Workloads
00:55:43
Speaker
Again, from my description, it's clear that I'm a 101-level expert. So we'll definitely try to get more people who can shed some light on it.
00:55:50
Speaker
And then one final thing was a new open source project, I think from last year, called Kueue. Queue with a K, not with the letter Q. Yes, exactly. Very clever. It's Q-U-E-U-E. That's what it's called. So this is something I learned. K-U-E-U-E. Oh, I'm so sorry. Yeah, I just did that. Come on.
00:56:13
Speaker
Yeah, okay. But this is something I learned at the Supercomputing conference. Traditional HPC workloads, traditional AI ML workloads, these are long-running jobs, but they are batch jobs. Kubernetes, as we know, is really good at orchestrating workloads, but it's not good at batching things. If you create a Kubernetes Job or a CronJob object, it will create that pod when it's triggered.
00:56:39
Speaker
Kueue brings those additional batch scheduling capabilities to Kubernetes. It doesn't interfere with how Kubernetes actually orchestrates your pods or the application, but it helps you decide when a job should wait, when a job should be admitted, so a pod can be started or created at that point, and when a job should be preempted, so active pods can be deleted. So Kueue adds additional batch management capabilities on top of Kubernetes.
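In practice, pointing a batch Job at Kueue is mostly a matter of labeling it with the local queue it should wait in and letting Kueue decide when to unsuspend it. A rough sketch with the Kubernetes Python client follows, assuming Kueue is installed and a LocalQueue named team-a-queue exists; both the label key and queue name are from memory, so double-check them against the Kueue docs.

```python
from kubernetes import client, config

config.load_kube_config()

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(
        name="training-job",
        # Tells Kueue which queue this job should be admitted through.
        labels={"kueue.x-k8s.io/queue-name": "team-a-queue"},
    ),
    spec=client.V1JobSpec(
        suspend=True,  # Kueue unsuspends the job once it is admitted
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="train",
                        image="python:3.11",  # placeholder image
                        command=["python", "-c", "print('training...')"],
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```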
00:57:07
Speaker
Absolutely. Yeah, I mean, there's so much, there are so many projects in this space too, right? Yes, we could literally talk for days. I think if we've inspired you to go learn more about something or go research something, we've done our job here
00:57:25
Speaker
today, I think, because it's a really interesting space. There's a lot going on in the Kubernetes world. There's a lot of interest and a lot of new places to get involved and start contributing to, right? Whether that's Kubeflow, MLflow, all these other projects, or even just kind of your own research or even trying things out. I think there are so many places to really get involved.
00:57:45
Speaker
And if you want to do those things, great. This is a place to start to kind of pique your interest. We will definitely, like Bhavin said, have a whole bunch of future guests on to kind of dig into these topics, because I think they're very top of mind, like we said,
00:58:03
Speaker
to the whole community. And if you're one of those people, let us know. We'd love to have you on the show. I know. Yeah, we would love to have you on. There is so much content out there. Dude, I've been spending the past two, three months learning more about AI and ML, and this is what I've learned. But that's it. This is where I stop. There is so much more to learn and so many new projects to explore. So yeah, we would love and appreciate any experts if they want to join us and share their knowledge with our community.
00:58:28
Speaker
Awesome. Well, we'll put as many links as we can in the show notes that we talked about in today's episode. Of course, join our Slack to let us know what you like, don't like, share some links, whatever it may be.
00:58:43
Speaker
suggest some episodes. You can find that at KubernetesBites.com. Top center of the page, join our Slack. That's the best place to do it. Of course, subscribe to our YouTube channel if you haven't yet and give us a review on Apple Podcasts if you want to give us some praise or just... Oh, some love. Yeah. Either way, we'd love to hear it. Yeah.
00:59:11
Speaker
With that, any final thoughts from you? No, I think this is a fast-changing, rapidly evolving ecosystem. So we're learning together; let's do it. If you want to start conversations on Slack and we can collaborate, for sure, I'm looking forward to it. But that's it.
00:59:30
Speaker
Yeah, let's do it. I mean, I had a conversation today at my day job about using this type of technology to help just internal reasons, right? And there's so many really cool use cases that you can apply. And it's really fascinating once you start digging into it. So we fully encourage it for you to kind of get your feet wet, so to speak, and go way beyond what we just talked about today.
00:59:55
Speaker
Anyway, I'll leave you with that and that brings us to the end of today's episode. I'm Ryan. I'm Bhavin. And thanks for joining another episode of Kubernetes Bites. Thank you for listening to the Kubernetes Bites podcast.