Running Ray on Kubernetes with KubeRay

S4 E17 · Kubernetes Bytes

In this episode of the Kubernetes Bytes podcast, Bhavin sits down with Kai-Hsun Chen, Software Engineer at Anyscale and maintainer of the KubeRay project. The discussion focuses on how the open source Ray project can help organizations use a single tool for data prep, model training, fine-tuning, and model serving workflows, for both their predictive AI and generative AI models. The discussion also dives into the KubeRay project and how it provides three different Kubernetes CRDs for data scientists to deploy Ray clusters on demand.

Check out our website at https://kubernetesbytes.com/  

Cloud Native News:

  • https://azure.github.io/AKS/2024/08/23/fine-tuning-language-models-with-kaito
  • https://orca.security/resources/blog/kubernetes-testing-environment/
  • https://www.redhat.com/en/about/press-releases/red-hat-openstack-services-openshift-now-generally-available  

Show links:

  • Kai's LinkedIn: https://www.linkedin.com/in/kaihsun1996/
  • KubeRay doc: https://docs.ray.io/en/latest/cluster/kubernetes/index.html
  • Ray Summit registration: https://raysummit.anyscale.com/flow/anyscale/raysummit2024/reg/createaccount (code: KaiHsunC15)
  • KubeRay repository: https://github.com/ray-project/kuberay
  • Ray repository: https://github.com/ray-project/ray
  • Ray Slack workspace: https://docs.google.com/forms/d/e/1FAIpQLSfAcoiLCHOguOm8e7Jnn-JJdZaCxPGjgVCvFijHB5PLaQLeig/viewform  

Timestamps: 

  • 00:02:40 Cloud Native News 
  • 00:07:20 Interview with Kai 
  • 00:49:15 Key takeaways
Transcript

Introduction to Kubernetes Bytes Podcast

00:00:03
Speaker
You are listening to Kubernetes Bytes, a podcast bringing you the latest from the world of cloud native data management.

Cloud Native News with Bhavin Shah

00:00:09
Speaker
My name is Ryan Wallner and I'm joined by Bhavin Shah, coming to you from Boston, Massachusetts. We'll be sharing our thoughts on recent cloud native news and talking to industry experts about their experiences and challenges managing the wealth of data in today's cloud native ecosystem.
00:00:30
Speaker
Good morning,

Post-Labor Day Weekend Reflections

00:00:31
Speaker
good afternoon, and good evening wherever you are. We are coming to you from Boston, Massachusetts. Today is September 5th, 2024. Hope everyone is doing well and staying safe, and hope everybody had a good Labor Day weekend.

Personal Travel Decisions

00:00:43
Speaker
Personally, I didn't travel anywhere. I think I learned my lesson a couple of years back that traveling anywhere on long weekends is just crazy. There are always long queues everywhere, from the airport to national park entrances to restaurants where you're trying to find a reservation. I think two years back we decided maybe it's not meant to be, traveling outside of Boston, but maybe we could stay in New England and just take a road trip somewhere.
00:01:08
Speaker
That also didn't work out a couple of years back, when we were driving back from Acadia National Park and a four-hour journey ended up being an eight-hour journey. So yeah, I stayed local, hung out with some friends, played some board games. That was a fun weekend for me. I hope everyone had a great Labor Day break as well, at least for the people in the US. If you are listening to us from outside the US, hopefully you have a long weekend coming soon and you can take a break from the day to day.
00:01:38
Speaker
Okay, so today we have an interesting episode, I think, in line with a lot of the episodes that we have done this year.

AI/ML Tools in Kubernetes

00:01:44
Speaker
It's in the AI/ML theme: trying to present more and more tools that are being built, modernized, or adopted to work for AI/ML or for Kubernetes, one way or the other. So, existing AI and machine learning frameworks that are now adapted to work on Kubernetes, or new projects that started with Kubernetes and now also support the AI use case.

KubeRay Project and AI on Kubernetes with Kai

00:02:07
Speaker
For this episode, we are talking about the KubeRay project. We're going to talk about how existing technology around the Ray project can now be adopted using Kubernetes. To discuss that, we have Kai from Anyscale joining us. He's a software engineer and is an active part of the
00:02:29
Speaker
KubeRay community and an active maintainer of the KubeRay project, so I'm excited for that discussion. But before we bring Kai on the podcast, a few news items to cover, as is our usual flow.

Fine-tuning Language Models on AKS

00:02:41
Speaker
I have three articles for us today. The first one is a fine-tuning example using the Kaito project, the Kubernetes AI toolchain operator. That's something that we just discussed a couple of episodes back with Sachi and Paul, but just researching around that episode and then talking to those guys, it felt like I didn't know enough about fine-tuning. So I was glad when Sachi ended up publishing a blog on Azure's github.io page around how you can actually fine-tune language models using Kaito on AKS. It's a great example we link in the show notes, and it shows you how you can use different
00:03:16
Speaker
parameter-efficient fine-tuning techniques like LoRA (low-rank adaptation) or QLoRA (quantized low-rank adaptation) to fine-tune the Phi small language model with a custom dataset that's hosted on Hugging Face.
00:03:36
Speaker
Dude, I was so impressed because it makes it look so easy. You have a YAML file where you can specify your preset, the model that you want to fine-tune. You specify the method, the LoRA or QLoRA that you want to use, and then link to the dataset. Then you run that as Kubernetes pods for the actual fine-tuning exercise, which takes like two or three hours for sure, even with a small dataset and a small language model. But it's a great example that helped me learn more, so I thought, let me share it with our community as well. So that's one.

Orca Security's Open-Source Testing Environment

00:04:08
Speaker
The second one is a new open source Kubernetes testing environment from Orca Security. They are a vendor in the Kubernetes DevSecOps ecosystem.
00:04:19
Speaker
KTE, or Kubernetes Testing Environment, is built as an open source project for EKS, GKE, and AKS customers, and it's designed to help organizations improve their Kubernetes security by providing a safe and controlled space to identify and address potential vulnerabilities before applications are promoted to production environments. By using KTE, organizations can simulate various attack scenarios, test security patches, and evaluate the effectiveness of security configurations and policies, all on test, dev, or staging clusters for their application before actually promoting everything to production.
00:05:01
Speaker
If you look at their documentation from our show notes, you'll see that as part of these deployments they do deploy industry open source tools like the Trivy project from Aqua Security, which helps with container security: scanning container images, generating SBOMs, and things like that. I interpret this project as making things easier for customers. Instead of having to deploy these tools manually or figure out how to do it, maybe do it once or twice through KTE, and then you can eventually get a hang of what tools work for you and adopt those. So it's something interesting to try out. They have a Terraform
00:05:42
Speaker
module that's available; you can just select which clusters you want to deploy, and it deploys them for you and deploys those tools on top as well. And then the final news item we have is something that Ryan and I have discussed in the past.

Red Hat's OpenStack on OpenShift

00:05:53
Speaker
I think it was post Red Hat Summit, or maybe pre Red Hat Summit this year, that Red Hat was working on providing OpenStack as a tenant on OpenShift clusters, so the ability to deploy OpenStack clusters on OpenShift clusters. And again, when I read it for the first time, it felt weird to me. But if you want to go back to that episode, I did have a conversation around how
00:06:17
Speaker
it's just making day zero operations and day two operations for deploying and managing OpenStack clusters easier. And if your organization is investing in OpenShift anyway and you don't want to use something like OpenShift Virtualization for deploying VMs, you would rather have a private cloud environment with OpenStack.
00:06:37
Speaker
Well, that functionality is now GA, or generally available. So if you want to try it out, OpenStack on OpenShift is now generally available from Red Hat as a fully supported thing. So those are the three news items.

Fall Industry Announcements

00:06:49
Speaker
I know it has been a slow summer. I'm hoping that once fall kicks in and all the conferences kick in, we have a lot more announcements to talk about. And man, I'm hoping that the private market moves and we do see some acquisitions or mergers, or even people going public,

Kai's Background and Work on Ray

00:07:10
Speaker
after the market opens. So with that, let's move to the interview section of the episode and bring on Kai to the podcast. Hey Kai, welcome to the Kubernetes Bytes podcast. I know that you are a popular guy this month; I already saw you on a couple of different podcasts, so I'm glad Kubernetes Bytes also made the cut. But for listeners who haven't listened to you on any of the other podcasts, why don't you take a minute to introduce yourself and what you do at Anyscale? Yeah, thank you. My name is Kai-Hsun, and I'm currently a software engineer at Anyscale. I'm primarily working on the Ray Core runtime, on two parts. The first one is KubeRay, Ray on Kubernetes, and that's why I'm here. And the other part is the accelerated DAG.
00:07:57
Speaker
It's the next generation for Ray to support large-scale inference and training. Yeah, those are my two current focus areas. Okay, awesome. And it looks like you moved to the US sometime recently. What's the background there? Yeah, I was previously in Taiwan, and I worked on some open source projects around the ML platform. That's how I got the chance to get to know Ray and Spark. So I moved to the US to start school, and then joined Anyscale to work on KubeRay. Okay, so it wasn't recent; it's been a while since you went to school here. Okay.
00:08:39
Speaker
Yeah, yeah. So it's been maybe two years. Gotcha, nice. That's awesome, man. So you already spoke about Ray as the thing that you're working on, and that's why I wanted to do this podcast. I saw you were presenting at one of the meetups, and Madhuri had posted something about it, and I was like, okay, this is a topic. I was looking for a guest and you seemed like the perfect guest

Ray as a Computing Engine for AI Workloads

00:09:00
Speaker
to have on. So.
00:09:01
Speaker
I want to pick your brain and learn all about the Ray project, what KubeRay is, and how people can use it for their machine learning and AI workflows. But before we dive into all those details, let's start by talking about what Ray is as a project. What is the Ray project and how does it fit into an AI pipeline that an organization might be building? Yeah, I think you can think of Ray as a distributed computing engine for AI workloads. One of the cool things is that you can see it's similar to MapReduce or Spark. But MapReduce and Spark are designed for data processing workloads, so the interface is SQL or a map function or a reduce function. Ray is designed for general-purpose programming.
00:09:53
Speaker
Yeah, so you have a one-to-one mapping with your local programming constructs, like functions, classes, and variables. You can find a distributed version of the function, the class, and the variable in Ray. So it's pretty easy to convert your local scripts into a distributed application with Ray, because it has that one-to-one mapping.
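To make that one-to-one mapping concrete, here is a minimal sketch assuming only a local `pip install ray`; the function and actor below are illustrative, not from the episode:

```python
import ray

ray.init()  # connects to an existing cluster, or starts a local one

@ray.remote
def square(x: int) -> int:
    # An ordinary Python function, now schedulable anywhere in the cluster.
    return x * x

@ray.remote
class Counter:
    # An ordinary Python class, now a long-lived distributed actor.
    def __init__(self):
        self.value = 0

    def increment(self) -> int:
        self.value += 1
        return self.value

# .remote() returns object refs (futures); ray.get() resolves them.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))                      # [0, 1, 4, 9]

counter = Counter.remote()
print(ray.get(counter.increment.remote()))   # 1
```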
00:10:13
Speaker
Okay. And personally, I got to know of Ray after I learned about the KubeRay project, and then I saw how popular Ray itself has been. So can you also share some of your thoughts around what problems Ray solves for the user, for anyone using those remote functions? Got it. I think one of the things is,
00:10:36
Speaker
as I said, Ray is designed for general programming. So why do we focus on the AI workload for now? It's because I think the AI workload is a very special workload. It includes a lot of different smaller workloads, like data processing, training, tuning, and serving, and each stage has a different infrastructure requirement. For example, for serving, you require support for autoscaling and high reliability.
00:11:06
Speaker
Yeah, and for training, maybe you need to support gang scheduling. And for data processing, maybe you need to support heterogeneous computation; maybe you need to use CPUs to do some pre-processing and then use GPUs to do, say, embedding generation. OK. Yeah, so I think a framework that can cover the end-to-end computation is very important.
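As a rough illustration of that heterogeneous pattern, a Ray Data pipeline can pin a cheap pre-processing step to CPUs and a stand-in embedding step to GPUs; the dataset, functions, and resource numbers below are assumptions for the sketch:

```python
import numpy as np
import ray

ds = ray.data.from_items([{"text": f"document {i}"} for i in range(1000)])

def preprocess(batch: dict) -> dict:
    # Cheap tokenization-style work; scheduled on CPU workers.
    batch["length"] = np.array([len(t) for t in batch["text"]])
    return batch

def embed(batch: dict) -> dict:
    # Placeholder for a real GPU model call; Ray schedules this stage on
    # GPU workers because of num_gpus=1 below.
    batch["embedding"] = np.random.rand(len(batch["text"]), 384)
    return batch

pipeline = (
    ds.map_batches(preprocess, num_cpus=1)   # CPU stage
      .map_batches(embed, num_gpus=1)        # GPU stage
)
pipeline.write_parquet("/tmp/embeddings")    # or an s3:// / gs:// URI
```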
00:11:32
Speaker
And I think before Ray, the infrastructure was typically built as a microservice architecture. For example, I use one system for data processing, another system for training, and another system for serving.
00:11:51
Speaker
And there are a lot of systems that are very good at a specific kind of workload. For example, Spark is pretty good at data processing, but maybe it is not that good at training.
00:12:04
Speaker
Yeah, and for serving there is KServe. KServe is pretty good, but you cannot use KServe to do the training. Okay. Yeah, so before Ray, people were required to use a lot of YAML files or a lot of pipeline orchestrators to glue all of these systems together, and that increases the time from prototyping to production for ML.
00:12:29
Speaker
Yeah, and if you use Ray, you can run everything from data processing to serving with Ray. You can use a single infrastructure to handle it. So I think that's why I love to use Ray.
00:12:43
Speaker
OK, no, I think that makes perfect sense, Kai, right? Because that's why things like Terraform became so popular: instead of learning how to do infrastructure as code for each cloud provider, learning CloudFormation and learning Azure ARM and then learning what Google Cloud has to offer for IaC, I can just use one tool for any operation. So that resonates with me. Instead of learning or using different tools, which adds overhead both for the actual users, because they have to use different tools for different services, and for the back-end operations team, because now they have to manage, maintain, and offer these different tools as services, Ray provides that unified architecture. So I think that makes a lot of sense. And I had a question, though. You said this is for distributed computing, right? So in your regular functions, you can specify ray.remote as a wrapper, as a decorator, and it gets executed somewhere else. So can you explain or talk about what that workflow looks like? Are those Ray clusters already provisioned? How does that work?
00:13:51
Speaker
Oh, you're asking how the Ray cluster is provisioned and what the end-to-end lifecycle looks like? Okay, cool. Ray itself provides two deployment solutions. One is on Kubernetes and the other one is on virtual machines. Ray on Kubernetes is supported by the KubeRay project, which is the project I maintain. And I think Kubernetes is the dominant solution for Ray users to deploy Ray clusters.
00:14:20
Speaker
Yeah, and I think KubeRay is a normal Kubernetes operator; because this is a Kubernetes podcast, I assume everyone knows what an operator is. Oh yeah, I think everyone knows that. Yeah, so you define a custom resource, and KubeRay will help you reconcile the RayCluster or RayJob or RayService, depending on your CRD.
00:14:43
Speaker
Yeah, so I think that's how it's reconciled. And we provide different kinds of CRDs to support different kinds of jobs. For example, we have RayCluster to support prototyping, and we have RayJob to support batch jobs, and it also integrates with some schedulers like Kueue, Volcano, and YuniKorn.
00:15:03
Speaker
And it also supports serving. If you want to serve a model, it can support autoscaling as well. So for different workloads, we have different deployment solutions. And on the application level, we build several AI libraries on top of Ray Core, including Ray Data, Ray Train, Ray Tune, Ray Serve, and RLlib. Ray Data is primarily for data loading, batch inference, and embedding generation. Ray Train is for distributed training, and Ray Tune is for hyperparameter tuning. Ray Serve is for model serving and supports multi-application model deployment. And RLlib is for reinforcement learning. Yeah, so I think we provide several AI libraries that cover things end-to-end, and Ray is pretty
00:16:00
Speaker
flexible. If you are using Python, it's very easy for Ray to integrate with different Python libraries. So if a Ray AI library doesn't cover your use case, you can still use Ray Core itself; we have a lot of users who use Ray Core directly, and these are all just Python dependencies. No, thank you for walking us through that. I do want to get back to the KubeRay project, and I know we'll focus most of the episode there. But these AI libraries, do I need to be an expert to use any of these, or are these just Python libraries that I can use? And if I'm building ETL pipelines and I want to use Ray to do that, the Data library helps me do that. Is that the gist of it?
00:16:47
Speaker
Yeah, I think you can definitely use Ray to build your data pipelines. To be honest, we have some use cases for it. For example, one large company recently moved one of their production clusters from Spark to Ray. And the cluster is pretty big; it's at exabyte levels.
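For the ETL angle discussed above, a small hedged sketch with Ray Data might look like the following; the bucket paths and column names are made up:

```python
import ray

# Extract: read raw records from object storage.
ds = ray.data.read_parquet("s3://my-bucket/raw/events/")

def clean(batch: dict) -> dict:
    # Transform: drop obviously bad rows and derive a new column.
    keep = batch["amount"] > 0
    return {
        "user_id": batch["user_id"][keep],
        "amount_usd": batch["amount"][keep] / 100.0,
    }

# Load: write the curated dataset back out, distributed across the cluster.
ds.map_batches(clean).write_parquet("s3://my-bucket/curated/events/")
```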
00:17:10
Speaker
Yeah, exabytes. Oh, it's at exabyte levels, okay. Yeah, so I think it's a pretty big cluster. And I think for an infra engineer or a data engineer, Ray is pretty easy to use. Okay. Yeah, because I also worked on MapReduce and Spark before. Spark has SQL, so if you write SQL, Spark is very easy to use. But for MapReduce,
00:17:45
Speaker
I've worked on it and published some papers using it before, and I would say that not everyone can understand how to write a MapReduce program. So compared with that, I think Ray is pretty easy to use as a distributed computation framework. But honestly, I would say that it is still not that easy for a data scientist.
00:18:13
Speaker
I think it is easier, but you still require a bit of distributed systems understanding. Some experience, okay. And I think most of the people that are involved in these AI/ML projects will need to have that as a prerequisite. Coming from a non-AI/ML background, I can't just open my laptop, install Ray, and expect Ray to do everything for me. I still need to know how to build those pipelines on my own. Ray will help me with training and tuning as well, as you mentioned. And this is not just for large language models, right? This is also for what vendors have now started calling predictive or
00:18:52
Speaker
traditional machine learning. So those predictive models with supervised and unsupervised learning, Ray can help me with all of those models as well. Yeah, okay. And I had another question. You brought up hyperparameter tuning. For people that are hearing that phrase for the first time, can you expand on what hyperparameter tuning means and how it translates into experimentation and building a more accurate model? I think with hyperparameter tuning, everyone who has experience training a model will know that you need to tune the learning rate or the batch size. And Ray Tune provides some algorithms that help you find the best combination of these kinds of hyperparameters, and then you can pick the best one to train your model. OK. Yeah. But honestly, I think it is not
00:19:50
Speaker
very common; I think hyperparameter tuning is not common in the LLM world because it is too expensive. But I think it's common in the traditional ML workflow. Okay. And for LLMs, this is me being a novice in this ecosystem: I know with the open source models and with Llama, there are open-weight models as well. How do they translate? Can Ray help me manage my weights when I'm working with these large language models?
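Before moving on, here is a minimal sketch of the hyperparameter search Kai just described, using Ray Tune; the toy objective and parameter ranges are illustrative only:

```python
from ray import train, tune

def train_model(config):
    # A stand-in for a real training loop; it just maps the hyperparameters
    # to a fake score and reports it back to Tune.
    score = 1.0 / (config["lr"] * 1000) + config["batch_size"] / 256
    train.report({"accuracy": score})

tuner = tune.Tuner(
    train_model,
    param_space={
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([32, 64, 128, 256]),
    },
    tune_config=tune.TuneConfig(metric="accuracy", mode="max", num_samples=20),
)
results = tuner.fit()
print(results.get_best_result().config)  # best learning rate / batch size combo
```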
00:20:21
Speaker
Oh, you mean for pre-training or fine-tuning? Yeah, pre-training and fine-tuning.

Ray's Training Efficiency with RayData

00:20:26
Speaker
I think one good thing about Ray Train is that Ray Train provides out-of-the-box distributed training. So if you use Ray Train to train your model or do fine-tuning, I think the first benefit is fault tolerance. And the other point that I want to mention is about the ecosystem.
00:20:56
Speaker
Because Ray Train works very well with Ray Data, and Ray Data is a very powerful tool for data processing and data loading. Comparing with the standard stack, typically if a user uses a torch DataLoader, they need to run the data loader and do preprocessing together with their training workload on the same GPU node. And they have a data dependency: you need to finish the data processing, and the output data will be the input data for the training step.
00:21:41
Speaker
Okay, so in that setup, for the data processing, if I have eight GPUs on a single GPU node but the pre-processing maybe only takes one single GPU, the other seven GPUs will be idle. Yes. But with Ray Data, you can do the pre-processing on other, maybe smaller GPU nodes. Smaller, plus maybe CPU-based nodes as well. Okay. Yeah. And after you finish that, you can stream the data to the GPU node that you want to use for the training. So I think it will be more cost efficient this way.
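A hedged sketch of that pattern, Ray Data feeding GPU workers managed by Ray Train, could look like this; the dataset path, model, and worker counts are placeholders:

```python
import torch
import ray
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

# Preprocessing/loading runs on CPU (or small) nodes via Ray Data.
train_ds = ray.data.read_parquet("s3://my-bucket/training-data/")

def train_loop_per_worker(config):
    model = torch.nn.Linear(10, 1)
    model = ray.train.torch.prepare_model(model)  # wraps the model for DDP
    # Each GPU worker streams its shard of the preprocessed dataset.
    shard = ray.train.get_dataset_shard("train")
    for epoch in range(2):
        for batch in shard.iter_torch_batches(batch_size=1024):
            ...  # forward / backward / optimizer step would go here

trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
    datasets={"train": train_ds},
)
trainer.fit()
```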
00:22:17
Speaker
Okay, nice. Thank you for taking all these tangents with me. I think this is super helpful for our listeners.

Ray's Collaboration with Kubernetes

00:22:24
Speaker
So you already introduced KubeRay and how it provides you with an operator and a set of custom resources. Before we dive into it, I want to understand
00:22:35
Speaker
why the decision from the Ray community, and you can also talk about why Anyscale is heavily involved in this ecosystem as well. Why use Kubernetes as that orchestration layer? What are some of the features between Kubernetes and Ray that are complementary to each other, or is there any overlap that users should avoid or think about?
00:22:55
Speaker
I'd say that at first, in a very early stage of Ray, we considered whether Ray should compete or collaborate with Kubernetes, because you can see that both of them are resource orchestrators.
00:23:12
Speaker
But Kubernetes' scheduling unit is the pod, and Ray's is the task and the actor, which is a remote function or a remote class. And finally, we decided to just collaborate with the Kubernetes ecosystem, because we found that they are designed from different perspectives. For example, Kubernetes is mainly designed for deployment and operations. Its target customer is maybe the infra team, platform team, and DevOps team, and all of its APIs are for things like service discovery, deployment, upgrades, and autoscaling. But I think Ray is designed differently.
00:24:04
Speaker
It's primarily designed for computation. Its abstraction is the remote function and the remote class, and its primary users are the platform engineer or the data scientist.
00:24:17
Speaker
So I think that's why the Ray community decided not to compete with Kubernetes at the core level. Okay. That way Ray can focus more on the computation side, like integrating with the different AI ecosystems and gluing the whole lifecycle together.
00:24:41
Speaker
And with Kubernetes, KubeRay unlocks the possibility for Ray to leverage the Kubernetes ecosystem. For example, we rely on the

KubeRay's Operation on Kubernetes

00:24:57
Speaker
logging systems in Kubernetes, like Fluent Bit, or maybe CloudWatch, something like that. And we also rely on some observability tools, like Prometheus or Grafana, for visualization and for the metrics.
00:25:15
Speaker
And we also rely on some kind of service mesh like Istio, and a lot of different kinds of ingress controllers, maybe AWS ALB or something like that. So I think it's a perfect
00:25:40
Speaker
collaboration between two different worlds, one for computation and one for deployment. So that's why we decided to leverage the Kubernetes ecosystem. No, I think that makes sense, right? Because there's no point in reinventing the wheel and figuring out everything that Kubernetes has already solved for in the past 10 years. I don't know if you know this, but Kubernetes celebrated its 10-year anniversary this year in June.
00:26:06
Speaker
So that's 10 years of a code base where they have gone through iterations, learned from people running different kinds of workloads, stateless and stateful, on Kubernetes, and built some of these platform services. So KubeRay, or Ray, can basically leverage those. I remember looking at a slide or a session from one of the previous years' Anyscale summits where they were talking about what you just described: with KubeRay, all the orchestration work is handled by KubeRay, the platform team can be responsible for the Kubernetes clusters and have that separation of duties, and then the data scientists, or people that are working on AI workflows, can use all the Ray functionality.
00:26:50
Speaker
It allows people to do their job while at the same time leveraging the benefits of both platforms. So I think that makes total sense. You said there is an operator, right? So how do I deploy it? Is there a specific... I know you said it works with CPUs and GPUs, but are there specific resource requirements? What are some of the steps that I need to follow to get KubeRay up and running on my cluster?
00:27:12
Speaker
Oh, I think it's pretty simple, because the KubeRay operator is just a single pod. Okay. And maybe by default it uses just one or
00:27:24
Speaker
maybe 0.2 CPU, something like that. Oh, perfect. Nice. Yeah, I think that's for a local deployment, and you can try it. We provide a lot of examples on the Ray website. On top of it, you can try to do some batch inference, for example, or you can train some models. If you only have CPUs, you can run distributed training on MNIST with CPUs, very simple. And you can also serve a model like MobileNet for image recognition while you are on your laptop. Typically, when I demo, I just use a local kind cluster. So I think it's pretty lightweight. And
00:28:14
Speaker
I think if you use GKE, GKE is also, maybe currently still the only one, with a Ray add-on for the GKE cluster. So you can just click a button and GKE will help you launch the KubeRay operator. Okay, that's awesome. And I think,
00:28:41
Speaker
Or if you also want to try it, you can just use Anyscale. Yeah. Okay, perfect. So you said there are also some CRDs, and we listed RayCluster, RayJob, and RayService. Can you expand more on what each of these CRDs means and what functionality users can expect? By RayCluster, I assume it's some kind of a cluster. So can you talk more about what each of those CRDs means?
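Before getting into the CRDs, here is a rough sketch of the install flow just described, driven from Python for consistency with the other examples; it assumes Helm and kubectl are on the PATH and an existing cluster (kind, minikube, or GKE), and uses the chart names from the KubeRay docs:

```python
import subprocess

def run(cmd: list[str]) -> None:
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Add the KubeRay Helm repo and install the operator (a single, small Deployment).
run(["helm", "repo", "add", "kuberay", "https://ray-project.github.io/kuberay-helm/"])
run(["helm", "repo", "update"])
run(["helm", "install", "kuberay-operator", "kuberay/kuberay-operator"])

# The operator runs as one lightweight pod; list pods to verify it came up.
run(["kubectl", "get", "pods"])
```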
00:29:06
Speaker
I think the most basic CRD is RayCluster. It's just a running cluster, and it helps you make sure that your cluster is healthy and that the head node and the worker nodes come up together.
00:29:23
Speaker
And it also supports the autoscaler. Ray has a native autoscaler, so if the Ray autoscaler detects that the resources are not enough, it will notify KubeRay to scale up the cluster.
00:29:39
Speaker
Yeah, and it also provides some fault tolerance for the metadata. So I think that's how RayCluster works, and typically it is mainly designed for prototyping.
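As a hedged illustration of the RayCluster CR Kai describes, the snippet below creates one with the official Kubernetes Python client; the image tag, replica counts, and group name are examples, not a production manifest:

```python
from kubernetes import client, config

config.load_kube_config()

ray_cluster = {
    "apiVersion": "ray.io/v1",
    "kind": "RayCluster",
    "metadata": {"name": "demo-cluster"},
    "spec": {
        "enableInTreeAutoscaling": True,  # let the Ray autoscaler ask for pods
        "headGroupSpec": {
            "rayStartParams": {},
            "template": {"spec": {"containers": [
                {"name": "ray-head", "image": "rayproject/ray:2.9.0"},
            ]}},
        },
        "workerGroupSpecs": [{
            "groupName": "workers",
            "replicas": 1,
            "minReplicas": 0,
            "maxReplicas": 5,
            "rayStartParams": {},
            "template": {"spec": {"containers": [
                {"name": "ray-worker", "image": "rayproject/ray:2.9.0"},
            ]}},
        }],
    },
}

# The KubeRay operator reconciles this custom object into head/worker pods.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="ray.io", version="v1", plural="rayclusters",
    namespace="default", body=ray_cluster,
)
```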
00:29:55
Speaker
Yeah, and I think some users use it to let multiple users submit their jobs to a single cluster. But I think multi-user, multi-tenant is still something that we are currently focusing on; currently there are still some feature gaps.

Ray's Job Management and Resource Efficiency

00:30:16
Speaker
Yeah, no, I think even with RayCluster, since it's a CRD, if I want to create a Ray cluster for eight hours, I can do that and then delete the custom resource object from my Kubernetes cluster. Because of the way Kubernetes and control loops work, it will delete all the resources and somebody else can spin it up. So I know multi-tenant Ray clusters are a huge feature, but I'm assuming that the ability to provision and de-provision resources on demand also helps.
00:30:45
Speaker
One question before we move on to RayJob and RayService: the different AI libraries that you mentioned, Data, Train, Tune, and Serve, will all of those work with RayCluster, or is there a specific customization I need to apply in the YAML file to use it for one type of library or another?
00:31:02
Speaker
Yeah, I think it will work with RayCluster if the cluster has the compute resources that you require for the computation workload. Gotcha, OK. No, I think that's awesome. And you said Ray itself, or KubeRay itself, has autoscaling functionality where it monitors utilization, and if it needs more worker nodes, it will spin those up. Is this already a supported scenario? If my KubeRay operator is deploying more and more of these worker pods for my Ray cluster, at some point my Kubernetes cluster will run out of resources, right? So are you also working on integrating with projects like the Karpenter project that AWS open-sourced, or some other form of Kubernetes autoscaling, that can help me make sure I have enough capacity on my Kubernetes cluster for these Ray pods to be deployed?
00:31:58
Speaker
I think typically, when users use Ray autoscaling with Kubernetes, they will also use something like Karpenter or GKE Autopilot to autoscale their Kubernetes nodes. There are several levels of autoscaling with Ray. The first is that the Ray scheduler monitors the Ray tasks and actors and launches a new actor or task; take serving, for example.
00:32:28
Speaker
Yeah, if the server has received a lot of requests, it will scale out a new Ray Serve replica, which is a Ray actor. And then if the Ray scheduler finds that it doesn't have enough resources in the Ray cluster to schedule this actor, it will trigger the Ray autoscaler to tell Kubernetes to create a new pod.
00:32:49
Speaker
And then maybe GKE Autopilot or Karpenter finds out that there aren't enough resources for this Kubernetes pod, and it will trigger node autoscaling to schedule a new Kubernetes node. So those are the different levels of autoscaling. Okay, no, that's awesome. That's great to hear. That was a good question. What about RayJob? What does it do? How does it help users? Can you talk more about that now?
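Before moving on to RayJob, here is a minimal sketch of the first level in that chain, a Ray Serve deployment whose replicas scale with load; the pod- and node-level autoscaling layers discussed above are configured outside this snippet, and the resource numbers are assumptions:

```python
from ray import serve

@serve.deployment(
    # Serve adds/removes replicas (Ray actors) based on request load;
    # a request-based target can also be configured here.
    autoscaling_config={"min_replicas": 1, "max_replicas": 10},
    ray_actor_options={"num_cpus": 1},  # could be {"num_gpus": 1} for a model
)
class Predictor:
    async def __call__(self, request):
        return {"status": "ok"}

# When new replicas no longer fit in the cluster, the Ray autoscaler asks
# KubeRay for more worker pods, and the node autoscaler adds nodes under them.
serve.run(Predictor.bind())
```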
00:33:15
Speaker
I think for RayJob, it is also based on the RayCluster CR under the hood. The other thing is that it helps you manage your application lifecycle. For example, we also integrate with some Kubernetes schedulers, like Kueue or Volcano or YuniKorn, to support gang scheduling and priority scheduling.
00:33:42
Speaker
Yeah, so it will create a cluster and submit a job to the cluster, and it will monitor the status. And when the job is finished, you can have some configuration to maybe clean it up or something. I think it's pretty similar to the Kubernetes Job API. To be honest, I just want to make sure that RayJob has the same API as the Kubernetes Job. So if the Kubernetes Job has a feature, I just copy it to RayJob. You can see it's like RayJob manages a cluster and the Kubernetes Job manages pods, yeah.
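A hedged sketch of the RayJob CR described here, again via the Kubernetes Python client; the entrypoint, image, and cleanup setting are illustrative:

```python
from kubernetes import client, config

config.load_kube_config()

ray_job = {
    "apiVersion": "ray.io/v1",
    "kind": "RayJob",
    "metadata": {"name": "batch-inference"},
    "spec": {
        "entrypoint": "python /home/ray/samples/my_job.py",  # the batch workload
        "shutdownAfterJobFinishes": True,  # tear the cluster down when done
        "rayClusterSpec": {
            # Same shape as a RayCluster spec; worker groups would be added
            # here just like in the RayCluster example above.
            "headGroupSpec": {
                "rayStartParams": {},
                "template": {"spec": {"containers": [
                    {"name": "ray-head", "image": "rayproject/ray:2.9.0"},
                ]}},
            },
        },
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="ray.io", version="v1", plural="rayjobs",
    namespace="default", body=ray_job,
)
```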
00:34:22
Speaker
Okay, no, that makes sense. When we were talking about RayCluster, I was thinking about users provisioning and deprovisioning Ray clusters and the resources. I think RayJob is a better abstraction layer: if I want Ray to do something for me or run some functions, I can just define it as a RayJob. It will automatically provision the cluster underneath it and remove all the resources once it's done. Okay. So what about RayService? Is it also an abstraction? How does that work?
00:34:50
Speaker
I think RayService also depends on RayCluster. First, it will create a Ray cluster and then deploy the Serve application. It controls which nodes can receive requests, because Ray Serve also has its own autoscaling and things like that, and KubeRay handles the routing of requests. And it also handles fault tolerance for the metadata, because for an online service you require high availability.
00:35:35
Speaker
Okay. Yeah. And the other thing RayService handles is the upgrade. It currently provides a blue-green upgrade. Okay. Yeah. But I think currently we also have a lot of discussion about a rolling upgrade.
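And a similar hedged sketch for RayService; the Serve config, import path, and cluster sizing below are placeholders:

```python
from kubernetes import client, config

config.load_kube_config()

# The Serve config tells KubeRay which Serve application(s) to deploy on the
# cluster it creates; the import path must exist in your container image.
serve_config = """
applications:
  - name: my-app
    import_path: my_module:app
    route_prefix: /
"""

ray_service = {
    "apiVersion": "ray.io/v1",
    "kind": "RayService",
    "metadata": {"name": "model-serving"},
    "spec": {
        "serveConfigV2": serve_config,
        "rayClusterConfig": {
            # Same shape as a RayCluster spec; worker groups omitted for brevity.
            "headGroupSpec": {
                "rayStartParams": {},
                "template": {"spec": {"containers": [
                    {"name": "ray-head", "image": "rayproject/ray:2.9.0"},
                ]}},
            },
        },
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="ray.io", version="v1", plural="rayservices",
    namespace="default", body=ray_service,
)
```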
00:35:49
Speaker
Okay. So a couple of questions around that. Let me start last in, first out. So, upgrades: are Ray clusters or Ray jobs really that long-running that I have to worry about upgrading the version of Ray? Isn't it deployed once and run, maybe for eight hours, two days, whatever, and then it goes away? I don't have to worry about, like,
00:36:13
Speaker
a three-month-long run where I have to worry about Ray version updates. Is that a possible scenario, and is that why we need some kind of blue-green or rolling updates? I think currently, for RayJob, people typically don't worry about upgrades, because in a RayJob a single cluster is only for a single job. So if you want to upgrade your cluster, you just need to submit a new job.
00:36:42
Speaker
Yeah, and I think for RayCluster it's similar, because as I said, we currently still have some things we need to support for multi-tenancy. So for now it's mostly for a single user doing prototyping, and for a single user doing prototyping, it's pretty easy to coordinate shutting down your cluster and launching a new one. Yep. Yeah, and for RayService it's different, because
00:37:10
Speaker
to be honest, I think currently Ray doesn't have a very good story for upgrades, because the Ray head and the workers are very sensitive to version mismatch. So you cannot upgrade the worker pods to a newer version and connect them to an old version of

Ray Service for Long-Running Inference

00:37:32
Speaker
the Ray head. Yeah, you're basically creating a second cluster itself. And then yeah,
00:37:36
Speaker
Yeah, so currently we're required to create a new RayCluster. Okay. So that's why we support a blue-green upgrade for now. Okay. And for RayService, the name makes me believe that this is for model serving. Is that the case? And that might be the reason why it needs upgrades, because it's a long-running thing, right? Once you have deployed an inferencing setup, you need that service to be around. Yeah, yeah. Okay, all right.
00:38:06
Speaker
Gotcha. Okay, I think then the upgrade scenarios make sense. Thank you for walking me through all of that. Next up, I want to know if KubeRay can help me deploy LLMs on my Kubernetes cluster and use those as part of my applications that are running on Kubernetes. The reason I ask is that we had folks from Microsoft who are working on the Kaito project, K-A-I-T-O, the Kubernetes AI toolchain operator.
00:38:29
Speaker
It allows me to run not only Microsoft-tested and validated language models on my Kubernetes cluster, specifically AKS, but also gives me the ability to package up my own custom fine-tuned models and deploy those on Kubernetes. Can KubeRay help me with that as well? If I wanted to download models from Hugging Face and deploy them on Kubernetes, can I use KubeRay for that scenario? Yeah, I think it's primarily a Ray question. And for Ray, if it has Python support, I think you can definitely use Ray to do that. Gotcha, okay. And then you can use Ray Serve to serve it, and KubeRay provides the deployment solution.
00:39:15
Speaker
Okay, it makes more sense. It is like that platform layer for remote execution and distributed computing. So if Python can do it, I can use Ray for remote execution. Okay. Yeah. And we have an example of vLLM with Ray Serve; the doc is currently in review. And I'm currently also looking at the SGLang project.
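As a rough sketch of serving a model behind Ray Serve in that spirit (with a plain transformers pipeline standing in for vLLM or SGLang), something like the following could work; the model name and GPU count are assumptions:

```python
from ray import serve
from transformers import pipeline

@serve.deployment(ray_actor_options={"num_gpus": 1})
class TextGenerator:
    def __init__(self):
        # Each replica loads the model once; a real LLM engine would go here.
        self.generator = pipeline("text-generation", model="gpt2")

    async def __call__(self, request):
        prompt = (await request.json())["prompt"]
        return self.generator(prompt, max_new_tokens=50)[0]["generated_text"]

serve.run(TextGenerator.bind())
# Once running, POST {"prompt": "..."} to http://localhost:8000/
```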
00:39:43
Speaker
Yeah, I think it is also an alternative to vLLM. And I think one of the good things with Kubernetes is that, I'm not sure, but I think it's maybe currently the only solution that supports different kinds of autoscaling, including multi-host accelerators and single-host accelerators.

Cloud-Specific Tool Integration

00:40:11
Speaker
We currently support multi-host TPUs. Oh wow. Yeah, and I think currently, in the open-source world, we have a very flexible way to scale up these accelerators.
00:40:27
Speaker
Are you also working on similar things for other clouds? I know AWS also has the Trainium and Inferentia chips, right? So is the KubeRay community also working on integrating with those? Yeah, I think there are some folks from AWS working with Ray to integrate with Trainium and Inferentia, which is always great. And I think it doesn't require any update on the KubeRay side, so it can work with all of it.
00:40:53
Speaker
Okay, yeah, that's good to know. Yeah, but I think currently it doesn't have the multi-host accelerator support. Okay. At least in my current understanding. Okay. For sure.
00:41:05
Speaker
Yeah, no, thank you for being honest and not making things up. Okay, so we already spoke about the ability to use Ray for things like pre-training and fine-tuning. And with KubeRay, the assumption is, since it's just a way to deploy things on Kubernetes, those operations are also supported. Is that correct? Sorry, can I start again?
00:41:26
Speaker
Can I use KubeRay to pre-train LLM models or fine-tune models on top of Kubernetes? Can KubeRay help me with that? Yeah, I think KubeRay is the deployment solution and Ray provides the computing solution. I would say that some users have pre-trained their models
00:41:52
Speaker
with Ray. But let me think. I think maybe I can mention, because it is public and there has been popular news about it, that OpenAI's GPT-3.5 and GPT-4 were trained on Ray. Oh wow, that's good information, yeah, nice. I think there is some public press about that. And I think OpenAI has been good in terms of talking about their usage anyway. I still remember they did a session back at KubeCon 2018 or so around their use; I think they were talking about how they trained GPT-1, maybe, on Kubernetes. And I know they have a good tech blog. So yeah, I'm sure that information is out there. Okay.
00:42:35
Speaker
And I think my next question was around day two operations, since we spoke about day zero and what those CRDs mean. But I think we already covered things like upgrades, the community working on rolling updates instead of just blue-green, and then autoscaling and how that works. Any other day two operations? Like, is the
00:42:57
Speaker
KubeRay community involved in managing those container images, the images of the Ray cluster that get deployed? And how is security handled, how are security vulnerabilities and CVEs handled as part of that community? Is that all part of the community, or is there a vendor that provides these images?
00:43:15
Speaker
I think for the KubeRay image itself, it's maintained by the KubeRay community. We have something like Dependabot for dependencies, and we have some partners in the community; I think they run some kind of testing. Yeah, I think we focus a lot on this kind of stuff.
00:43:42
Speaker
Yeah. And I think for Ray itself, we have folks focused on security for that as well. Okay. And how do all these things work with
00:43:54
Speaker
storage? So I don't know if you know this, but Ryan and I started this podcast as a Kubernetes storage focused podcast and then settled on the name Kubernetes Bytes. So we do bring it around to storage features,

Data Storage with Ray and Kubernetes

00:44:08
Speaker
right? So when I'm deploying a RayCluster or RayJob or RayService, what kind of storage do I need? Are there certain PVCs that get provisioned? I know we were talking about the Ray Data library and how it pre-processes data and stores it somewhere. When I'm doing those operations on Kubernetes, is that stored in persistent volume objects, and does it just work because Kubernetes APIs work across any distribution? Yeah, I think it supports every kind of Kubernetes API because we just expose the pod spec to the users. Typically, maybe I just use
00:44:47
Speaker
an S3 bucket or a GCS bucket on a different cloud, or sometimes we use something like a shared file system like EFS. Yeah, so I think that's the typical use case.
00:44:59
Speaker
Okay, but there is the possibility that if I wanted to use a ReadWriteMany persistent volume for that shared file system, I can use Kubernetes objects. Yeah, yeah. Okay, perfect. No, thank you for walking us through that. And since you have been involved in this community, I also wanted to throw in the question around any customer examples that you can share.

Samsara's Use of Ray Serve

00:45:19
Speaker
I know I've seen a talk from last year; it wasn't around Kubernetes specifically, just around Ray, and I saw Uber as one of the customers using it.
00:45:27
Speaker
Do you have any stories that you want to share of a real-world example using KubeRay for their ML workflows? I think maybe I can share some examples. Maybe the first one is from Samsara; I think they also published a blog
00:45:46
Speaker
about how they use Ray Serve to do model serving. Okay. And I think at first they had some microservices, maybe one Golang service doing something like model selection, which then sends to a Python microservice to do some inference.
00:46:08
Speaker
And then the inference result is sent to another Golang service to do some business logic. Then they used Ray Serve and KubeRay to write all the logic together in Ray, so they can reduce the serialization overhead between the different microservices.
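A hedged sketch of that consolidation pattern with Ray Serve model composition; the deployments, routing logic, and business rule below are made up for illustration (and assume a recent Ray version with DeploymentHandle):

```python
from ray import serve
from ray.serve.handle import DeploymentHandle

@serve.deployment
class ModelA:
    def predict(self, x: float) -> float:
        return x * 2

@serve.deployment
class ModelB:
    def predict(self, x: float) -> float:
        return x + 100

@serve.deployment
class Router:
    # What used to be separate selection/inference/business-logic services
    # become composed deployments inside one Ray application.
    def __init__(self, model_a: DeploymentHandle, model_b: DeploymentHandle):
        self.model_a, self.model_b = model_a, model_b

    async def __call__(self, request):
        payload = await request.json()
        handle = self.model_a if payload["segment"] == "a" else self.model_b
        result = await handle.predict.remote(payload["value"])   # in-cluster call
        return {"result": result, "approved": result > 0}        # business logic

app = Router.bind(ModelA.bind(), ModelB.bind())
serve.run(app)
```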
00:46:30
Speaker
Yeah. And they can share the resources between the different stages. And I think in the blog post they also talk about the cost savings from that. Okay, no, that's a great use case, just streamlining model serving. So since you said this is a public reference, we'll have those links in the show notes if people want to learn more about how they're using it. Thank you for sharing that. I think that brings me to a couple of last questions. Are there any things that I haven't asked that users should know about around the KubeRay project, or have we pretty much covered the basics here?
00:47:15
Speaker
I think the best way to know the latest news is to just join the Ray Slack channel. I monitor the channel, and some of our community folks also monitor the channels, so it's the best way to reach out to us.
00:47:34
Speaker
Yeah, and I think we are very active on that. Okay. And I think maybe the second one is that we have a conference coming up, I think on September 30: Ray Summit. A lot of our users will give presentations at the conference, and we will also host a get-together to meet with contributors and users. Oh, that's awesome. That's perfect timing for this podcast, right? It's September 30th, you said, so people listening to it this week can actually go and register for it if they are in the Bay Area or want to travel to the Bay Area. Okay. Yeah. There's also,
00:48:19
Speaker
in addition to the Slack channel and Ray Summit, are there any other resources or documentation links that you want people to go and read through if they want to learn more about KubeRay and Ray Core? I think maybe you can go to ray.io; that is the official Ray website. Okay, awesome. And if you have any KubeRay questions, you can just open an issue in the KubeRay repo.
00:48:46
Speaker
Okay, perfect. Yeah, we'll make sure we include all of those links, in addition to the GitHub repo, ray.io, and the conference registration website. Awesome. Kai, if you don't have anything else to add, I would like to thank you for joining me for this episode of the Kubernetes Bytes podcast. I had a great time learning about Ray. All I knew about KubeRay and Ray was that one 20-minute session that I saw from last year, so I appreciate you spending this time with me. Yeah, cool. Thank you for the invitation.
00:49:15
Speaker
Awesome. I think that was a great episode; I learned a lot. I loved how Kai went into all kinds of details around how all the different Ray libraries work and what all the CRDs are. I know we ended up spending a lot of time diving into CRDs, but I think that's the best you can do in a podcast format when you're not sharing your screen and showing CRD objects: just talking through why these different CRDs exist in the first place, what they represent, how they help a user or an organization, what the provisioning and deprovisioning or cleanup process looks like after the fact, and how upgrades and autoscaling work from a day two operations perspective. So I loved the level of detail that we went into.
00:49:56
Speaker
I think Ray is definitely an interesting project. If you go to ray.io, as Kai mentioned, you will see that they have a lot of enterprises that are already using Ray. And now, with the KubeRay project, everything is more streamlined; the Ray audience can now use the benefits of Kubernetes. One of the things I remember from some of the pre-work that I did before the podcast was that whenever administrators or IT teams were deploying Ray clusters for their data scientist teams, the underutilization of resources was a really big deal. They were deploying Ray clusters on GPU nodes, and those were not being utilized 24 hours a day; they were just sitting around for the next tenant to maybe use and run some jobs remotely. With KubeRay, I think the whole thing that you can solve is that
00:50:44
Speaker
you only provision a cluster and the resources when you need them, and when your job is done, when data processing or model training or fine-tuning or serving, all of these things, are done, KubeRay can automatically de-provision all the resources that were deployed.

Closing Remarks and Listener Engagement

00:51:00
Speaker
So it helps with that resource efficiency, all the same benefits that we talk about when we talk about applications running on Kubernetes. So I think there is huge value there. I'm excited for the Ray Summit that Kai brought up. I don't think I'll be able to attend in person, but I'm definitely looking forward to the recordings from those sessions and seeing what interesting customer stories come up this time around. He did share an interesting promo code for us. So if you use
00:51:27
Speaker
the promo code that we'll have in our show notes, with the link to register, it's on September 30th, you get a 15% discount. So if any of our listeners want to learn more about KubeRay, or see how the community is using KubeRay and Ray, and you are in the Bay Area and want to attend a day-long conference, yeah, use the link. It's not something that's specific to Kubernetes Bytes; it's just something that's specific to Kai. So I appreciate him sharing that with our listeners.
00:51:52
Speaker
With that, I think those were my key takeaways. I want to reiterate: please make sure you share this with at least two more people. I think that's how we can get to the growth stage that we expect.
00:52:05
Speaker
Please give us five-star reviews. Subscribe to our YouTube channel. I know our YouTube audience is increasing bit by bit every day; every day I see more listeners pop up, so that's a dopamine hit that I get every morning when I look at the stats. So even if you're an audio listener, and there are way more of you than we have video subscribers, please go to our YouTube channel, search for Kubernetes Bytes on YouTube the next time you are on, and hit that subscribe button. I think that will help us a lot.
00:52:31
Speaker
Getting feedback from all of you is great. So if you have anything you want to share with us, go to our website, kubernetesbytes.com, join our Slack community, and feel free to ping us or reach out to us. That's it; that brings us to the end of another episode. I'm Bhavin, and thanks for joining another episode of the Kubernetes Bytes podcast.
00:52:57
Speaker
Thank you for listening to the Kubernetes Bytes podcast.