Running multi-tenant Kubernetes clusters using vCluster

S4 E2 · Kubernetes Bytes
1.6k Plays · 11 months ago

In this episode of the Kubernetes Bytes podcast, Bhavin sits down with Lukas Gentele, Co-founder and CEO of Loft Labs, to talk about the vCluster project. They discuss how vCluster helps organizations run multi-tenant Kubernetes clusters, where each tenant gets access to a CNCF-conformant Kubernetes cluster, while still being able to deploy everything they need to build and run their applications on Kubernetes. The discussion goes into topics like how security, storage, CI/CD, and GitOps work for virtual clusters running on the host clusters.

Check out our website at https://kubernetesbytes.com/   

Episode Sponsor: Elotl

  • https://elotl.co/luna
  • https://www.elotl.co/luna-free-trial  

Timestamps: 

  • 00:59 Cloud Native News
  • 06:51 Interview with Lukas
  • 55:09 Key takeaways 

Cloud Native News: 

  • https://www.redhat.com/en/blog/krknchaos-joining-cncf-sandbox
  • https://cloudnativenow.com/features/red-hat-makes-idp-based-on-backstage-generally-available/
  • https://trilio.io/resources/trilio-announces-backup-and-recovery-for-red-hat-openshift-on-ibm-power-systems/
  • https://snyk.io/news/snyk-acquires-runtime-data-pioneer-helios/   

Show links: 

  • https://loft.sh/blog/how-codefresh-uses-vcluster-to-provide-hosted-argo-cd/
  • https://www.coreweave.com/blog/coreweave-and-loft-labs-leverage-vcluster-in-kubernetes-at-scale
  • https://www.linkedin.com/in/gentele/
  • https://www.vcluster.com/
  • https://loft.sh/
  • https://slack.loft.sh/
Transcript

Introduction to Kubernetes Bytes

00:00:03
Speaker
You are listening to Kubernetes Bytes, a podcast bringing you the latest from the world of cloud native data management. My name is Ryan Wallner and I'm joined by Bhavin Shah coming to you from Boston, Massachusetts. We'll be sharing our thoughts on recent cloud native news and talking to industry experts about their experiences and challenges managing the wealth of data in today's cloud native ecosystem.

Solo Episode with a Surprise Guest

00:00:30
Speaker
Good morning, good afternoon and good evening wherever you are. We are coming to you from Boston, Massachusetts. Today is January 24, 2024. Hope everyone is doing well and staying safe. It is going to be another solo episode for me. I do have a guest, an interesting guest for us to learn more about technology that's becoming really popular.
00:00:51
Speaker
But it's just me for the news and key takeaways today.

CNCF's New Addition: Kraken Chaos

00:00:55
Speaker
So let's start by talking about what's happening in the cloud native ecosystem. Let's start with an open source tool from Red Hat that got accepted into CNCF as an incubating project. The project is called Kraken Chaos and Kraken is spelled weird, but that's how it's pronounced. Kraken Chaos is a chaos engineering tool
00:01:16
Speaker
specifically built for Kubernetes to improve its resilience and performance. As with any chaos engineering tool, there are certain scenarios that you can run against your cluster to verify that your applications will perform as you expect when they're running in production. Some of these test scenarios include things like
00:01:37
Speaker
disrupting application pods, that's pretty simple. But then advanced use cases like sending specific kill signals to mimic failures, terminating nodes,
00:01:48
Speaker
stopping the node, stopping the kubelet running on a worker node, blocking ingress or egress traffic, consuming or hogging up the CPU, memory, and IO resources on a specific node, filling up a persistent volume to a specific percentage that you specify, introducing packet loss and bandwidth limitations, and many more features. Personally, I'm excited because some of these things do sound interesting. I want to test this out, and I'm glad that they
00:02:17
Speaker
donated it or made it part of the CNCF list of projects. So I guess looking forward to seeing a sticker of this project at the next KubeCon, right? That's what these projects get, in addition to all the other things.

Red Hat Developer Hub Launch

00:02:33
Speaker
Next up, continuing the thread to talk about Red Hat, Red Hat Developer Hub is officially GA, generally available.
00:02:41
Speaker
It was announced last year, I think in May, at one of the Red Hat Summit keynotes. It's based on the open source Backstage project, but it adds a self-service dashboard, standardized software templates, role-based access control, and ongoing support. One thing that popped out from the release notes is the dynamic plugin capability, which makes it easier to install, update, and remove plugins from
00:03:08
Speaker
Developer Hub without having to rebuild the Backstage environment. So go check that out if you have a Red Hat subscription, or if you're running an OpenShift cluster, this will just work on top. Next up, Trilio, a vendor in the Kubernetes data protection ecosystem, announced support for protecting applications that are running on OpenShift, but those OpenShift clusters can be running on IBM Power Systems as well. So not just your Intel-based
00:03:34
Speaker
OpenShift clusters, even if you're running it on IBM Power, you can use Trilio's data protection tool to protect your containerized or stateful applications running on top.

Snyk Acquires Helios

00:03:43
Speaker
And then finally, we do have an acquisition to share. Snyk, the Kubernetes security and developer security company that, personally, I think is really well funded, acquired a smaller startup called Helios
00:03:58
Speaker
for an undisclosed amount. Looking at Helios's funding history, it seemed like they only raised a $5 million seed round back in 2022 and there weren't any other funding rounds available. So I think this is just a great exit for anybody who invested in the seed round. Snyk definitely is growing at a rapid pace and getting widely adopted. So this just adds to their portfolio.
00:04:24
Speaker
Helios helps developers troubleshoot and understand their microservices-based applications in production. So what Snyk will end up doing is, according to their press release, they'll integrate features like the end-to-end application delivery service that Helios has and Helios' OpenTelemetry-based runtime data collection tools.
00:04:48
Speaker
And both of these capabilities will be integrated into the Snyk AppRisk product. So if you're already using that, expect these capabilities in the near future. But congratulations to everyone at Helios and Snyk. This just means that there are different exit routes available for startups in the ecosystem. With 2024, we want to see either more mergers and acquisitions or more IPOs rolling around. We have a lot of startups in our Kubernetes ecosystem that are ready for the public markets to open up and
00:05:17
Speaker
have some exit for the early employees.

Lukas Talks vClusters

00:05:20
Speaker
With that, let me introduce a guest and topic for today. We're going to talk about vClusters, and it's a really cool technology that offers multi-tenancy capabilities for your Kubernetes cluster. And to talk about that, we have the co-founder and CEO of Loft Labs, the company or the startup behind vCluster technology, Lukas, who will join us and we can ask him more questions about how vClusters actually work. So without further ado, let's bring Lukas on.

Elotl Luna Autoscaler Introduction

00:05:47
Speaker
This episode is brought to you by our friends from Elotl. Elotl Luna is an intelligent Kubernetes cluster autoscaler that provisions just-in-time, right-sized, and cost-effective compute for your Kubernetes apps. The compute is scaled both up and down as your workloads' demands change, thereby reducing operational complexity and preventing wasted spend.
00:06:13
Speaker
Luna is ideally suited for dynamic and bursty workloads such as dev test workloads, machine learning jobs, stream processing workloads, as well as workloads that need special resources such as GPUs or ARM-based instances.
00:06:30
Speaker
Luna is generally available on Amazon EKS, Google Cloud GKE, Azure AKS and Oracle OKE. Learn more about Luna at elotl.co/luna and download their free trial at elotl.co/luna-free-trial. Hey Lukas, welcome to the Kubernetes Bytes podcast. Can you please introduce yourself for our audience and tell us more about what you do at Loft Labs?

Understanding vCluster Technology

00:07:00
Speaker
Yeah, absolutely. Thank you so much for having me. My name is Lukas. I'm the CEO and one of the founders here at Loft. And, you know, our, I guess, claim to fame at this point is our open source project, vCluster. A lot of people are actually more familiar with the name vCluster than they are with the company behind it. But yeah, Loft is the company behind it.
00:07:21
Speaker
Yeah, I think that was my first introduction to Loft as well. I found out about vCluster and then I saw vCluster by Loft Labs. And I think since you also have a different website for vCluster, it feels like its own thing. Yeah, we're definitely trying to give it some independence. As an open source project, as a community around it, et cetera, not everything needs to be commercial right from the start. So we felt like having that clear separation makes a ton of sense.
00:07:48
Speaker
Oh, I like that strategy. So let's talk about vClusters, right? Like, can you talk about what vClusters are and how they are helping organizations architect for multi-tenancy?
00:08:00
Speaker
Yeah, a vCluster is essentially a Kubernetes cluster that runs inside a single pod, right? It can also run in multiple pods if you have more complex setups. But in the easiest case, the vCluster is just a pod and it hosts an entire Kubernetes control plane inside of that pod. So instead of talking to your EKS API server, you can now talk to that pod. And there's another Kubernetes cluster inside of there. And the big benefit is you can essentially now create
00:08:28
Speaker
100 Kubernetes clusters that all run on the same EKS cluster without having to spin up 100 EKS clusters. 100 EKS clusters is very expensive. It is. You have a lot of redundancies. Just think about running 100 times Istio. That's very annoying.
00:08:45
Speaker
So, we're essentially telling you, how about you run one ingress controller, one Istio, one cert-manager, and then you spin up a hundred vClusters in these pods, and then you give people access to these vClusters instead of giving everyone their own EKS, AKS, GKE, or private cloud cluster, wherever you are.
00:09:04
Speaker
Okay. And I think the way you were describing it, it felt like the whole VMs-to-containers description that we had in the beginning, where instead of installing a guest OS on each virtual machine for an application that you want to run, why not just package your application as a container and run all of them on the same VM?
00:09:21
Speaker
So you said you can run hundreds of these, and I'm assuming these are CNCF-conformant Kubernetes clusters. Like, how big do the host clusters or the base cluster have to be to support this many Kubernetes clusters?
00:09:35
Speaker
Yeah, that can be pretty lightweight. I mean, a vCluster in the easiest scenario, in our getting started setup, you're using K3s as a super lightweight, single-binary compiled Kubernetes distro with an SQLite backing store for that vCluster. So it's as lightweight as it gets.
00:09:58
Speaker
Obviously, you can have that SQLite even in the ephemeral container storage and you have a super ephemeral Kubernetes cluster, but also you can provision a persistent volume and have the SQLite in there, which is obviously what we recommend. I think we have that enabled by default as well to retain the state beyond restarting the pod. That's just super lightweight. I think we have
00:10:24
Speaker
0.1 CPU as a recommendation to get started with to start the vCluster, and I think like 100 megabytes of memory. So the footprint is super, super low to get started with the vCluster.
00:10:41
Speaker
Obviously, the more load you have on the vCluster, the more applications are running in there. You want to probably bump up these, just like with regular Kubernetes because you want to get more resources if the control plane, if the API server needs to reply to a lot of requests, etc. But it's fairly easy to get going and have a lot of vClusters in a relatively small underlying host cluster.
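
To make the footprint arithmetic concrete, here is a minimal Python sketch that scales the starting values quoted above (0.1 CPU and about 100 MB per vCluster) to a given number of vClusters. The `Footprint` type and `total_requests` helper are hypothetical names for illustration, and the linear scaling is an assumption that ignores the tenant workloads themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    cpu_cores: float  # CPU requested per vCluster control plane
    memory_mb: int    # memory requested per vCluster control plane


# Starting values quoted in the interview; real workloads will need more.
STARTER = Footprint(cpu_cores=0.1, memory_mb=100)


def total_requests(per_vcluster: Footprint, count: int) -> Footprint:
    """Scale one vCluster's starting request linearly to `count` vClusters."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return Footprint(
        cpu_cores=round(per_vcluster.cpu_cores * count, 3),
        memory_mb=per_vcluster.memory_mb * count,
    )


if __name__ == "__main__":
    total = total_requests(STARTER, 100)
    print(f"100 vClusters: {total.cpu_cores} CPU, {total.memory_mb} MB")
```

By this estimate, 100 idle control planes would ask for roughly 10 CPU cores and 10 GB of memory in total, before any applications run inside them.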
00:11:06
Speaker
Okay. Okay. I see the value, right? Like 0.1 CPU and 100 megs is not a lot of resources to get started with. So like, what does the process look like, right, to deploy these vClusters? I have an EKS cluster running right now. How can I get started with it? And what actually gets deployed inside my EKS cluster to get this vCluster functionality?
00:11:26
Speaker
If you have an EKS cluster already and you have a valid kube context on your machine, so you can run kubectl commands, you have access to the namespace and can deploy things to that cluster, you can essentially get started with our CLI, which is really just, in a way, a wrapper around Helm
00:11:44
Speaker
for vCluster. So vCluster is really just packaged as a Helm chart. And the CLI is a little bit easier, because it just looks at your cluster and then configures certain Helm values, right, and helps you upgrade the vCluster more easily. But you can just run vcluster create.
00:12:00
Speaker
And under the hood, what that's going to do is configure some of the Helm values. Of course, you can customize them to your liking. There's a lot of config options to tweak your vCluster and set it up however you like it. But when you just run vcluster create, we have some smart defaults in place, and it deploys that Helm chart to one of your namespaces. And then essentially, in that Helm chart, the two core pieces that it has is a stateful set
00:12:27
Speaker
that deploys the vCluster pod with the persistent volume to host the SQLite database and then a service so that the vCluster is addressable. And then what we typically do is you can either expose the vCluster via ingress to the public, of course, but you can also just connect to it via port forwarding.
00:12:50
Speaker
And that's what we do for a lot of local clusters, et cetera, to just get you started. So when you run vCluster create, we actually connect to the vCluster automatically and set it as your kube context. So let's say you're in the EKS context and then you run vCluster create, it'll drop you into the context of the vCluster.
00:13:09
Speaker
And then you can essentially run vcluster disconnect to disconnect from the vCluster context and go back to your EKS context. But obviously you can also use, you know, Docker Desktop's context switcher, or a kubectl command to switch contexts, or I think k9s allows context switching as well. Whatever favorite tool you have to switch your context should work as well.
00:13:34
Speaker
So you said it deploys a stateful set and a persistent volume and a service object, right? So that's my Kubernetes cluster. And if I'm a developer, can I ask for a kubeconfig file to directly connect to the vCluster, as long as my administrator or whoever has deployed the vCluster has configured ingress properly, and can I just access it without knowing that it's not a full-fledged cluster?
00:13:57
Speaker
Yeah, there's a config option you can set that automatically deploys a secret alongside that stateful set as well. And that secret contains the kubeconfig. And you can also give us a parameter and say, like, you know, set up that ingress with this domain and then put this in the kubeconfig. And then it becomes even easier to access the cluster.
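
As a rough illustration of how a developer might pull the kubeconfig out of such a secret, here is a small Python sketch. It assumes the secret has already been fetched as JSON (for example with `kubectl get secret <name> -o json`) and that the kubeconfig sits under a data key called `config`; that key name and the `kubeconfig_from_secret` helper are assumptions for illustration, not something stated in the interview. The only Kubernetes behavior relied on is that Secret `data` values are base64-encoded.

```python
import base64


def kubeconfig_from_secret(secret: dict, key: str = "config") -> str:
    """Return the decoded kubeconfig text stored in a Secret manifest.

    `secret` is the parsed JSON form of a Kubernetes Secret. Values under
    `data` are base64-encoded, so they are decoded before being returned.
    The default key name is an assumption made for this sketch.
    """
    data = secret.get("data") or {}
    if key not in data:
        raise KeyError(f"secret has no data key {key!r}")
    return base64.b64decode(data[key]).decode("utf-8")


if __name__ == "__main__":
    # A fabricated Secret, standing in for real kubectl output.
    text = "apiVersion: v1\nkind: Config\n"
    fake_secret = {
        "kind": "Secret",
        "data": {"config": base64.b64encode(text.encode("utf-8")).decode("ascii")},
    }
    print(kubeconfig_from_secret(fake_secret))
```

The decoded text could then be written to a file and passed to kubectl through the `KUBECONFIG` environment variable.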
00:14:26
Speaker
Okay. Yeah, that makes sense. So thinking about, okay, you deployed your vCluster. What's next? Like, how are the cluster-wide resources handled? I know we started the call by saying we don't want to deploy Istio a hundred times. How does Istio know that this is a virtual cluster, and will the service mesh functionality and the pod-to-pod communication still work?
00:14:49
Speaker
Yeah, that's an excellent question. Because obviously, you still want to use Istio, right? Yeah. So do we need to now install it inside the vCluster? Well, then we didn't win a lot, right? I mean, we won in that we don't have to pay for an extra EKS cluster and provision nodes for it. That's already a benefit. And things are a little bit more dynamic, I guess, because nodes can be shared and things like that. But a lot of the benefit comes from actually sharing what we call the platform stack.
00:15:16
Speaker
And that's things like cert-manager, an ingress controller, right? Or like Istio or all these central components: logging, monitoring, compliance and security tools, like OPA, for example, right? You want to centralize them, and certain things you want to make available inside the vClusters. Other things you don't want to make available inside the vCluster. Let me give you an example. So, OPA, for example, you want to deny privileged pods.
00:15:42
Speaker
You don't need any kind of OPA policy resources inside the vCluster because the user shouldn't be touching them anyway. An ingress controller is different. You want users to be able to create Ingress resources, but you want the controller to live outside of the vCluster and centrally manage it so that you only need one load balancer IP for that one ingress controller instead of everybody deploying their own ingress controller. What you should actually do in a vCluster is you don't allow people to create load balancer services in the vCluster.
00:16:12
Speaker
And then you deploy one centrally and make it accessible. And the way all of this works is with the syncer. The syncer is really the glue between what happens inside the vCluster and what happens outside in the underlying cluster. So when you create a pod, for example,
00:16:29
Speaker
inside the vCluster, right? Or you create a deployment, right? That deployment has replica number one, and then, you know, one pod gets created, controller manager creates a pod. That basically just means in your etcd, or in this case, SQLite, right? You essentially just have one more entry in that data store.
00:16:47
Speaker
And then typically, next, the scheduler would pick up that pod that's pending now and would schedule it to a node to actually be launched. But in a vCluster, it's different. In a vCluster, we don't have a scheduler. Instead, we have a syncer. And that syncer has two kube contexts. It is connected to the API server of the virtual cluster, but it's also connected to the API server of the underlying cluster.

Network Policies in vClusters

00:17:16
Speaker
And what it does, it sees that pod that needs scheduling, and then it copies the pod down to the underlying cluster. And now that underlying cluster is going to schedule the pod. And that also means when we're creating that pod, it goes through the admission control loop in that underlying cluster. That means if we have OPA, as I said earlier, in the underlying cluster, these OPA policies get enforced across all your pods,
00:17:41
Speaker
despite them being created in different virtual clusters. And then what we typically do, we have also a feature that is easily enabled with one flag. It's called isolated mode, which by default sets up things like resource quotas, and it sets up things like network policies to make sure that, for example, one pod from one vCluster can't talk to another pod from another vCluster.
00:18:03
Speaker
Because technically, they are run by the same cluster, right? So there's cluster-internal DNS and all of that stuff. And you want to obviously lock that down and isolate the vClusters to some extent. But then other things like OPA, you want to enforce it across all pods. And when you look at resources that should be shared, like an ingress controller,
00:18:25
Speaker
you can now enable syncing not just for pods. That's the default thing: syncing for pods is enabled, because that's how you can launch workloads, so we need to enable that by default. But you can also say, I'm going to enable ingresses. And that means the user can now create ingresses inside the vCluster and they also get synced to the underlying cluster.
00:18:45
Speaker
That means the underlying cluster can run one ingress controller and can reconcile all of these ingress resources from the different vClusters because they all end up in the same underlying cluster ultimately. Okay, and that makes sense.
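
The syncing behavior described above can be sketched as a toy model in Python. This is not vCluster's implementation: `ToyCluster` and `ToySyncer` are hypothetical stand-ins for the two API servers and the syncer, and the host-side naming scheme is an assumption made to avoid name collisions in the single host namespace. The sketch only shows the idea that pods are copied down by default while other kinds, such as ingresses, are copied only once enabled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Stand-in for an API server: objects keyed by (kind, namespace, name)."""
    objects: dict = field(default_factory=dict)

    def create(self, kind: str, namespace: str, name: str, spec: dict) -> None:
        self.objects[(kind, namespace, name)] = spec


@dataclass
class ToySyncer:
    virtual: ToyCluster
    host: ToyCluster
    host_namespace: str
    # Pods are synced by default; other kinds must be enabled explicitly.
    enabled_kinds: set = field(default_factory=lambda: {"Pod"})

    def sync_once(self) -> int:
        """Copy enabled objects from the virtual cluster down to the host."""
        copied = 0
        for (kind, namespace, name), spec in self.virtual.objects.items():
            if kind not in self.enabled_kinds:
                continue  # e.g. ClusterRoles stay inside the virtual cluster
            # Hypothetical naming scheme: fold the virtual namespace into the name.
            host_name = f"{name}-x-{namespace}"
            key = (kind, self.host_namespace, host_name)
            if key not in self.host.objects:
                self.host.create(kind, self.host_namespace, host_name, dict(spec))
                copied += 1
        return copied


if __name__ == "__main__":
    virtual, host = ToyCluster(), ToyCluster()
    virtual.create("Pod", "team-a", "web", {"image": "nginx"})
    virtual.create("Ingress", "team-a", "web", {"host": "a.example.com"})
    syncer = ToySyncer(virtual, host, host_namespace="vcluster-team-a")
    print(syncer.sync_once(), sorted(host.objects))
```

Because every synced object lands in the host cluster, anything enforced there (admission policies, a shared ingress controller) applies to it regardless of which virtual cluster created it.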
00:19:00
Speaker
Deployments, stateful sets, or any of the compute resources get deployed on the host cluster. Are there any restrictions or limits to the different resource types that can be deployed inside a vCluster? Like, can I deploy my own operators and create custom resources, or is that not supported inside the virtual cluster? Yeah, you can. That's actually one of the biggest benefits, right? So if you are looking at operators today, it's really restrictive to share a cluster, right? Because
00:19:30
Speaker
you're all listening on the same resources, you all have to have the same versions of these resources, right? Like, CRD management is really hard in Kubernetes. But with a virtual cluster, the only thing, at the bare minimum, that a virtual cluster and a real cluster need to agree on is what a pod is, right?
00:19:50
Speaker
So as long as we know that the pod spec is kind of the same, there's not that many breaking changes in pods happening. There may be a couple of fields being added. I remember, I think about a year ago, ephemeral containers were added to the pod spec. And then obviously, if your underlying cluster doesn't have that, but you specify an ephemeral container inside the vCluster, then the question is, OK,
00:20:15
Speaker
what happens, right? And that means you can't use ephemeral containers if your underlying cluster doesn't support it. But there are very, very few edge cases where functionality actually gets added. And the worst case is that piece is not possible, right? Okay, but you can still deploy other pods without ephemeral containers. But all the other logic, right?
00:20:37
Speaker
CRDs and then operators listening on these CRDs and performing operations on that API level, they live entirely encapsulated in your vCluster. There's the possibility of sharing things. Like I said earlier with the ingress example, you can have with Istio, there are CRDs, you can enable syncing for them, for example.
00:20:57
Speaker
But then if you have one vCluster where somebody wants to roll another version of Istio, they could deploy their own Istio if you allow them to do that. It's pretty, pretty flexible. We've seen vCluster used a lot for operator development. Our product itself is all controllers, essentially. A lot of CRDs, and we're obviously working in a vCluster as well when we're developing the commercial product.
00:21:24
Speaker
Gotcha. Okay. So can developers or administrators, whoever is deploying these vClusters, right, select different Kubernetes versions for the virtual clusters than the base cluster may be running? So the base cluster might be running 1.25 or 1.26, but can I start using 1.29 to develop my app or test some changes out?
00:21:43
Speaker
Yeah, you can. So we are supporting four different Kubernetes flavors. The default setting is K3s just because it's super lightweight, really quick to get started. And for a lot of like dev, pre-production, CI, you know, preview environments, ephemeral environments, those kinds of use cases, it's just perfect, right? Okay. When you're thinking about, you know,
00:22:08
Speaker
other distributions, what we also support is vanilla Kubernetes, of course. That's a little bit more heavyweight, because you have the different components, right, not packaged into a single binary. Then we also support k0s, which is Mirantis's kind of, you know, lightweight flavor of Kubernetes,
00:22:26
Speaker
which may have some, I think they have some networking advantages over K3s, for example, right? There's a couple of improvements that they did on that end. And then I think the other distro that's really interesting is EKS. That's actually something that the AWS team has contributed. It was in one of their livestream sessions, right? I think it's Containers from the Couch. Yeah, I love that.
00:22:49
Speaker
Yeah, it's a pretty good channel that they got going there on YouTube, and I joined them there to talk about vCluster, and then we talked about the different distros, and we had just added k0s, and then the AWS team added a pull request like a week later, and they were like, we should have EKS supported as well, and we're like, that's so cool, right? Especially because they jumped on it so quickly. That was really fun to see.
00:23:10
Speaker
How would that work, right? EKS is a managed service, like AWS is managing the control plane for my Kubernetes cluster. If it's running as a vCluster, is it still supported somehow by AWS? Or they're just making sure that you get a flavor of EKS? Yeah, they have this open source kind of like, so they have this really big EKS Anywhere initiative. Yeah, okay, EKS-D. Yeah, EKS-D, I think that's the open source flavor.
00:23:35
Speaker
Gotcha. That makes sense. That's awesome. Good to see more adoption, right? Like you should jump on like streams for AKS and GKE next.
00:23:45
Speaker
Yeah, I need to check out their open source offerings if that would be possible. Maybe a good way to do that. But yeah, it's really exciting. I think we've been exploring the route of OpenShift and those kind of areas as well. There's so many distros, right? I mean, we already got K3s, so part of the Rancher ecosystem, I guess, is involved anyway. But yeah, we'd love a world of having more optionality with regards to what vCluster you want to run.
00:24:15
Speaker
Gotcha. Okay. And then I know we were talking about the syncer a couple of minutes back. Can a developer who's accessing a vCluster control what resources get synced down to the host cluster, or is that an admin responsibility?

Resource Management in vClusters

00:24:30
Speaker
Yeah, that's, for very good reasons, the admin's responsibility, because we want the admin to be able to say what is allowed inside. What we always advocate for: some people, when they start handing out these vClusters, you know, they don't make you cluster admin, and they put additional RBAC and cluster-wide roles in place. And we typically tell folks, just don't do that. Let people do inside the vCluster whatever they want to do.
00:24:59
Speaker
But make sure the sync is configured so that what you don't want to actually be running in the underlying cluster doesn't make it through. That's usually the approach we're taking. So configuring that sync is really what we're seeing as one of the ways to lock down the cluster. And by default, it's pretty locked down what we actually allow. But then you can open it up to additional CRDs or additional resources that you want to add.
00:25:25
Speaker
Okay. And you brought up a good point, right? If I'm deploying cluster-level resources, since this is my own virtual cluster, can I create cluster role bindings and cluster roles and those kinds of things? And the syncer will basically stop them from getting configured on the host cluster?
00:25:42
Speaker
Exactly. When you think about a cluster role or any kind of RBAC, that really just needs to live inside the vCluster because that's something that your API server handles. You have a separate API server. It doesn't need to reach the underlying API server. Of course, you could argue, whatever pod has a service account mounted and does requests,
00:26:06
Speaker
But yeah, we wire up the pod to also talk to the vCluster's API server, right? So those restrictions would be in place for that pod as well, despite it actually running in the underlying cluster. And then there are certain things that we do want to enable for folks that are more tricky, actually. So when you think about
00:26:26
Speaker
network policies, and let's say you want to allow somebody to restrict networking. As we learned earlier, networking is something that happens in the underlying cluster, right? That's why we can have an ingress controller and our pods in the underlying cluster, etc. So if we want to allow our, you know, tenants inside the vCluster to be able to experiment with network policies, we need to enable syncing for network policies. Okay.
00:26:55
Speaker
But we need to make sure that these policies now only affect the vCluster. So what we do is we actually have a standard syncer for this. Like for certain use cases, we develop these syncer mechanisms. So it's super easy to configure and you can't go wrong with them. But there's also ways to define your own custom logic and even write code to extend the syncer yourself.
00:27:16
Speaker
But for this case, network policies, we saw that as a very common case. It's a standard resource in Kubernetes. So we put some effort in getting it right. And what we're essentially doing, you know, one thing that's important to understand, we typically try to not create namespaces in the underlying cluster. So when you have 100 namespaces inside the vCluster,
00:27:36
Speaker
all of the pods that you're launching get sent to a single namespace in the underlying cluster. So we have to rewrite the names, for example, right? We have to rewrite a lot of things in that process. And with network policies, when you set a network policy on namespace A to not be able to talk to namespace B, what we do is we rewrite that to label selectors.
00:28:00
Speaker
And what we do on these pods is, instead of putting them in different namespaces, we put them all in the same namespace and then we label them. And we put the namespace in the label, right? And that's how we make essentially network policies work inside the vCluster, despite not allowing you to actually restrict any traffic on another namespace in the underlying cluster. It's a pretty interesting way to do things. I know, that's a super smart approach, right? Like inside a vCluster, I can have 10 namespaces,
00:28:30
Speaker
they get translated and they use labels. Okay. And who's configuring these things, right? Is it by default something that vCluster provides out of the box, or is it something that needs to be configured? Yeah. So we have a lot of config options by default. Everything just works out of the box, right? You want network policies to be synced? You enable that. There's nothing else you need to do because we already wrote logic for that. Okay. Then if you want to tweak logic, you can do that.
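
The namespace-to-label rewrite described above can be sketched as follows. This is a simplified illustration, not vCluster's code: the label key `example.vcluster/namespace` and the helper names are hypothetical, and the sketch only handles policy peers that select a namespace by name through the standard `kubernetes.io/metadata.name` label.

```python
# Hypothetical label key that records a pod's virtual namespace on the host copy.
NAMESPACE_LABEL = "example.vcluster/namespace"


def host_pod_labels(virtual_namespace: str, labels: dict) -> dict:
    """Labels for the host copy of a pod: its own labels plus its virtual namespace."""
    return {**labels, NAMESPACE_LABEL: virtual_namespace}


def rewrite_peer(peer: dict) -> dict:
    """Rewrite a namespace-based NetworkPolicy peer into a pod label selector."""
    ns_selector = peer.get("namespaceSelector")
    if ns_selector is None:
        return dict(peer)  # nothing namespace-specific to translate
    ns_name = ns_selector.get("matchLabels", {}).get("kubernetes.io/metadata.name")
    if ns_name is None:
        raise ValueError("this sketch only handles selection by namespace name")
    pod_match = dict(peer.get("podSelector", {}).get("matchLabels", {}))
    pod_match[NAMESPACE_LABEL] = ns_name
    return {"podSelector": {"matchLabels": pod_match}}


def selects(peer: dict, pod_labels: dict) -> bool:
    """True if every matchLabels entry of the peer is present on the pod."""
    wanted = peer.get("podSelector", {}).get("matchLabels", {})
    return all(pod_labels.get(k) == v for k, v in wanted.items())


if __name__ == "__main__":
    peer = {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "team-b"}}}
    rewritten = rewrite_peer(peer)
    print(rewritten)
    print(selects(rewritten, host_pod_labels("team-b", {"app": "db"})))
```

Since all synced pods share one host namespace, the label is what lets a rewritten policy tell a pod from virtual namespace A apart from one in virtual namespace B.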
00:28:55
Speaker
Okay. So it's really important that we are super transparent about what happens in these syncers, and that's also why vCluster is open source, right? Like you can see what the syncer does and what really the effect of things is. And we have this plugin interface that allows you, with hooks, et cetera, to kind of manipulate the process if you're saying, okay, I want the network policy rewritten, but I also want certain things to be stripped off. We see that a lot for nodes, for example.
00:29:23
Speaker
One of the questions I always get is, like, what happens when I run kubectl get nodes? Do I see the nodes, all the nodes? And I'm like, what do I see, right? Because the vCluster doesn't have a scheduler, so it doesn't even need nodes. It doesn't, right? That's the point, right? It doesn't need static node assignment. But for some applications that you want to test,
00:29:44
Speaker
You need nodes. They expect to see a node, right? So your admins can configure to sync nodes into the vCluster. And then you can tweak how they should be synced. So you can, for example, enable, show all the nodes from the underlying cluster.
00:30:00
Speaker
you can also say show no nodes, right? You can also say show only the nodes that pods from the vCluster are currently running on. Okay. Because let's assume your OPA or, you know, whatever automatically restricts everything from one namespace, you know, puts like taints, et cetera, in place to make sure, or node affinity, right, or node selectors, so that you end up on a specific node pool, right? Like these five vClusters always end up on that node pool.
00:30:28
Speaker
And then we would dynamically show you nodes based on where your pods are located, so you would only see those nodes, and then you can also enable obfuscation.
00:30:38
Speaker
And that way we rename the nodes, we change the spec, we show you different CPU and memory than what reality looks like, right? And we do these things. There's a lot of defaults that you can just toggle on and off, like that obfuscation, right? You can say true and we just obfuscate stuff. But if you want to put your own obfuscation logic in place, that's where the plugins and the hooks come in, if you want to do something custom.
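As a rough sketch of the node-visibility options described above (show all nodes, show none, or show only the nodes a tenant's pods run on, optionally obfuscated), consider the following Python. The mode names and the hashing scheme are assumptions chosen for the example; they are not vCluster configuration values.

```python
import hashlib

# Illustrative sketch only. The mode names ("all", "none", "pods-only") and
# the hashing scheme are assumptions for this example, not vCluster settings.


def visible_nodes(host_nodes, tenant_pods, mode="pods-only", obfuscate=False):
    """Return the node names a tenant would see inside the virtual cluster."""
    if mode == "all":
        names = list(host_nodes)
    elif mode == "none":
        names = []
    elif mode == "pods-only":
        # Only nodes that currently run at least one of the tenant's pods.
        in_use = {pod["node"] for pod in tenant_pods}
        names = [n for n in host_nodes if n in in_use]
    else:
        raise ValueError(f"unknown mode: {mode}")

    if obfuscate:
        # Replace each real name with a stable alias derived from a hash.
        names = ["node-" + hashlib.sha256(n.encode()).hexdigest()[:8] for n in names]
    return names


host_nodes = ["gpu-pool-1", "gpu-pool-2", "cpu-pool-1"]
tenant_pods = [{"name": "trainer", "node": "gpu-pool-2"}]

print(visible_nodes(host_nodes, tenant_pods, mode="all"))
print(visible_nodes(host_nodes, tenant_pods, mode="pods-only"))
print(visible_nodes(host_nodes, tenant_pods, mode="pods-only", obfuscate=True))
```

A custom plugin or hook, as mentioned in the conversation, would correspond to swapping out the obfuscation step for your own logic.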
00:31:00
Speaker
So if you are showing or syncing some of the nodes inside the virtual cluster, can a developer then taint or untaint some nodes, or is that essentially read-only, so you can't modify those? So typically we only allow read-only. That's what we set up. But you can also make it bidirectional; that works too, right? You just need to enable that in the syncer. But I wouldn't necessarily recommend doing that.
00:31:28
Speaker
For sure. That was just like a corner case, right? Like what if the developer wants to mess with the system and I don't know, make something or provide something. If you want to allow your developer to be able to do that for that specific node pool, that's the way you can do that, right?
00:31:44
Speaker
But yeah, that's not the typical case we see. We typically see the engineers more detached from the infrastructure, and then the admins taking care of all these kinds of things. But there's definitely a possibility, you know, like, I think the, let's just, so for example, we've been working with a cloud provider called CoreWeave. They made a lot of headlines, you know, regarding...
00:32:08
Speaker
Exactly. They became this synonymous term for GPU cloud. It's crazy. They made a lot of progress in 2023. And I gave a talk with Brandon, one of the key engineers there, at KubeCon in Chicago. It was a lot of fun, that talk. And they're really open about the internals of the cloud platform. And what they essentially are doing is handing out a vCluster per customer.
00:32:37
Speaker
That's a really interesting concept, because they realized GPUs are super scarce. You can't just give everybody GPUs, and it's also very, very cost prohibitive in a lot of cases. When you have your vCluster, they have two options for you. They have a shared node pool where you can just launch workloads on.
00:32:58
Speaker
And then if you pay them quite a lot of money, then you get a dedicated node pool, right? And for example, for that dedicated node, they may enable that you can set labels there and you can taint it, right? Because it's your dedicated node. But obviously, the shared node pool, they want it to be read-only. So that's a great case to say they would probably enable bidirectional syncing,
00:33:24
Speaker
but they would put a plugin in place and add the custom logic, right? I mean, obviously, I don't know the exact implementation of what they did on that end. But I would assume that's a cool feature for people to be able to label your own nodes, to taint them, etc. But then make sure that doesn't happen for the other nodes.
00:33:42
Speaker
No, that's an awesome use case, right? Like if I'm a user of CoreWeave and I want to deploy my apps or containers that need access to GPUs, I can still do shared. And does vCluster somehow work with the NVIDIA device plugin and let you specify how many GPU resources you can have access to? Or, as soon as somebody wants to deploy a container that needs a GPU resource, will the syncer find the right node and automatically provision it there?
00:34:12
Speaker
Yeah, so what the syncer does, it just... So their GPUs are part of a real Kubernetes cluster.
00:34:22
Speaker
And the syncer just syncs there. And then the scheduler, and I think they have a very custom scheduler as well at CoreWeave, that then decides where does that workload actually get scheduled, which GPU is used, et cetera. Obviously, that's internals on there, and then probably above my head too. That's not how that works.
00:34:43
Speaker
on the networking and hardware level, whatever needs to be done on that end. But yeah, they gave us a glimpse in the talk. It was quite fascinating to hear what they're doing. Nice. Yeah, I need to find somebody, some contact at CoreWeave, and have them on the pod as well to talk more about this. Sounds like an interesting use case. So like continuing along the vCluster questions, right?
00:35:07
Speaker
Security is like a major concern. I know we spoke about network policies, how they get implemented, but just keeping it simple from a perspective of like CVEs and containers that might not be super secure. How do admins control what's deployed, or if they're using some open source tool like Aqua Security's Trivy, right? Can they still monitor all the container images across those 100 vClusters if they're connecting it to the host cluster?
00:35:36
Speaker
Yeah, they can. So when you're thinking about any kind of tool that, you know, scrapes things from a pod, whether it's the image to, you know, statically scan it, or if you look at extracting the logs with something like Datadog, right? Or kind of a monitoring tool to check the metrics, right? What we typically recommend doing, obviously, you could, you know, set a lot of these tools up in each vCluster. But again,
00:36:06
Speaker
Typically not what you want to do. What you want to do instead is run them in the underlying cluster. So you have like one Datadog running or one image scanner running. And then the only difference for you as an admin, or for anyone who has access to this tool to actually get the data and the analytics and be able to query
00:36:26
Speaker
what data has been ingested. The only difference is, instead of looking on a per cluster, like you set up your views per cluster, you now set up your views per namespace. And you say, one vCluster is one namespace, and it becomes really easy. And one thing that we do is we rename, as I said earlier, we rename the pods. And you may want to know also, what is the namespace inside the vCluster? And what was the name of the vCluster? Where is this pod coming from?
00:36:55
Speaker
And we have all that information in labels and annotations. So you can just set up your views to be able to filter by labels and annotations rather than by clusters and namespaces. And that way you get pretty much the exact same visibility.
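To make the "filter by labels rather than by clusters" idea concrete, here is a small Python sketch that groups image-scan findings from a single host-level scanner by virtual cluster and virtual namespace. The label keys and the shape of the findings are placeholders invented for the example; the actual keys vCluster writes may differ, so check the metadata on your own synced pods.

```python
from collections import defaultdict

# Illustrative sketch only. Both label keys below are hypothetical
# placeholders, and the findings list is made-up sample data.
VCLUSTER_LABEL = "example.io/vcluster-name"
VNAMESPACE_LABEL = "example.io/virtual-namespace"


def group_by_tenant(findings):
    """Group scanned images by (vCluster, virtual namespace) using pod labels."""
    grouped = defaultdict(list)
    for finding in findings:
        labels = finding["labels"]
        key = (
            labels.get(VCLUSTER_LABEL, "<host>"),      # pods not owned by a vCluster
            labels.get(VNAMESPACE_LABEL, "<host>"),
        )
        grouped[key].append(finding["image"])
    return dict(grouped)


findings = [
    {"image": "nginx:1.25", "labels": {VCLUSTER_LABEL: "team1", VNAMESPACE_LABEL: "web"}},
    {"image": "redis:7", "labels": {VCLUSTER_LABEL: "team1", VNAMESPACE_LABEL: "cache"}},
    {"image": "nginx:1.25", "labels": {VCLUSTER_LABEL: "team2", VNAMESPACE_LABEL: "web"}},
    {"image": "scanner:latest", "labels": {}},
]

for tenant, images in group_by_tenant(findings).items():
    print(tenant, images)
```

One scanner on the host sees every pod, and the grouping step restores the per-tenant view that a per-cluster dashboard would otherwise have provided.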
00:37:11
Speaker
Okay, no, I think that makes sense. Like I can visualize it right now. Like security is one big thing. Another thing, again, this is Kubernetes Bytes, right? We started as a storage podcast.

Persistent Volume Claims in vClusters

00:37:23
Speaker
What about storage resources? Like if I'm deploying on EKS and I have the EBS CSI add-on configured on the host cluster, how does storage provisioning work, and who owns those resources?
00:37:37
Speaker
Yeah, that's an excellent question. So typically what you would do is, in a regular Kubernetes cluster just as in a vCluster, you would create a persistent volume claim, right? Yeah. And that persistent volume claim now needs to be fulfilled, and it specifies like a storage class, right?
00:37:53
Speaker
Obviously, if you want your vCluster to be able to use that storage class, you've got to make it available to the vCluster, similar to your resources: if you want CRDs to be used inside the vCluster, they've got to exist in the vCluster, right?
00:38:08
Speaker
But we also have mechanisms to initialize a vCluster with preloaded resources. So when it fires up, you see that the CRDs are present, that storage class is available. It's really interesting because you could have 10 storage classes in an underlying cluster, and you could say, this vCluster can use storage class A and B, but these vClusters can use C and D only.
00:38:30
Speaker
It's kind of nice to also be able to restrict people that way, in terms of they don't even see the others. You could obviously do similar things with OPA, but then people would potentially see these storage classes but can't use them, and find out later. It's actually much nicer to say, hey, you can only see what you can use. And that way, when you create that persistent volume claim and have it attached to your pod, the pod gets synced, and you would also enable syncing for the persistent volume claim.
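The "you can only see what you can use" idea for storage classes can be sketched as a per-tenant allowlist. The Python below is a minimal illustration; the class names, the vCluster names, and both helper functions are hypothetical and do not correspond to a vCluster API.

```python
# Illustrative sketch only. Class names, vCluster names, and helper
# functions are assumptions for this example.
HOST_STORAGE_CLASSES = ["a", "b", "c", "d"]

ALLOWED = {
    "vcluster-1": {"a", "b"},
    "vcluster-2": {"c", "d"},
}


def visible_storage_classes(vcluster: str) -> list:
    """Storage classes that get made available inside the given vCluster."""
    allowed = ALLOWED.get(vcluster, set())
    return [sc for sc in HOST_STORAGE_CLASSES if sc in allowed]


def can_sync_claim(vcluster: str, storage_class: str) -> bool:
    """A claim can only be fulfilled if its class is visible to the tenant."""
    return storage_class in visible_storage_classes(vcluster)


print(visible_storage_classes("vcluster-1"))  # ['a', 'b']
print(can_sync_claim("vcluster-1", "a"))      # True
print(can_sync_claim("vcluster-1", "c"))      # False
```

Because the tenant never sees classes C and D, there is no late surprise where a claim is accepted in the virtual cluster and then rejected by a policy engine on the host.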
00:39:00
Speaker
Okay, and then the underlying cluster has the storage controller that then says like, okay, I see a persistent volume claim, I've got to create a PV, right, to actually provision the persistent volume. And then that one gets mounted to that pod that I have, right?
00:39:16
Speaker
Yeah, because it's not really a virtual pod. It is a pod running on the host cluster. Exactly. So all the mounting mechanism, that's why we're so compliant, right? Because having syncing as that mechanism just makes things so much easier. Initially, we were piloting things and we were trying out what's possible. And you've probably played around with things like Docker in Docker and that kind of stuff.
00:39:41
Speaker
That just gets so difficult and ugly, right? Because like, you know, you have the performance hit, then you have scalability limits, and a lot of things are not supported, right? So many restrictions, right? And then with the syncer, well, it would be like, you know, just imagine you want to share a Kubernetes cluster, and you want to have 100 tenants share it, right? The difficulty today really is you can't
00:40:08
Speaker
You typically confine people to a namespace, and you've got to lock them down really hard in terms of permissions. But then they also want to use a lot of objects. So the complexity is really high, right? With a vCluster, it's really easy. You've got to lock down very few basic resources,
00:40:26
Speaker
persistent volume claim, right? Like, it's not maybe network policy, you know, like basic resources, right? But you don't have to worry about like, when they run Argo CD inside, or they test a machine learning framework, or a MySQL kind of controller that provisions databases.
00:40:44
Speaker
They can do that. It doesn't matter to you, right? And they can create namespaces, they can set RBAC, they can add an admission controller in their vCluster. So, you know, they can do whatever they want, they can add any controller, right? Like all of these things are possible now. So you're really elevating their privileges. But at the same time, you're reducing the complexity to lock down that environment, right? That's the beauty of the syncing mechanism.
00:41:08
Speaker
And then you brought up Argo CD, right? And that's one of the questions I had. Like, if I'm following GitOps frameworks or patterns and automating my deployments, how does vCluster integrate with tools like Argo or Flux? Do you have runners or something running on your vCluster that are always looking for information? How does it work for CD pipelines? Yeah. So I think when you're talking about a vCluster, the vCluster itself, as I said, is a Helm chart and it's configured with
00:41:38
Speaker
a values.yaml. So the vCluster itself can be managed with Argo CD. And then obviously you can have the vCluster, as a regular cluster, as a deployment target in Argo CD, so that when you have your ApplicationSet, or whatever you're creating in Argo CD, your applications are going to target that vCluster now instead of targeting the real cluster.
00:41:59
Speaker
And that's just going to work, right? Everything that Argo does is going to work. Whether it sees it as a remote cluster, or Argo itself runs inside of the vCluster, we've seen pretty much everything at this point. I think Codefresh actually, they are a managed Argo offering. They're using vCluster. They're very vocal about this. I think at KubeCon Europe in Amsterdam, they gave a talk about how they use us in production for their managed offering.
00:42:28
Speaker
It's really fascinating to hear these kind of stories. They actually ran without us, like we didn't even know that they were using it in production. I usually... Those are the best customer testimonials, right? You didn't even ask for it. Exactly. And then we saw the talk on the schedule and obviously showed up. Some of my engineers didn't get in because it was really packed, right?
00:42:50
Speaker
everybody in the room, but it was really fascinating to hear that. Typically when people want to use vCluster in production though, I do have to say a disclaimer. I recommend reaching out to us first, because we can probably share some advice so you don't have to run into issues as you scale out.
00:43:07
Speaker
Now, it also touches on, you know, what we're doing on our commercial product, right? Because, for example, our commercial product has an Argo integration. Okay. And what it does, it provides you CRDs to manage vClusters. It also has a Terraform integration, so you can spin up vClusters via Terraform, for example. We are working right now on a Cluster API integration. So you can spin up, you know, vClusters with Cluster API through the commercial product.
00:43:34
Speaker
And then what the commercial product does with Argo, you can, with just one true/false kind of statement in the definition of the vCluster, you can say auto-connect it to the Argo instance. And that means if a developer launches a vCluster, it automatically shows up in Argo, and we even sync the permissions. So you can say, because our system is hooked up to your SSO, and Argo should ideally be hooked up to your SSO.
00:44:01
Speaker
Argo can even be hooked up to us, and then we proxy the permissions to your SSO. So we kind of become an authentication proxy and an OIDC provider for Argo. But in any case, we know what the users are, and they're the same in Argo and for us, and the same groups, etc. And then you can essentially give permissions with our product,
00:44:23
Speaker
and they reflect in Argo, which is really nice. So you spin up a vCluster and it's automatically available to your teammates, right? That's essentially, you know, pieces of what we're doing in the commercial offering. Okay.
00:44:36
Speaker
Gotcha. And talking about all of these things, all the different aspects, I have two more questions,

Building Developer Platforms with vCluster

00:44:44
Speaker
I think. One is, definitely, we can talk about that later, like what's next for vCluster. But then with the whole platform engineering mindset and inside organizations, do you see people building their IDPs using vCluster or spinning up vCluster? How does integration work when it comes to platform engineering?
00:45:05
Speaker
Yeah, 100%. I think we're, you know, with our commercial offering, we're super enterprise-y, right? So we work with some of the largest companies in the world, and they really see, they really have the struggle of, you know, they may already have 200 or 500. The biggest one we've spoken to so far has over 3000 Kubernetes clusters. Wow. Okay. The scale in the enterprise is insane.
00:45:31
Speaker
And I think the reason why they have that scale is there was no good way to share a cluster before vCluster. So what they did is what the cloud providers were advocating for: just spin up another EKS cluster. It's super easy. We're reducing the time, right? And that's all great advancements, right? But it's very, like a lot of these clusters run idle all the time.
00:45:52
Speaker
Because the resources are not shared. There's an idle Istio running 500 times on a Saturday for all these dev clusters that they spun up. And that's a big problem. And it's also really hard to tell if this cluster is still used or not.
00:46:08
Speaker
How do you know, right? Like you can look at API traffic, et cetera, but like the Istio is always going to do something, right? So it's really tricky. For us to say, what's inside the vCluster is application specific, what's outside is the general platform, and then sharing that general platform for all your tenants, it's just so much more efficient. And I think with that model, we really allow platform engineers to build that self-service Kubernetes experience that they want to set up. You know, the scariest thing I've heard so far
00:46:38
Speaker
is not just the numbers of virtual clusters these guys have today, but where they're heading.
00:46:44
Speaker
Yeah. Because they see, when you ask them how many clusters have you spun up last year, you find out, oh, the number of clusters has doubled. It's kind of like, so it went from 200 to 400. And then you ask them what's going to happen next year. And they're like, we don't know, but probably a similar trajectory. And then we ask them, okay, what if you had no restrictions? Would you give, you know, would you spin up a Kubernetes cluster for every developer? And the answer would probably be yes.
00:47:12
Speaker
Okay. And then think about a big organization of 10,000 developers. Well, that's a lot of clusters to spin up. And you can even take it further. Sometimes we ask, would you spin up a separate cluster for every CI run to get a clean cluster? There are people who do that. Yeah. There's a lot of clusters that you have to spin up, right? Yeah. With vCluster, that's just much more efficient. So like, we're really enabling, I think,
00:47:35
Speaker
the broad adoption of Kubernetes for all of these use cases while reducing the cost and standardizing the way on how to spin up, how to dispose, and how to manage the life cycle of these vClusters. That's ultimately what platform engineers do. On the one hand, they standardize things like an Istio and access to storage and all of these underlying things. But at the same time, they want to enable
00:48:00
Speaker
self-service and autonomy and velocity. What better way to do that than by saying, hey, you can be cluster admin? That's essentially what we work on with a lot of these companies. Okay, so now let's talk about what's next for the open source vCluster project. Looks like you have most of the areas covered, so what's coming down the pipe?

Future of vCluster Features

00:48:19
Speaker
Yeah, there's a lot of stuff still in R&D. So I say, we're still super early. It sounds like, you know, I mean, we definitely put in a lot of work in the past. Like, you know, it's been almost three years at this point, three or four more months, and then we have a three-year anniversary for vCluster. But, you know, I think that there's still a lot of areas that we find super interesting in R&D. One of them is, for example, to snapshot an entire vCluster
00:48:46
Speaker
and move it to a different cluster, even a different cloud. That's a super cool idea. Obviously, there's some really tricky challenges, like how do you make, for example, how do you snapshot the persistent data alongside of the application data? How do you find the right time to take that snapshot to not be disrupted or have anything corrupted?
00:49:12
Speaker
There's challenges in that direction, but that's a really, really exciting thing because then you can see a world where, you know, when you install, when you, like the default way to install an application these days is a Helm chart. But a Helm chart is like, that's a lot of images and a lot of containers, and a Helm chart is relatively complex.
00:49:35
Speaker
And then you have persistent data and a lot of like infrastructure stuff below it. But if I now, you can't capture a state with Helm. So if I have a problem, like let's say I installed GitLab on my Docker Desktop Kubernetes cluster. And then I run into issues with that GitLab instance. Can I send that to my buddy? No, I can't, right? Like they can install, you know, we have to reproduce the whole issue.
00:50:00
Speaker
But what if I could just snapshot it and send it to my buddy? That's really cool. So packing up a cluster and unpacking it somewhere else allows a lot of these debugging use cases, but also cloud migrations. Let's say you have GitLab running in this one cluster and then you're saying,
00:50:16
Speaker
Ah, maybe we've got to set that up again, and maybe we're going to move to EKS instead of kOps or whatever you used to spin up the cluster initially, right? And then, well, that's a big migration project suddenly. But what if you could just snapshot that vCluster with GitLab in it and move it to somewhere else, right? That's a really cool thing. Yeah, these are some interesting use cases for sure. Man, I'm looking forward to, like, I don't know, I'm going to follow the vCluster channel to see what's going on.
00:50:43
Speaker
Okay, so like one more question that we have added in season four, this is our fourth year doing it, right? Like, how are you thinking about AI inside Loft Labs with vCluster? Like, in general, and this answer can vary anywhere from GitHub Copilot for developers to you're actually like adding AI capabilities inside the product. Yeah, I think we're not too close to AI on the application level.
00:51:10
Speaker
But we see ourselves kind of as an enabler for that AI wave. When you think about CoreWeave, for example, using us to bring more Kubernetes and more GPUs to these AI engineers, I'm very proud that we contribute to the success that CoreWeave has there. But also in a lot of other cases, when you think about the need for Kubernetes in an organization.
00:51:38
Speaker
If you are running, obviously you can always run like a local kind cluster, Docker Desktop, Rancher Desktop, any of the solutions, right?
00:51:47
Speaker
it's probably really hard for you to run like a large language model or train any kind of data on your local machine. So there's suddenly a really big need to be able to get access to a large-scale distributed, you know, system like Kubernetes, right? But then you have things like EKS, etc. And then you're running into the problem again, like how can I hand that out to everybody, right? And that's where vCluster comes in. It's like, because it can be that enabler either within your organization,
00:52:15
Speaker
but also for folks like CoreWeave that have like a managed service for others, to hand out these resources to folks. So we're a little bit below, on the infra layer, on the whole AI story. But yeah, it's something really exciting for us. And you know, apart from that, obviously, we're trying a lot of AI tools also for all kinds of, you know, meeting summaries and knowledge bases and like,
00:52:40
Speaker
You know, so many advances coming out. I think one of the biggest things I've seen so far is automatic summarization of issues. People tend to send us like really big messages of like, hey, I'm trying to do this, and the request, etc. And then we've got to figure out, you know,
00:52:55
Speaker
you know, who do we assign this best to, right? But like, usually, it takes a while. And it makes the triage process a bit simpler. Yeah, exactly. Now we can triage based on the summary first and then have somebody look in, in a second step, right? For those kinds of things we just leverage, you know, like ChatGPT and other things internally. Okay, one last question: where can people find more information about vCluster, getting started, any tutorials that you want to share with our listeners?
00:53:22
Speaker
Yeah, vcluster.com is the website. Pretty straightforward. You'll obviously find the link to the Git repository from there. You'll also find our vCluster Pro offering from there. Obviously, it's linked on that site. And then if you want to join us, we have a very active conversation going on in our Slack community. So you can go to slack.loft.sh. It should also be linked from pretty much any website and the Git repository.
00:53:51
Speaker
But that's a really good address to, you know, if you run into any kind of issues, if you're thinking about using vCluster for a specific use case and you want to see who else has done it and what they would recommend, right? Or obviously, if you, you know, want to contribute, but you don't know what the right starting point is, well, just join us on Slack, open up the conversation. It's really vibrant there. We have like, I think over 2500 people in the Slack community already. And
00:54:19
Speaker
Yeah, it's so exciting to see. I think right now there's like five or six members joining every day. So it's just growing so quickly. It's really, really exciting. So definitely check out our Slack if you're interested. No. And we'll include all of these links that you just shared in our show notes as well. So hopefully people can find those easily. But I want to take this time and thank you so much for joining us. Thank you, Lukas.
00:54:44
Speaker
This has been a great discussion. I learned a lot of new things. I'm sure our listeners did too. And yeah, with that, I'm looking forward to having you on maybe in a future episode when you have more features and maybe that snapshot functionality is working. So thanks again. Yeah, absolutely. Happy to join again as we go further along in the vCluster journey. It's so great to be invited to Kubernetes Bytes. Thank you so much.
00:55:09
Speaker
Okay, that was a great conversation. Hopefully you guys learned something new. I'm sure I did. Let me just do my few key takeaways and we can wrap this episode up quickly. First of all, vCluster is a great open source project that you can start using today on any supported or any Kubernetes distribution. The thing that I really liked was the really small footprint, like to get started or to deploy a vCluster.
00:55:32
Speaker
For the whole orchestration capabilities that vCluster brings, to get started you just need like 0.1 CPU and 100 megs of RAM. That's nothing when you compare it to some of the application resources that are being consumed by pods that don't really need those resources, right?
00:55:48
Speaker
So it's a great way to get started. Test this out in your own environment. I really like that it supports things like OPA, network policies, ingress, security scanning tools and storage primitives as well. So you don't have to change or adopt new tools just to use the vCluster functionality. You can just manipulate the syncer functionality to see what gets translated or transmitted from the host cluster to the virtual cluster and back
00:56:12
Speaker
the other way around as well. Another important thing to note is, if developers inside their own vClusters want to test resources like operators and custom resources, they have the ability to do that without creating conflicts on the base cluster.
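To put the footprint figure from these takeaways (0.1 CPU and 100 megs of RAM to get started) into perspective, here is a back-of-the-envelope calculation in Python. It assumes the quoted numbers apply as a flat per-vCluster baseline, which is a simplification: real usage will vary with load and configuration.

```python
# Back-of-the-envelope sketch. The per-vCluster figures are the ones quoted
# in the episode; treating them as a flat baseline is an assumption.
CPU_MILLICORES_PER_VCLUSTER = 100  # 0.1 CPU
MEMORY_MB_PER_VCLUSTER = 100       # 100 MB of RAM


def baseline_footprint(vclusters: int) -> dict:
    """Total baseline control-plane resources for a number of vClusters."""
    return {
        "cpu_cores": vclusters * CPU_MILLICORES_PER_VCLUSTER / 1000,
        "memory_mb": vclusters * MEMORY_MB_PER_VCLUSTER,
    }


for count in (1, 100, 500):
    print(count, baseline_footprint(count))
```

Under that assumption, 500 virtual clusters would need a baseline of roughly 50 CPU cores and 50,000 MB of RAM in total, shared on one host platform instead of 500 separate control planes and 500 copies of add-ons such as Istio.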
00:56:29
Speaker
Another thing that popped up was different version support for the base cluster and the virtual cluster. So as we discussed in the episode, or in the interview, you can have a 1.26 Kubernetes base cluster, but you can start testing out the 1.29 version of Kubernetes as part of your vCluster.
00:56:48
Speaker
And then I really liked how Lukas was able to bring in customer examples like CoreWeave and Codefresh. Both of those seemed interesting. I'll also link those blog articles and those sessions in our show notes. So make sure you go and check those out and read them for yourselves.
00:57:10
Speaker
But I think that brings me to the end of Key Takeaways for today. Hopefully this was a good episode for you. Thank you for spending the time with us. Make sure to give us a five-star rating anywhere you listen to podcasts. If you haven't already checked out our YouTube channel, you can see interviews on video as well. So even if you prefer the audio format, which I know 90 percent of you guys do, feel free to still go and hit subscribe or like our YouTube channel so we can find new listeners.
00:57:40
Speaker
With that, this is Bhavin, and thank you for joining us for another episode of Kubernetes Bytes.