Kubernetes COSI 101 with Sid Mani

Kubernetes Bytes
305 plays, 2 years ago

In this episode of Kubernetes Bytes, Ryan and Bhavin talk to the SIG Storage COSI Co-Lead Sid Mani about the Container Object Storage Interface (COSI) project, as it enters the Alpha phase of the maturity cycle. The discussion dives into the need for a different Object Storage standard, how it works with Kubernetes, the vision of the community, and how people/vendors can contribute to the ecosystem.

Transcript

Introduction and Podcast Overview

00:00:03
Speaker
You are listening to Kubernetes Bytes, a podcast bringing you the latest from the world of cloud native data management. My name is Ryan Wallner and I'm joined by Bhavin Shah, coming to you from Boston, Massachusetts. We'll be sharing our thoughts on recent cloud native news and talking to industry experts about their experiences and challenges managing the wealth of data in today's cloud native ecosystem.
00:00:29
Speaker
Good morning, good afternoon, and good evening wherever you are. We're coming to you from Boston, Massachusetts. Today is August 16th, 2022. Hope everyone is doing well and staying safe.

Hosts' Personal Experiences

00:00:41
Speaker
Let's dive into it. Bhavin, how have you been? What's up?
00:00:45
Speaker
I'm doing good. Just enjoying the Boston summer. Now that the heat wave is gone, it's more pleasant. This is the Boston summer I remember from the last couple of years. I made my way up to the Cape. I've been to the Cape, but I always get stuck in traffic and take a detour and end up at Dennis, Massachusetts. So for people who are looking it up on Google Maps, that's the start of the Cape, maybe a couple of towns in.
00:01:10
Speaker
The analogy is the arm, right? It's kind of the bicep. Yeah. And the Cape is down, by the way, not up. Okay, sorry. No, no, but I went from the bicep all the way up to the finger, so I did go up.
00:01:26
Speaker
But no, I made it to Provincetown this time, even though it took like three hours to get there and maybe two hours to get back. But it was good. Yeah. That town in itself has a different vibe. The Cape is laid back. You have beaches. But then Provincetown is, again, you have a lot of bars and you have a lot of restaurants right on the water. So I really liked it. I haven't actually been there. I've always kind of wanted to make it out there.
00:01:53
Speaker
And maybe this is your sign. I feel like it must be my sign to make it out there. I'll try to do it when there's not a lot of traffic. I'll probably do it after. Yeah. After limiting. Yeah. Exactly. How about you? What have you been up to? Yeah. Again, also enjoying the, I'll say, greater Boston summer. I'm not as close, and doing a lot of hiking, trail riding, those kinds of things, into, you know,
00:02:23
Speaker
sort of trail biking, mountain biking, motorcycle biking, all those things. There's not many places to go in Massachusetts. I've been having to do a lot of research to find the right places that are legal to ride in Massachusetts, but I'm getting excited and doing some practice runs around here for a bigger trip that I'm doing in Vermont, where they just have dirt roads everywhere, which is going to be a lot of fun. So that's been my enjoyment lately, which I can't complain about too much.
00:02:51
Speaker
Oh, yeah. Nice. All right. So we have a great

Startups in Cloud Native Space

00:02:56
Speaker
guest on the show. But before we introduce him and get going, we do want to cover
00:03:01
Speaker
this week's cloud native storage news. Why don't you kick it off for us, Bhavin? Yeah, sure. So I do have a couple of startups that I want to talk about. The first one being Acorn Labs. Acorn Labs was started by the original set of Rancher co-founders, like the four people who started Rancher. Now after the acquisition by SUSE, I think they left. And now they have another venture or another startup. This one looks like it's self-funded because they didn't have any funding news or anything like that.
00:03:29
Speaker
Maybe they had made enough money from the Rancher acquisition, but they have an interesting take on deploying applications on Kubernetes clusters. I think it's similar to how people know what Dockerfiles are and how they help you create a container that can be deployed anywhere.
00:03:46
Speaker
And then if you look at alternatives for Kubernetes, you have something like a Helm chart, or you have operators and custom resources that package things for Kubernetes. Acorn Labs has an interesting take on this. They have something called an Acornfile, which is like a Dockerfile kind of a concept. It is one single file, but you define everything in that one single file. You don't have to use the words pod or persistent volumes. You just define what the different components in your app are, where to pull the code from,
00:04:15
Speaker
what ports to open for different components in the app to talk to each other, what storage to provision. And then you build it like a Dockerfile, you push it, and then you can deploy it against any Kubernetes cluster. And it automatically deploys secrets for you, persistent volumes for you, pods and service objects for you. So it takes away all the complexity of understanding these Kubernetes resources and deploying them across different clusters.
00:04:40
Speaker
You have one Acornfile and you can deploy your app anywhere, which includes all the different components. So that was pretty cool. Again, it looks like something that is still in early stages. I only found one online meetup demo. I think it's a two-hour-long thing that Darren did. But it seems like a good direction, right? Something new in the ecosystem.
00:05:00
Speaker
I think so. And developer tooling is so important, right? And being able to hit the right spot there. And hopefully, they're taking note of what things like Helm don't do well and how they can improve on that. I know when Docker and Dockerfiles were around, when Docker Compose and those files came around, it leveled up how you built an application and connected all the dots. And it's so powerful when you get that developer tooling right. So I don't know anything about it. I'm going to have to look into it myself.
00:05:28
Speaker
Yeah. And then the second startup that I wanted to discuss was called Ghost Security. Again, they just emerged from stealth and announced like a $15 million funding round with a $15 million valuation. So that's the numbers, but they are going after the application security and the API security market in the cloud native ecosystem.
00:05:51
Speaker
and that's all the details that I have. I tried going through the website, I tried reading through the couple of blogs they have, and this is all that they say. Since they just came out of stealth, I think they want to hire using the new funding that they just got and then actually build a product and sell it to customers. But I guess it's a startup that we can keep an eye on and see what happens. We all know that Kubernetes security and anything security nowadays is really popular, so we have another startup to try.
00:06:17
Speaker
Yeah, not surprising that we're going to probably see more companies come out of stealth in the security space. And it just makes so much sense. Yep. Cool. And then the third wasn't a startup. I think it's a free tool that I wanted to highlight for our listeners. If you are running workloads or applications on Kubernetes and are worried about costs, I know we have discussed Kubecost and their open source project. Granulate, which is, I think, an Intel company,
00:06:45
Speaker
did open source a free tool for optimizing Kubernetes costs, and they call it gMaestro. So it helps you reduce costs and apply horizontal pod autoscaling, working through the CPU and memory request changes inside your pod file. So the only reason I highlight it is I know there are paid alternatives. This is open source. Maybe you can get started with this and see if you can save some money while running Kubernetes.
00:07:11
Speaker
Yeah, tools like this are so valuable. I mean, I know I've used one from VMware in the past, and just having the insight of sort of everything in your account that's costing you money, right? Because it's easy to lose track of those things. And I know, especially even in companies who are innovating fast and moving to the cloud, the first goal is to innovate fast and
00:07:35
Speaker
It's usually secondary that you say, wow, I'm spending millions of dollars, right? So having free tools, always a benefit here. I'll check it out myself. Really cool.
00:07:46
Speaker
All right, so I just had a couple of articles I wanted to mention here. I know we discussed Kubernetes 1.25, I think, in the past. But Sysdig did an awesome blog of sort of what's new. They always kind of come out with these kind of blogs. But I will link it in the show. There's a lot of things storage related in there, deprecations and CSI migrations that's moving forward officially. So definitely check that out. It's going to be on Sysdig.com on their blog.
00:08:17
Speaker
with the slug kubernetes-125 what's new. And then I had a couple of articles here. One was an intro to eBPF. I know we did a podcast episode around eBPF and it was actually still one of our more popular episodes. So I wanted to drop this link here because I thought it was a really good explanation of what eBPF is. The concept of it technically I think is
00:08:45
Speaker
sometimes a little hard to really understand of what's going on. So all this type of content is super valuable. And then the second one, also kind of in the same light about Kubernetes volumes explained for beginners, which is on dev.to. It's an article there which does a really good job of kind of walking through, you know, Kubernetes volumes, what they are, what the purposes are, and they do have a nice kind of example of diagrams and
00:09:15
Speaker
what it does with local storage versus remote storage and those kind of things. So definitely go check that one out too. And that's really all I was going to include for today.

Interview with Sid Mani

00:09:25
Speaker
So why don't we jump into the main show? We do have a really great guest.
00:09:30
Speaker
Sid Mani, he is a member of the technical staff at a stealth startup, Alcion.ai. He was previously an engineer at MinIO. And before that, he was one of the first engineers at Rancher Labs. And he has been an early adopter of, you know, technologies like Kubernetes and Docker, and actually a contributor for many years. So he's highly involved in the COSI project, the COSI SIG. So we're gonna
00:09:58
Speaker
dive in and ask him all about what COSI is. So without further ado, let's get him on the show. All right, welcome to Kubernetes Bytes. It's so good to have you here. I can't wait to dive into what COSI is all about and sort of what the state of it is and what your involvement is. But for our listeners, please introduce yourself and give us a little background about what you've been up to. Thank you for having me, Ryan. So
00:10:27
Speaker
My name is Siddhartha Mani. So I guess right after I finished college in 2014, I've basically only been working on software infrastructure. I started out working at Rancher Labs as one of the first engineers. I wrote pretty important pieces in Docker, fixed the zombie process issue. I wrote the first
00:10:55
Speaker
log driver and log rotate. Then after that I moved on to Kubernetes while at Rancher Labs itself. One of the issues we were facing there was smaller service providers like Rancher Labs and
00:11:12
Speaker
other smaller cloud-provider-like offerings couldn't really integrate with Kubernetes because Kubernetes was hard-coded to work with AWS or GCP or Azure. So one of the things we wanted to do was break it up such that anyone could integrate with Kubernetes. And I did it for Rancher Labs, where it ended up becoming a bigger feature. So that was kind of my first, I would say, major open source contribution.
00:11:43
Speaker
After that, I went on to start my own company. I wanted to make day two operations in Kubernetes super simple. The idea was basically using Kubernetes as a base, it is possible to automate apps, enterprise apps, to work just like, say, your iPhone app. You should just be able to go to an app store, say download, and just click on it, and it should just work for you.
00:12:11
Speaker
That was the vision. It was a little ahead of its time. I can see more companies coming up now that are trying to solve that problem.

Introduction to COSI

00:12:22
Speaker
And so I had to kind of stop it at that point. A lot of logistical issues. That's when I went on to start working at an object storage company called MinIO.
00:12:34
Speaker
Which is, again, open source. I guess everything I've worked on has been mostly open source. At MinIO is where we wanted to make it easy for people in Kubernetes to start using offerings like MinIO. And that's when I started to work on COSI. Okay. And yeah, so that's basically my background. Got it.
00:12:58
Speaker
Well, that's a good segue too. I think the next question I have, especially for our listeners who don't know what COSI is, maybe we can start with that. Like, what is COSI? What does it stand for? And I think maybe the secondary question to that is, you know, we've talked about CSI on this podcast before and sort of understand its place in Kubernetes. And I think a follow-up to what is COSI is also why introduce another standard? Like, why is it important?
00:13:27
Speaker
That's an excellent question. So when we started out COSI, the first question we asked ourselves was do we want to introduce a new standard? And we really tried hard to retrofit COSI into CSI. Maybe I'll start with that. Sure.
00:13:44
Speaker
So COSI stands for Container Object Storage Interface. In the beginning days of COSI, the idea was Kubernetes already has an overarching kind of standard, the Container Storage Interface, which is supposed to cover all storage, for consuming storage using containers in Kubernetes.
00:14:06
Speaker
However, when we started looking into using CSI, we found that there were fundamental differences between how object storage works and how block and file systems work. For instance, with block devices and file systems, the access of data is always local.
00:14:27
Speaker
you might have a middle layer that kind of translates your local requests over the network, but access is always local. So your access patterns have that assumption in place. For instance, you can read and write data as quickly as you want without worrying about latency.
00:14:47
Speaker
You can make lots of small edits to files pretty easily without, again, worrying about the round trip time of going over the network. So if you look at how you read, write, and update files with the POSIX API, which is the Linux API for reading and writing files,
00:15:09
Speaker
it actually allows pretty much any operation you want. You can obviously create and delete files, but you can also edit wherever. You can seek to any position and edit. On the hard drive that translates to just random seeks and making changes there. The difference in object storage is it doesn't allow you to do edits. You can create, you can delete, but you can't edit a file in place.
00:15:33
Speaker
And the advantage that gives you is it's always a sequential write on the hard drive, pretty much, all things considered. So you get better performance at the cost of a longer round trip time.
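The contrast described here, in-place edits on a file versus whole-object replacement in an object store, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyObjectStore` is a hypothetical in-memory stand-in, not any real provider's API, and the helper names are made up for the example.

```python
import os
import tempfile


class ToyObjectStore:
    """Hypothetical in-memory object store: whole-object put/get/delete only."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        # A put always replaces the entire object; there is no offset argument.
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        del self._objects[key]


def edit_file_in_place(path: str, offset: int, patch: bytes) -> bytes:
    """POSIX-style edit: seek to an offset and overwrite only those bytes."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(patch)
    with open(path, "rb") as f:
        return f.read()


def edit_object(store: ToyObjectStore, key: str, offset: int, patch: bytes) -> bytes:
    """Object-style 'edit': fetch, modify locally, then upload the whole object again."""
    data = bytearray(store.get(key))
    data[offset:offset + len(patch)] = patch
    store.put(key, bytes(data))
    return store.get(key)


def demo() -> tuple:
    # File case: the patch touches five bytes on disk.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notes.txt")
        with open(path, "wb") as f:
            f.write(b"hello world")
        file_result = edit_file_in_place(path, 6, b"COSI!")

    # Object case: the same change rewrites the full object.
    store = ToyObjectStore()
    store.put("notes.txt", b"hello world")
    object_result = edit_object(store, "notes.txt", 6, b"COSI!")
    return file_result, object_result
```

Both paths end with the same bytes, but the object path moves the full payload over the network twice, which is one way to see why the small-edit access pattern assumed by file systems does not carry over.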
00:15:50
Speaker
Now, if you add to the mix high bandwidth to the object storage service, so if you can push data at, say, hundreds of gigabytes and you're always writing sequentially, you now get performance benefits as well when using object storage. That's why object storage generally tends to favor larger files.
00:16:12
Speaker
Given this background, the way we access files, the way we access block devices, CSI was, again, hard-coded for accessing files and block devices. CSI interfaces needed actual attach and detach semantics for block devices and mount and unmount semantics for files. It doesn't have access control baked in, like object storage does.
00:16:39
Speaker
So it seemed like too much of a shock to the CSI way of things if we were to introduce COSI. Right. Okay. That makes a lot of sense. Actually, I could see why, you know,
00:16:52
Speaker
just some very basic things, like it always being kind of a POSIX file system, and mounting and those kind of things, are not necessarily things you need when working with object storage. So I guess a follow-up is, what is the state of the storage special interest group around COSI? And what's your current involvement?

COSI's Strategic and Technical Insights

00:17:17
Speaker
OK, so I started this around two years ago.
00:17:23
Speaker
So we've achieved the initial goals we set out to achieve. COSI's three main goals were, first, to provide a Kubernetes-native way to consume object storage. As in, you use Kubernetes objects and don't have to short-circuit or directly talk to the object storage provider.
00:17:42
Speaker
That's one. Number two was allow DevOps personnel to provision and use object storage as long as they fit within the policy boundaries that a system-wide admin sets. And the third was make object storage portable in the sense that if you are using one object storage provider,
00:18:03
Speaker
you should be able to move to another one as long as they speak the same protocol. So from one S3 provider to another, say Ceph to AWS or AWS to MinIO. So we've built a system that's come that far. And so as of, I would say today or yesterday, maybe, we can call it alpha. We just... Oh, nice. That's great.
00:18:32
Speaker
Yeah, so I think you already answered one half of this question. The next question that I have is around why do you not just forward everything to an object storage vendor? People have been using AWS S3 for
00:18:48
Speaker
close to 15 years, I guess, I don't know, 2007, that was like the first service that AWS launched, right? And then MinIO has been around and Ceph has been around, like, why create something that's Kubernetes native or why create this new API? Yeah, whenever, you know, unfortunately, we've been involved in creating two different standards. And I can tell you, every time I do it, I think of that XKCD comic
00:19:16
Speaker
where we're like, oh, let's create a standard to replace all standards because there are too many and we end up with one more. I like it, yeah. We need to find that and link it in the show. We absolutely want to.
00:19:34
Speaker
So, I mean, most of it is kind of strategic, I would say. So say, for instance, you're a startup and you just got acquired by Walmart. Now, Walmart doesn't want to be on AWS cloud because Amazon's a competitor. Now, let's say everything you've done so far is on AWS; the effort to move from one to the other is going to be very expensive.
00:19:57
Speaker
The other reason would be disaster planning and just accounting for failures. Even though it's very rare, a whole region has gone down in AWS before. It can happen in any of the clouds. So you want to have some strategy to be able to move from one to the other. Because if all your business relies on it,
00:20:20
Speaker
And you don't know how long AWS or one of the clouds is going to be down. You want to have some sort of insurance and that's where having that portability helps.
00:20:33
Speaker
And the other thing is operations. So one hard part with actually moving from one provider to another, I would say even the changes to the application code are simpler than changing the culture and the organization to use storage in a different way with a different cloud. So we standardized the operations: how you deal with permissions, who provisions, and what roles have access to what policies. Stuff like that is what COSI provides on top of object storage providers.
00:21:03
Speaker
Gotcha. Okay. No, thank you for listing these out, right? Because I know COSI does fall under SIG Storage and tries to mimic what CSI does, like from an object storage perspective. So all the benefits that the ecosystem got by standardizing on the CSI API obviously extend to COSI as well, but I just wanted to list it out so that people understand that there is value in having that one single standard, one single way of operating your object storage buckets.
00:21:31
Speaker
Yeah, happy to talk about it. I can keep going. And your next discussion will be how do you make one standard over CSI and COSI, right? While we were talking, I did find the link.
00:21:47
Speaker
Nice of it. I will include it. Switching gears a little bit here to more of a practitioner question: we have a lot of people that might be wondering, how is my Kubernetes cluster going to be COSI-enabled? Or another way to say this is, how is COSI installed and how do you start using it?
00:22:09
Speaker
Definitely. So I'll talk about our eventual vision because that sounds the best and we'll talk about the steps to use it right now. So eventually it's going to be a part of your default Kubernetes deployment.
00:22:22
Speaker
It's going to be a first-class object. You're going to be able to create bucket claims. Just like with CSI, you create volume claims, and then you get to use the volumes. Similar to that, there'll be bucket claims, and those bucket claims can be tied to your pods, and whichever pod it's tied to will have the necessary information to talk to the object storage provider.
00:22:46
Speaker
So with CSI, when you tie a volume to a pod, it gets mounted at a particular location. Object storage is different: you need to know what the endpoint is to talk to, because it's always over the network, and you want the credentials and the method of signing and stuff like that to talk to the backend.
00:23:06
Speaker
So that information will be presented in the object storage provider's native format. So in case of AWS, it'll provide access key, secret keys. In case of Azure, it's going to provide something called project ID. And I believe there's one more field. So depending on what the object storage provider's
00:23:29
Speaker
supported protocol is, you're going to have a file that's specific to their format that's going to get mounted in your pod. So yeah, so it's going to be very similar to that. Eventually you won't have to install anything, but for now,
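To make the idea of a provider-specific file mounted in the pod concrete, here is a minimal Python sketch of how an application might read such a file at startup. Everything specific in it is an assumption for illustration: the file name `bucket-info.json`, the JSON layout, and the key names are hypothetical and chosen to resemble an S3-style provider; a real driver defines its own format.

```python
import json
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketConnection:
    """Everything a client needs to reach one bucket over the network."""
    endpoint: str
    bucket_name: str
    access_key: str
    secret_key: str


def load_bucket_connection(path: str) -> BucketConnection:
    """Parse a credentials file that was mounted into the pod.

    The JSON keys below are hypothetical; they are not taken from the
    COSI specification or from any vendor's driver.
    """
    with open(path, "r", encoding="utf-8") as f:
        info = json.load(f)
    return BucketConnection(
        endpoint=info["endpoint"],
        bucket_name=info["bucketName"],
        access_key=info["accessKeyID"],
        secret_key=info["accessSecretKey"],
    )


def demo() -> BucketConnection:
    # Stand-in for the file that would be mounted into the pod.
    sample = {
        "endpoint": "https://objects.example.internal",
        "bucketName": "team-a-logs",
        "accessKeyID": "EXAMPLEKEY",
        "accessSecretKey": "example-secret",
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bucket-info.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(sample, f)
        return load_bucket_connection(path)
```

The point of the sketch is that the application only learns an endpoint and credentials from the file; nothing is mounted as a file system, and all data access still goes over the network.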
00:23:45
Speaker
It's a slightly different story. The bucket claim, bucket lifecycle is going to look the same. You have to install two different services if you want to start enabling COSI in your cluster. One is called the COSI Controller Manager.
00:24:01
Speaker
That one is responsible for listening to requests to create buckets, validating them, and making sure the bucket lifecycle is handled correctly. So say, for instance, someone deletes a bucket and there's data in it, how do you deal with it?
00:24:17
Speaker
So stuff like that. And then there's one more called the COSI Sidecar, which integrates with the actual object storage driver. So if you want to have S3 support in your cluster, you'll have to install a sidecar with AWS S3's driver, similarly for other clouds. So, yeah, that's the way to do it. Okay. Now you mentioned, you know, the sort of
00:24:44
Speaker
process of creating a bucket is a claim, very similar to what we covered, I think, in our container storage 101 episode. I forget if we talked about COSI in it, but those who are interested might go back and listen to that one and follow along here. But I did have one follow-up:

COSI Deployment and Vendor Support

00:25:02
Speaker
those claims and the associated bucket is then associated with the pod, are buckets by default available across pods or do they also have roles and access controls where it could be a single pod versus others?
00:25:20
Speaker
Excellent question again. Yeah, unlike block devices and file systems, it's easy to allow parallel access to object storage. Again, because there's no chatter, there's no risk of two people writing on the same location because you only do reads or writes. I'm sorry, create, yeah. So by default, object storage can be accessed by multiple pods.
00:25:49
Speaker
For one bucket, we can provide access to it from multiple pods. We are bringing in features to restrict bucket access to either particular namespaces or to say that for a particular namespace, this is the kind of access you can get for this bucket. So you can only get read-only access versus read-write access. These are the two kinds of security mechanisms we're bringing in.
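The two restrictions mentioned here, limiting a bucket to particular namespaces and capping a namespace at read-only or read-write, can be pictured as a simple lookup. The Python below is a toy model only: the policy table, the `Access` levels, and `is_allowed` are hypothetical names for illustration and do not reflect how COSI actually stores or enforces these rules.

```python
from enum import Enum


class Access(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"


# Hypothetical policy table: bucket name -> {namespace: access level}.
POLICY = {
    "team-a-logs": {
        "team-a": Access.READ_WRITE,
        "analytics": Access.READ_ONLY,
    },
}


def is_allowed(policy: dict, bucket: str, namespace: str, operation: str) -> bool:
    """Return True if a pod in `namespace` may perform `operation` on `bucket`.

    `operation` is "read" or "write". Namespaces that are not listed for
    the bucket get no access at all.
    """
    if operation not in ("read", "write"):
        raise ValueError(f"unknown operation: {operation!r}")
    level = policy.get(bucket, {}).get(namespace)
    if level is None:
        return False
    if operation == "read":
        return True
    return level is Access.READ_WRITE
```

With this table, pods in `team-a` can read and write, pods in `analytics` can only read, and pods in any other namespace are refused outright.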
00:26:18
Speaker
Okay. Yeah, that's great. Right. Like understanding how these buckets are actually mounted, or not mounted, but like used by these application pods. Yeah. Sorry, go ahead.
00:26:30
Speaker
So I was going to say, using the word mounted, I kind of used it too with air quotes in one of the meetings. And the problem that happens is there are a bunch of lift and shift vendors, if you know what that means, that kind of allow you to go from traditional file systems to S3 by creating a S3FS.
00:26:53
Speaker
So they mount S3 locally, kind of kills the whole point of both POSIX and S3. But yeah, so people started asking, so are you going to support S3FS? And then I realized the word mounting can be dangerous here. So I stopped using it.
00:27:08
Speaker
Yeah, I think I need some time to get used to not using the word mount, but I'll get there when I'm talking about COSI. I think my next question was around bucket life cycle. I know we already spoke about bucket claims and the bucket actually getting provisioned, but I wanted to know, like,
00:27:24
Speaker
if I already have existing S3 buckets or MinIO buckets, how do I bring those in? Is that possible, like integrating with brownfield deployments, or is this something that's always fresh? And how will it translate to the APIs? So that's the first part of my question. The second is, you said
00:27:42
Speaker
yesterday or today, we went into alpha with the COSI API, so I'm assuming 1.25 will have that. But with 1.25, with the alpha phase, will we have support from all of these major vendors included, or will that come in future releases?
00:27:58
Speaker
Oh, that's a good question. So, all right. So to answer your first question, bucket life cycle. So I think you used the keyword there, brownfield. I think that's a great way to explain this. So we support two kinds of buckets. We've been calling them greenfield and brownfield. Greenfield is where you create the bucket using COSI and it manages the entire life cycle for you.
00:28:20
Speaker
Brownfield is where you already created the bucket. It's a very important use case because people might already have a lot of data in their buckets and you can't really move them to a new bucket. It's very expensive to do that. Let's say you have petabytes and petabytes, it's hard to do.
00:28:35
Speaker
So we support brownfield deployments too. For brownfield deployments, we kind of have the concept of roles. So we have users of the bucket that are just users of a namespace. We have users and we have admins. Admins control
00:28:57
Speaker
this import of a bucket; users can't do it in our system. So an admin would have to manually create a bucket object, a Kubernetes bucket object, which points to the already existing bucket.
00:29:12
Speaker
Okay. And then they can set a handle on it. They can say retention policy is delete or retain. If you set it to delete, you're kind of asking COSI to manage the bucket for you at that point. Because if you delete the Kubernetes bucket object, the backend bucket object also gets deleted. But if you set it to retain, you're managing the lifecycle on your own, but you get to use it, access the bucket as if it was a COSI bucket on the Kubernetes side.
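The delete-versus-retain behavior for an imported (brownfield) bucket can be modeled in a few lines of Python. This is a toy simulation under stated assumptions: `ToyBackend`, `ToyCluster`, the method names, and the `"Delete"`/`"Retain"` strings are hypothetical and only mirror the behavior described in the conversation, not the actual COSI controller.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBackend:
    """Stand-in for an object storage provider holding named buckets."""
    buckets: set = field(default_factory=set)


@dataclass
class ToyCluster:
    """Tracks Kubernetes-side bucket objects that point at backend buckets."""
    backend: ToyBackend
    # Kubernetes object name -> (backend bucket name, policy)
    bucket_objects: dict = field(default_factory=dict)

    def import_existing_bucket(self, name: str, backend_bucket: str, policy: str) -> None:
        """Admin step: create a bucket object pointing at an existing bucket."""
        if policy not in ("Delete", "Retain"):
            raise ValueError(f"unknown policy: {policy!r}")
        if backend_bucket not in self.backend.buckets:
            raise KeyError(f"no such backend bucket: {backend_bucket!r}")
        self.bucket_objects[name] = (backend_bucket, policy)

    def delete_bucket_object(self, name: str) -> None:
        """Remove the Kubernetes-side object and apply its policy to the backend."""
        backend_bucket, policy = self.bucket_objects.pop(name)
        if policy == "Delete":
            self.backend.buckets.discard(backend_bucket)
        # With "Retain", the backend bucket and its data are left untouched.


def demo() -> set:
    backend = ToyBackend(buckets={"archive", "scratch"})
    cluster = ToyCluster(backend=backend)
    cluster.import_existing_bucket("archive-ref", "archive", "Retain")
    cluster.import_existing_bucket("scratch-ref", "scratch", "Delete")
    cluster.delete_bucket_object("archive-ref")
    cluster.delete_bucket_object("scratch-ref")
    return backend.buckets
```

After both Kubernetes-side objects are deleted, only the bucket imported with the retain setting still exists on the backend, which is why retain is the safer choice for buckets that already hold large amounts of data.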
00:29:40
Speaker
So that's the answer to that first question. We use brownfield versus greenfield. The other one is a really interesting question, I think, because it's really not very useful until all the vendors are there. Yep. Exactly. Yeah, no, it's a good question. So COSI has been going on for two years.
00:30:03
Speaker
So the simple answer to it is we have some drivers. Let me just TLDR it first, then I'll give you the explanation. So we started out with support from IBM, Red Hat, IBM has its own object storage, I forget the name, and Google Cloud. Okay.
00:30:27
Speaker
Yeah. Um, it's an old system with the IBM. I can't remember the name, but, but yeah. So, um,
00:30:35
Speaker
During the two years, we've had different vendors come and go. We had Scality, Cloudian, a bunch of these. But the only one who stood the test of time, who kind of sat with it while we were developing for these two years, was MinIO. And recently, Azure has been helping out a lot. So we do have an Azure COSI driver,
00:31:02
Speaker
which is what we used to test that the COSI system works. But I'll have to check up with them to find out what's the status of it. It's testable, but there's a bunch of legal process involved, I believe, before they can release it. We have a sample driver that's available in the Kubernetes SIGs org.
00:31:28
Speaker
It hasn't been used in a long time. I'll have to check up on that too. But I would say right now, it's ready for vendors to consume. So it's time for vendors to take up COSI, start writing the drivers, testing it out, making any fixes that are needed. We welcome that. This is a great time because it's stabilized to some extent. Otherwise, vendors can't really participate.
00:31:52
Speaker
Yeah. It didn't make sense, right? Given that it's alpha, maybe users can experiment with it, but nobody should be running alpha in production. Wait for the APIs to be graduated, maybe move to beta or maybe actually be generally available. But as I said, now that the standard is at a good enough point, vendors can start coming in and build out that ecosystem. Hopefully more vendors stick it out for the next phase, take it from alpha to the next steps.
00:32:19
Speaker
Yeah, I think this is the hardest part of starting a new project. It's that initial huge hurdle. And then it gets pretty smooth because there are systems in place and there are people interested. So I think we've gotten past the hard part. Makes sense. I guess a follow-up to that, that I'm curious about, is what are the challenges for having those companies that may have come and gone, or ones that, you know, like Azure, who are really
00:32:47
Speaker
building back up, what are the challenges of keeping them around? Is it really just getting the project to this alpha and beyond state, and then that's the way we'll get people to come back? Or just curious there? I mean, it's a good question. So it comes down to what they have, what skin they have in the game, really.
00:33:13
Speaker
These vendors look at COSI as something that they can also support once it's a thing. Until then, they don't have any reason to push for it. Because it really is a mechanism for you to move between clouds, why would they support that? Unless people are coming to your cloud.
00:33:34
Speaker
So until COSI becomes a thing, they don't have to participate from a strategic perspective. The ones that do are either doing it for the sake of open source or because they see themselves benefiting by being the first here or making sure that their system works well with COSI, stuff like that.
00:33:56
Speaker
So I think most of them are just waiting it out for it to become alpha. That was the main issue. They were all just here to look it up, see where it's at and implement a driver if everything was ready. Yeah. I feel like that's something we've seen in the past too, is when customers come to you and say, I want this like they did with... It took a while even with CSI, right? When you didn't have to necessarily use CSI, but it was there.
00:34:22
Speaker
Then you started seeing end users and customers come to you and say, well, my CIO or my architect says, we're going to base everything on this from here on and really use the standard. Then they come to you and say, I need this. Then you throw your hands up and say, okay, we're going to go in and make sure we support this well, which makes sense.
00:34:46
Speaker
The second part of what you said is around community. In the Kubernetes space, in the cloud native space, I feel like community has been such a strong pull for a lot of companies. And I think there's so much value in just showing up in the community. Even if that means you're early, just showing up and being there does mean a lot. And I feel like it's continued to grow even so much more.
00:35:12
Speaker
It's good to see. Switching back a little bit to the practitioner-approach question here. You know, I think one of the questions some may have might be whether these buckets are provisioned as something
00:35:27
Speaker
first class in Kubernetes, meaning is the bucket, are you sending data locally? And then magic happens on the backend, or are you just provisioning to the different storage vendor backends and it's sort of front-ended? Or is there an option to do one or the other?
00:35:46
Speaker
Okay, I'll have to clarify a little bit more about the question first. So when you say data, you mean, so there's like control data, which is go create a bucket for me, and there's data data, like where you store the files. So you're talking about control data, right? Yeah, I think actually in this case, a little bit of both, right? Is there an opportunity for you to provision buckets and run sort of object storage on your Kubernetes cluster? Maybe you could touch on a little bit of both then.
00:36:16
Speaker
Okay, so with control data, yes, the configuration we give Kubernetes asking it to create the buckets is persisted locally in Kubernetes. And until the connection to the backend and the eventual processing succeeds, we're going to keep retrying.
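The retry behavior described here can be sketched as a small reconcile loop. This is an illustrative sketch only, not COSI's controller code: `BucketRequest`, `create_bucket_on_backend`, the attempt cap, and the backoff values are all hypothetical names and numbers chosen for the example.

```python
import time
from dataclasses import dataclass


@dataclass
class BucketRequest:
    """Hypothetical stand-in for the bucket request persisted in Kubernetes."""
    name: str
    ready: bool = False
    attempts: int = 0


def reconcile(request, create_bucket_on_backend, max_attempts=5,
              base_delay=0.01, sleep=time.sleep):
    """Retry the backend call until it succeeds or attempts run out.

    The request object survives between attempts, which mirrors the idea
    that the desired state is stored locally while processing is retried.
    """
    while not request.ready and request.attempts < max_attempts:
        request.attempts += 1
        try:
            create_bucket_on_backend(request.name)
            request.ready = True
        except ConnectionError:
            # Exponential backoff before the next attempt (assumed policy).
            sleep(base_delay * 2 ** (request.attempts - 1))
    return request.ready
```

The attempt cap is only there so the example terminates; the description in the conversation is of retrying until the backend call eventually succeeds.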
00:36:33
Speaker
So control data, for sure, yes. With actual data, it's a very tough thing to do. There are some vendors who do it. There's an S3 caching vendor. It's a pretty famous one. It's a good name. I forgot the name of the vendor. But they do S3 caching. So basically, for fast queries, they kind of have a caching layer for it.
00:37:01
Speaker
That's one. Then, you know, we could run something like that locally, but that's not something Kubernetes will support. The support from Kubernetes is going to be just the building blocks of building systems like that. So if you want to build a caching layer, you can do it. Okay, makes sense. Thank you. Yeah, I think I just wanted to understand
00:37:27
Speaker
where the data is actually hosted and located. Next question is around data protection, right?

Data Protection and Future Possibilities

00:37:33
Speaker
Coming from a block and file world, I know there are volume snapshots. That, again, took a few releases to go and graduate, but how does that work with the COSI standard? How do we protect the data that's being stored in those object buckets?
00:37:49
Speaker
That's a good question. So the answer to that is it comes down to this. So when you look at storing files locally or storing files in block devices, you're really just creating one copy unless like multi-path or something's enabled.
00:38:07
Speaker
And even then, it's hard to snapshot a file system while something is happening. Same with block devices. So with object storage, that problem doesn't exist because writes are atomic. So generally, the assumption is whenever you write to object storage, the data comes with some guarantees. For instance, S3 provides redundancy classes. You can choose to have it in reduced redundancy or regular standard.
00:38:37
Speaker
Excuse me. So there is really no need to do snapshotting. Okay. If you want those kinds of primitives where you want to say I want to go back to what it was at some point, you do have versioning in S3. Google Cloud supports it, Azure supports it. Yeah. Okay. So it depends on the...
00:38:59
Speaker
Yeah, it depends on the provider, right? When they build those sidecars or their plugins, that's how we'll enforce these data protection strategies and then offload those to the actual S3 buckets. Okay, that makes sense. Thank you.
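The "go back to what it was at some point" idea behind object versioning can be illustrated with a toy in-memory bucket. This is a sketch under stated assumptions, not the S3, Google Cloud, or Azure API: `VersionedBucket` and its integer version numbers are hypothetical and exist only for this example.

```python
from collections import defaultdict


class VersionedBucket:
    """Toy in-memory bucket that keeps every version of every key."""

    def __init__(self):
        self._versions = defaultdict(list)

    def put(self, key, data):
        """Store a new version and return its 1-based version number."""
        self._versions[key].append(data)
        return len(self._versions[key])

    def get(self, key, version=None):
        """Return the latest version, or a specific earlier one."""
        history = self._versions[key]
        if not history:
            raise KeyError(key)
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{key} has no version {version}")
        return history[version - 1]
```

Each write adds a whole new version instead of modifying data in place, so an older state stays readable without a separate snapshot step.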
00:39:13
Speaker
Yeah, a quick follow-up to that. I know this might be a loaded question, because it is dependent on the vendor, and the vendor supporting something like this next question might be tough. But given that COSI is sort of enabling you to be cross-cloud and enable those multi-cloud use cases.
00:39:32
Speaker
Is there ever a world where you can say, you know, I've provisioned these buckets in S3, but I do want to become multi-cloud and maybe move my data from there to somewhere else. And that may be a different cloud. And you could look at that like some sort of bucket replication to a different cloud, which is, I think, a tough question. But is there a world where COSI supports something like that?
00:39:55
Speaker
Yeah, we've talked about this quite a bit actually. So I can't name the actual customers who do something like this, but this is a pretty common use case. People either start out on Amazon with their prototype data,
00:40:10
Speaker
And then once they need to get actual data, say it's personally identifiable information or health records, they can't really leave it on the cloud. So once everything is prototyped and proven to work on the cloud, they bring everything back home to their own on-prem data centers. And sometimes the migration is the other way. When you want to scale beyond what's possible on your local data centers, you want to go to the cloud.
00:40:33
Speaker
So in both cases, you know, I've worked at MinIO, I've dealt with customers who've gone through these processes. So COSI was designed to support something like this.
00:40:45
Speaker
So there are a few things to consider in doing this. Transferring data from one bucket to another is very tough. It's just the sheer amount of data that makes it very hard. And clouds don't make it easy. So getting data into the clouds is free, basically. Sending data out is very expensive. Egress always gets you. Yeah, really gets you.
00:41:08
Speaker
And it reminds me of a story. So I was benchmarking, I believe, Presto on AWS with S3. And it was about one terabyte of data. And I ran the benchmark only once. It was the TPC benchmark, one of the standard ones. Yeah. And it's TPC-DS, the data science one. And one round of the 17 queries, I think, it ended up being about 60 terabytes. So it pulls the same data 60 times.
00:41:38
Speaker
And I didn't realize at that point the data was in California, US-West 2. They charged you for that so much. The region transfers would kill you.
00:41:55
Speaker
So when I was running it, I suddenly got this thought, did I set it up in the correct region? And then I saw the throughput. It was coming in at 100 gigabits per second. You can't do that across regions, right? So I thought it's OK. But it was actually going across regions. Yeah, that was an interesting data point. When I was transferring data across regions, it was able to do 100 gigabits.
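The numbers in this story can be worked through with a little arithmetic. The sketch below uses the figures mentioned in the conversation (about 1 TB of data, about 60 TB read, 100 gigabits per second); the per-gigabyte rate is a placeholder assumption for illustration and not a quoted cloud price.

```python
def egress_cost_usd(terabytes_read, usd_per_gb):
    """Cost of data transferred, using decimal units (1 TB = 1000 GB)."""
    return terabytes_read * 1000 * usd_per_gb


def transfer_seconds(terabytes_read, gigabits_per_second):
    """Time to move the data at a sustained line rate."""
    bits = terabytes_read * 1e12 * 8
    return bits / (gigabits_per_second * 1e9)


DATASET_TB = 1             # size of the benchmark data set (from the story)
READ_TB = 60               # total data pulled by one benchmark round
ASSUMED_USD_PER_GB = 0.02  # placeholder rate, NOT a quoted cloud price

amplification = READ_TB / DATASET_TB
cost = egress_cost_usd(READ_TB, ASSUMED_USD_PER_GB)
minutes = transfer_seconds(READ_TB, 100) / 60

print(f"read amplification: {amplification:.0f}x")
print(f"transfer cost at assumed rate: ${cost:,.2f}")
print(f"time at 100 Gbit/s: {minutes:.0f} minutes")
```

The point of the sketch is the read amplification: the bill scales with the data read, not with the size of the data set, so a single benchmark round can cost far more than the stored volume suggests.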
00:42:24
Speaker
Gotcha. I was just trying to see if I got this right. There is no real migration support between buckets today, but that can be something in the future. It'll be something that the customer will have to, or you can build systems on top of COSI that do that. We allow you to, we give you the building blocks for it.
00:42:49
Speaker
This could be a company of its own someday. Yeah. Yeah. For our listeners, this is a great startup idea, you know? Okay. Thank you. Thank you. I think the next question that we have is around like roadmap. Like we are at alpha. What's next?

COSI's Roadmap and Community Engagement

00:43:08
Speaker
The most important thing right now is getting vendors on board, having COSI drivers, because once that is in place, the way people consume object storage is going to change. They'll have the benefits of COSI and just because it's backed by Kubernetes, it's going to get adopted.
00:43:33
Speaker
And they're going to see the benefits once it's adopted. Now, there's that famous saying: software is never bought or sold, it's adopted. In this case, yeah, you know, I don't have to go sell it. I don't have to say here are the benefits. They're going to adopt it and then they're going to see here are the benefits. Yeah, because everyone's going to be using it. Yeah, yeah. So the real benefits are that we'll kind of have a structure on how buckets are managed.
00:44:04
Speaker
So for instance, let's say you're the admin for an entire team at a large organization. You don't want to be dealing with requests to provision buckets or delete buckets for every individual employee. You want to leave that to an admin for that team, whoever that person is.
00:44:21
Speaker
The person who's sitting at the top, setting policies for the whole company or part of the company, just wants to do that, just set policies. They want to say, overall, don't use more than 100 terabytes of space that we have. Overall, don't give more than one bucket a user, stuff like that. We don't have support for all that yet, but that's in the roadmap eventually.
00:44:45
Speaker
And going back to the workflow, the admin for that team, the person who's responsible for provisioning buckets for that team, can now manage buckets within the boundaries set by the larger admin.
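The two-level workflow described here, where a top-level admin sets limits and a team admin provisions buckets inside them, can be sketched as follows. As noted in the conversation, COSI does not support these policies yet, so everything in this sketch (`OrgPolicy`, `TeamBuckets`, the field names) is hypothetical and only illustrates the idea, using the example limits of 100 terabytes and one bucket per user.

```python
from dataclasses import dataclass, field


@dataclass
class OrgPolicy:
    """Limits set by the organization-wide admin (hypothetical shape)."""
    max_total_tb: float = 100.0
    max_buckets_per_user: int = 1


@dataclass
class TeamBuckets:
    """Buckets a team admin has provisioned: name -> (owner, size_tb)."""
    policy: OrgPolicy
    buckets: dict = field(default_factory=dict)

    def provision(self, name, owner, size_tb):
        """Create a bucket only if it stays inside the org-level limits."""
        if name in self.buckets:
            raise ValueError(f"bucket {name!r} already exists")
        owned = sum(1 for o, _ in self.buckets.values() if o == owner)
        if owned >= self.policy.max_buckets_per_user:
            raise PermissionError(f"{owner} is at the per-user bucket limit")
        used = sum(size for _, size in self.buckets.values())
        if used + size_tb > self.policy.max_total_tb:
            raise PermissionError("request would exceed the total space limit")
        self.buckets[name] = (owner, size_tb)

    def delete(self, name):
        """Remove a bucket, freeing its share of the limits."""
        self.buckets.pop(name)
```

The team admin never edits the policy object; they only call `provision` and `delete`, so the separation of duties from the conversation shows up as a separation between the two classes.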
00:44:59
Speaker
Yeah. So this workflow is where one of the biggest benefits is going to come. A big sell of DevOps is not just Kubernetes. DevOps using Kubernetes is not just the fact that we can automate the life cycle of your application infrastructure with Kubernetes, but also the fact that you can now reduce the chance of mistakes in your deployments, because it's declarative.
00:45:24
Speaker
So it comes with tools like Kubernetes and processes also. So COSI addresses both of these. Yeah, and especially like with the increased adoption of Kubernetes, right? If this becomes, if COSI becomes something that's available by default with every installation, yeah, the overall adoption is going to increase. Yeah, anything, sir?
00:45:47
Speaker
Has there been anything that really the community or the SIG or anyone involved in COSI has really been sort of calling out as, you know, what's missing, right? Like, is there anything that's glaring that like COSI really still needs to dive into? I mean, I know you mentioned a few things, but curious on your take there.
00:46:05
Speaker
Yeah, actually, and we really appreciate this when people try it out and they say, hey, this thing doesn't work, or we need this really badly. Yeah, because that's the feedback from users, right? And if you don't listen to that, who are we building for? So one of the things that keeps coming back is setting bucket level policies. Or like I just mentioned, where you want to say 100 buckets per namespace, no more.
00:46:35
Speaker
stuff like that people keep asking for. So far, the work has mostly been towards the basic blocks of it, but we're going to address these steps before beta.
00:46:49
Speaker
Got it, makes sense. Given that the community is so important, how has the general adoption of COSI been? I know it's just turning alpha, but what's that early feedback been? Have there been any real quality use cases that have come back to you or the SIG with valuable feedback?
00:47:15
Speaker
So I've been involved in this for a long time, for two and a half years now. So I've dealt with, or I've seen customers and mostly vendors, I would say, who've come in and wanted to participate, wanted to see COSI become an alpha API
00:47:33
Speaker
and been disappointed that it wasn't. There are many reasons for that, and we can go into them, but the main reasons apply to the less well-known providers. In order to make a sale, it really helps to say, one, we're a part of an official offering, COSI, and we work well with it, and two, you have the portability to go somewhere else if you really need it.
00:47:59
Speaker
But for now, you have us. Say you're a managed service provider, you want to ask your customers to, you know, you provide object storage for your customers, you take care of the cost, you take care of operations. Once they scale to a bigger level, you want to move them from your local setup to AWS or something. So in use cases like that, you know, I think the smaller players are the ones that see the benefit
00:48:28
Speaker
are the ones that are looking at it more, I would say, because we haven't really seen AWS show up here yet. But when they do, that's when it's going to be big. Got it. Makes sense.
00:48:42
Speaker
Well, I think that's a good segue to really dive into maybe talking about the best way in for either users, practitioners, someone who consumes COSI, or even developers. It sounds like there's a lot of opportunity if you're a developer, even new to Kubernetes possibly, or maybe a veteran in Kubernetes and looking for a new project.
00:49:03
Speaker
It being Alpha, we had a great conversation recently about the value of just showing up to a new project and everything's sort of out in the open and you can talk to everybody in the Kubernetes community and start small and really learn the process. So a little bit about where people can get started either using it or involved in the project would be great.
00:49:24
Speaker
Yeah, we're always looking for more contributors. We welcome contributors of all backgrounds. So if you're a content writer, if you want to do testing, DevOps, just write code, or just do architecture, we have roles for all of them. Or something entirely new that you show me is important here, for sure.
00:49:49
Speaker
Over the years, we've had a lot of different contributors from all these backgrounds. They've all enjoyed their time. Many, I would say, very young people, just either in college or just graduated or just first jobs kind of people. They've always been very thankful for how much they got to learn here.
00:50:13
Speaker
And that's the kind of environment, I think that's my natural style of working with people. I think I tend to like,
00:50:24
Speaker
I tend to understand what they need and if it's something that I can teach, I'm always happy to do it. I would say that will always be present for you if you came because you're learning. If there's something I can help with, it's there.
00:50:44
Speaker
Now, in terms of users that want to use it and try it out, yeah, we welcome beta testers. And otherwise, people that want to just try it out, we're actually just updating our docs. But if you were to go to GitHub.com slash Kubernetes dash SIGs dash container object storage interface dash API, it's a long thing. It would be good if you can just paste a link somewhere for it. Oh, yeah, we do that for sure.
00:51:13
Speaker
Perfect, yeah. We're gonna have all the docs you need to get started, try it out, break it, and let us know how you broke it. Great, that's awesome. Yeah, I feel like we need to invest in like a cosi.github.io, because, you know, I will post the link, but container-object-storage-interface.github.io, I know, is another place to land. Not to be confused with, don't land on cosi.org, it's not the Center of Science and Industry.
00:51:42
Speaker
Oh, which I do. We do a quick Google search. Just don't get confused. We'll include all that. But yeah, I really appreciate you coming on the show. It's been a wealth of information. I think a lot of people will get a lot out of this episode. I think we've touched on a lot here. So we do really appreciate you coming on Kubernetes Bytes and being part of it.

Episode Reflections and Conclusion

00:52:02
Speaker
Yeah, thank you so much. I enjoyed every bit of it. Thank you for having me.
00:52:06
Speaker
All right, Bhavin, that was a great show. I think I got a lot out of interviewing Sid. I know COSI is something I've really wanted to dive into and get my hands on more. So really getting sort of the lay of the land and sort of the reasons behind why it exists and all that was really insightful to me. But I'm curious, what were your takeaways that you would kind of like to talk about here?
00:52:30
Speaker
No, I completely echo that sentiment. These 101 episodes are a great resource for anybody who's listening to this podcast. We see that in the number of listens we get as well. But the thing that I took out of this episode from Sid was
00:52:48
Speaker
that COSI, even though it has been around for two years, is now in the alpha stage. Yesterday or today, I think he said, is when they made it alpha. It's still brand new. There still needs to be a lot of work done before people can start using it in production environments. Just be mindful of that. You can't expect features similar to CSI, the standard that has been around for a few years now.
00:53:11
Speaker
That is something that I learned as well. I learned how it will work with the different ecosystem vendors building these sidecars or building these plugins and providing that functionality. Now that the standard is set, or the standard has been created, vendors can start coming in and start contributing to it. Things like
00:53:34
Speaker
where the data is stored or how it's replicated across different buckets, things that are missing. Again, great opportunity for people who want to get started and start contributing or vendors that want to come in and start contributing. But yeah, still in alpha, not ready for production, but it's an option.
00:53:49
Speaker
Absolutely. I mean, it's definitely such an interesting space. I mean, just how much value something like CSI has brought to the market and how much adoption it's seen, seeing something like COSI come out and sort of build on its shoulders, so to speak, and follow very similar primitives, like the same volume claim to volume sort of process and how you instantiate a bucket versus a volume, those kinds of things.
00:54:18
Speaker
I think it makes a lot of sense. Also, I like the conversation around why there's a separate standard. The investigation was thoughtful in the sense that we tried to put it in CSI, but there is enough difference where it would have caused probably confusion.
00:54:38
Speaker
It would have convoluted CSI in how you used it and how it worked. So having it be a separate standard I think makes a lot of sense. And I'm really excited to see the adoption going forward and how it's installed today. I think that's just a mark of its maturity at this alpha stage. But just like CSI in the future, Sid talked about the vision of it'll just be there. And then you just
00:55:06
Speaker
install the specific driver you need to work with whatever vendor or open source version of object storage you're using. Really exciting stuff. Again, we'll post all the links on how to get involved and started with COSI if you want to get your hands on it or if you want to learn how to get involved in the SIG or as a developer, we will post all of that.
00:55:30
Speaker
And even include the XKCD link that's mentioned. Yes, we absolutely will. Funny comment there. Cool. So with that, as always to our listeners, please rate, comment, send us a message, send us a DM on any one of the platforms that we're available on: Twitter, Anchor, anything you can. It gives us a lot of helpful
00:55:55
Speaker
feedback on what the show is, what we want to do with it, new episodes, new things like that. So please definitely go take a look at that. And we are going to plug next week's episode. Actually, we have a special episode. We're talking with Brendan Burns and Ganesh from Microsoft on all things AKS and community. And Ganesh is actually an intern turned employee at Microsoft working on AKS, and has a lot of interesting
00:56:22
Speaker
things to say about, you know, how that process went, and we're going to go into all that detail with Brendan. So really exciting episode next week. And with that, that brings us to the end of today's episode. I'm Ryan. And thanks for joining another episode of Kubernetes Bytes. Thank you for listening to the Kubernetes Bytes podcast.