Podcast Introduction
00:00:03
Speaker
You are listening to Kubernetes Bytes, a podcast bringing you the latest from the world of cloud native data management. My name is Ryan Wallner and I'm joined by Bhavin Shah, coming to you from Boston, Massachusetts.
Podcast Focus and Industry News
00:00:14
Speaker
We'll be sharing our thoughts on recent cloud native news and talking to industry experts about their experiences and challenges managing the wealth of data in today's cloud native ecosystem.
00:00:29
Speaker
Good morning, good afternoon, and good evening wherever you are. We're coming to you from Boston, Massachusetts. Today is April 14th, 2022. I hope everyone is doing well and staying safe. Let's dive into it. So, Bhavin, welcome back to Kubernetes Bytes.
Catch-Up and Event Planning
00:00:46
Speaker
It feels like we haven't done this in a while, even though it's just a couple weeks ago. I don't know why that is, but it's good to be back. What have you been up to?
00:00:53
Speaker
I think it feels like a long time because we have been doing a lot of planning. I think we have booked the next four episodes, and it still comes out every two weeks, so I think that's it. But I'm getting ready for trade show season. Next week I'll be out at AWS Summit in San Francisco, and then we have a whole bunch of these summits and KubeCons in May. So I think that's the
00:01:18
Speaker
best time to be a TME, right? Like just going and talking about the products. And this is actually crunch time. So come up with the content and make sure you have all the demos ready to go. But that's for me. Yeah, that's right. We have the shows opening back up, the world opening back up. That's exciting for sure to be able to meet up in person. I know I'm excited for
Main Topic Introduction: K8ssandra with Patrick McFadden
00:01:38
Speaker
KubeCon in just a few weeks, really. And then we're out in LA after that as well. So yeah, I hope it stays that way, and we all get to enjoy these trips and really get to talk to people in person. It's been a while. Cool. Well, today we have a really cool topic. We're going to be talking about K8ssandra.
00:02:01
Speaker
And we do have a guest, Patrick McFadden, on. Before we introduce him and dive into the show, we do want to cover the last couple of weeks of cloud native news. So why don't you dive into it, Bhavin?
Funding News: Docker and Garden.io
00:02:17
Speaker
We have a whole bunch of funding rounds and an acquisition to discuss today. Let's start with the funding rounds. Docker, and for people who are still new to this, maybe this is a good time to learn about what Docker is doing, raised a Series C funding round of $105 million. This is more focused on developer productivity and making sure that developers can spend more time on
00:02:42
Speaker
innovation and writing code and less time on everything else. They want to speed up the inner loop of the development cycle. I was reading through that article and I was like, what is the inner loop? I googled that and I figured out, okay, inner loop is the process when
00:02:58
Speaker
a developer is writing, building, and debugging code on their own on a single machine, maybe their own laptop. And then the outer loop is when you submit it to version control and let the CI/CD system test it. So I think Docker is focused on making sure the inner loop performs better, so developers are more productive before passing it on to other orchestration or deployment systems. Interesting. I don't think I've actually heard of that term either. So that's a new one for me. Apparently it's been a thing for a couple of years now.
00:03:28
Speaker
All right, you learn something new every day. I think Docker has done a great job of pivoting to really focus on the developer. So I love to see the new funding. Awesome. The second company, another startup, Garden.io, raised a Series A round of $16 million. And the blog said it is to combat waste in cloud development. So again, I had to dig a bit deeper and find out what that actually means. Catchy.
00:03:56
Speaker
So one of the stats that they highlight based on a survey that they had done last year was cloud developers only spend 11% of their time on average actually writing code and then everything else is just spent on building internal tools, building and maintaining internal tools, setting up dev environments, debugging pipelines and so on and so forth.
00:04:15
Speaker
Garden.io wants to make all of this easier, and their approach is: large distributed apps were never supposed to run as a whole on a single laptop inside a dev environment. They want you to have a production-like hosted dev environment that is in sync with your IDE. So whenever you make any changes, if you have dev mode enabled, those changes are directly reflected in the hosted environment,
00:04:44
Speaker
without you having to build a container image or push it through your CI/CD pipeline and then figure out how the change impacted the rest of the application. So all of this sounds really cool. It's still Series A, so they have a few customers listed, but it would be a good company to watch out for.
00:05:01
Speaker
And then from an acquisition perspective, our friends at NetApp acquired Instaclustr.
NetApp's Acquisition and Kubernetes Updates
00:05:08
Speaker
And Instaclustr is a database-as-a-service company. I think they are focused on deploying and offering a managed service for different databases in the cloud. And just looking at their website, they offer PostgreSQL, Kafka, OpenSearch, Cassandra, Redis, and ZooKeeper. So a few services that customers can just
00:05:28
Speaker
leverage as hosted databases in the public cloud. Yeah, it will be really interesting to see what that becomes inside NetApp. Really exciting stuff.
00:05:38
Speaker
Yep. And then, since we are recording this on April 14th, I think in a couple of weeks we'll have Kubernetes 1.24 out. May 3rd is the official release date right now. And this release will have an interesting change where dockershim, which was
00:05:59
Speaker
deprecated for the past three releases, is officially getting removed. So people who are running Kubernetes clusters need to make sure that they are ready for the new release when they upgrade. This means that if you are running Docker as the container runtime on your Kubernetes nodes, you need to change that to use containerd.
00:06:20
Speaker
If you are using Docker just to build your applications or container images, you don't have to worry about it. If you are running any public cloud managed Kubernetes service, they have defaulted to containerd for a while, so you can just verify, but don't worry, you would most likely be running containerd. I will link to a blog in the show notes which walks you through a few commands that you can use to
00:06:43
Speaker
check the runtime on your nodes. And then, if it is Docker, how you can uninstall that, install containerd, and use that on your node. So a really good thing to keep an eye out for, since we are approaching the release date for 1.24. Great.
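As a rough illustration of the runtime check described above (not the commands from the linked blog), the sketch below reads the JSON that `kubectl get nodes -o json` prints and reports which nodes still use Docker. It relies on the `status.nodeInfo.containerRuntimeVersion` field of each Node object; the node names and versions in `SAMPLE` are made up for the example.

```python
import json


def runtimes_by_node(nodes_json: str) -> dict:
    """Map each node name to its runtime name (the part before '://')."""
    runtimes = {}
    for item in json.loads(nodes_json)["items"]:
        name = item["metadata"]["name"]
        # Values look like "containerd://1.5.8" or "docker://20.10.7".
        version = item["status"]["nodeInfo"]["containerRuntimeVersion"]
        runtimes[name] = version.split("://", 1)[0]
    return runtimes


def nodes_still_on_docker(nodes_json: str) -> list:
    """Return the sorted names of nodes whose runtime is Docker."""
    found = runtimes_by_node(nodes_json)
    return sorted(name for name, runtime in found.items() if runtime == "docker")


# Hypothetical two-node cluster, shaped like `kubectl get nodes -o json` output.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "worker-1"},
         "status": {"nodeInfo": {"containerRuntimeVersion": "containerd://1.5.8"}}},
        {"metadata": {"name": "worker-2"},
         "status": {"nodeInfo": {"containerRuntimeVersion": "docker://20.10.7"}}},
    ]
})

print(nodes_still_on_docker(SAMPLE))
```

In practice the input would come from piping the real `kubectl` output into the script rather than from the hard-coded sample.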
Security and New Features in Portworx
00:07:02
Speaker
And then the next thing I had was another zero-day vulnerability. Again, these keep coming up, and it's something that we should keep an eye out for. It's called Spring4Shell. Again, it is similar to Log4Shell, the Log4j vulnerability that was
00:07:21
Speaker
announced earlier this year. It allows remote code execution, and it specifically affects environments that are running JDK version 9 or later. Then, if you are running a Tomcat server, there are certain conditions that you have to meet to be exposed to this vulnerability. Make sure you read the blog post, understand whether your environment is vulnerable or not, and then apply the necessary mitigation that's already available for you.
00:07:50
Speaker
With that, I think the next thing on our list is discussing the latest version of Portworx Enterprise 2.10, something that both of us are really excited about. I know it's a big release for us. A few key features introduced. I'll get started with one and then you can add on to it as well.
00:08:09
Speaker
The feature that I am excited about is application IO control. So that allows administrators or platform admins or DevOps admins, whatever the role may be, they can now specify IOPS and throughput limits, both for read and write.
00:08:25
Speaker
for any persistent volume that's running on a Portworx storage cluster. So this means that if you want any new volumes to have that limit specified, you can update your storage class definition and use a couple of parameters to either set the bandwidth limits or IOPS limits for your storage class and then any persistent volume that gets provisioned using that storage class will automatically inherit those settings. And then for existing customers who want to start using this as part of the 2.10 release cycle,
00:08:55
Speaker
they can go ahead and use our pxctl ("pixie-cuttle") utility to add a parameter to the persistent volume, and the limits take effect almost instantaneously. So that's it for me. What about you, Ryan? Yeah, there's a lot of great stuff. We actually released Sharedv4 services quite some time ago. It's been in beta for a while.
00:09:19
Speaker
With this release, it becomes full GA, so that's taking basically NFS volumes, and you can access them in and outside the cluster. It's highly scalable and fails over by itself, so that can be fully GA'd now. There's also one that I love, which is called Trashcan, Portworx Volume Trashcan. It works a lot like the name suggests. Yes, yeah, Trashcan.
00:09:39
Speaker
You can't go wrong with Trashcan when you talk about something like this. I mean, it's the recycling bin on Mac, I believe, but it works the same way, right? You delete a file on your computer, and it's not actually gone. It's just in your trashcan, in your recycling bin, and you have to go in there and fully delete it so it's, you know, evaporated forever. The same idea here. You can set sort of an expiry on volumes. And if something were to occur, such as an administration error, or if you have your reclaim policy set to delete when you didn't mean to,
00:10:07
Speaker
and you delete an object and your volume gets deleted, your data is still safe in this scenario until your expiry runs out. And you can recover those volumes from the trashcan back into your environment in Kubernetes. It really helps with that sort of "oops, data loss, lost my wallet" feeling, and it's a really cool feature to check out.
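The delete-then-expire idea described here can be pictured with a small toy model. This is only a conceptual sketch, not how Portworx implements its Volume Trashcan: the class name, method names, and the use of plain numbers for time are all assumptions made for illustration.

```python
class VolumeTrashcan:
    """Toy model of delete-with-expiry; not the Portworx implementation."""

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        self.live = {}   # volume name -> data
        self.trash = {}  # volume name -> (data, time of deletion)

    def create(self, name: str, data: str) -> None:
        self.live[name] = data

    def delete(self, name: str, now: float) -> None:
        # Deleting moves the volume to the trashcan instead of destroying it.
        self.trash[name] = (self.live.pop(name), now)

    def purge_expired(self, now: float) -> list:
        """Permanently drop trashed volumes whose retention has run out."""
        expired = sorted(
            name for name, (_, deleted_at) in self.trash.items()
            if now - deleted_at >= self.retention_seconds
        )
        for name in expired:
            del self.trash[name]
        return expired

    def restore(self, name: str, now: float) -> bool:
        """Bring a trashed volume back if it has not expired yet."""
        self.purge_expired(now)
        if name not in self.trash:
            return False
        data, _ = self.trash.pop(name)
        self.live[name] = data
        return True


can = VolumeTrashcan(retention_seconds=3600)
can.create("pvc-orders", "order data")
can.delete("pvc-orders", now=0)             # accidental delete
print(can.restore("pvc-orders", now=600))   # still inside the retention window
can.delete("pvc-orders", now=1000)
print(can.restore("pvc-orders", now=5000))  # retention has run out by now
```

Passing `now` explicitly keeps the example deterministic; a real system would use a clock and a background purge.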
00:10:28
Speaker
There's also some great stuff around auto filesystem trimming and volume placement strategies for StatefulSets, meaning you can basically define a StatefulSet and a volume placement strategy specifically for the pods in that unique deployment. And it'll group your volumes or do what you want to do, anti-affinity, those kinds of things. So really, really exciting release for Portworx for sure.
00:10:53
Speaker
In addition to that, I just have a couple more here.
Data Inconsistency Warning and Innovations
00:10:58
Speaker
There's an article about etcd 3.5, specifically versions 3.5.0, 3.5.1, and 3.5.2, which have a data inconsistency issue. So if you're running etcd 3.5.0, 3.5.1, or 3.5.2, you definitely want to check the link
00:11:17
Speaker
in the show notes, because the issue can actually lead to certain corruption issues. You can check your cluster, and the recommendation is not to upgrade to 3.5 and to wait until 3.5.3
00:11:30
Speaker
is released, where that will be fixed. So definitely go ahead and check that. etcd is a big part of all cloud native applications, or Kubernetes in general. So definitely something to go check out. And then this last one is Postgres container apps. I thought it was a really cool article. Crunchy Data came out with basically a way to extend the internal Postgres deployment to
00:11:56
Speaker
be able to run a container. So there's this little create-extension-and-run-container method where you can actually run a container from within the PostgreSQL CLI. And it'll run a Docker container with something that you want to do. It's good for kind of quick-and-dirty metrics, deployments, those kinds of things, if you don't want to go through the trouble of fully deploying
00:12:18
Speaker
your Kubernetes init containers or sidecar containers, which is probably still the way I see running in production happening. But it's definitely a cool little way to see the ability to run containers make it into the application. So go check that out as well.
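Going back to the etcd item a moment earlier in this news segment: a quick way to express "is my version one of the affected ones" is a small version check. This is a minimal sketch that assumes, as stated in the episode, that the affected releases are 3.5.0 through 3.5.2; the helper names are invented for the example, and the version strings are expected in the usual `3.5.1` or `v3.5.1` form.

```python
# Releases named in the episode as having the data inconsistency issue.
AFFECTED = {(3, 5, 0), (3, 5, 1), (3, 5, 2)}


def parse_version(version: str) -> tuple:
    """Turn '3.5.1', 'v3.5.1' or '3.5.1-rc.0' into (3, 5, 1)."""
    core = version.strip().lstrip("v").split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split(".")[:3])
    return (major, minor, patch)


def is_affected(version: str) -> bool:
    """True when the version is one of the releases listed in AFFECTED."""
    return parse_version(version) in AFFECTED


for candidate in ["3.4.18", "3.5.0", "v3.5.2", "3.5.3"]:
    status = "affected" if is_affected(candidate) else "not in the affected list"
    print(candidate, status)
```

The advisory in the show notes remains the authority on exactly which versions and conditions are involved.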
00:12:36
Speaker
And I think with that, Bhavin, that's all our news.
Meet Patrick McFadden: Background and Work
00:12:39
Speaker
It might've been a lot of news, but I think those are all really worthy articles. So we can dive into our topic today on K8ssandra. And again, our guest is Patrick McFadden. He is the co-author of the upcoming O'Reilly book, Managing Cloud Native Data on Kubernetes. He currently works at DataStax in the Developer Relations group and is
00:13:04
Speaker
a contributor to the Apache Cassandra project. So really excited to get Patrick on the show. So let's go ahead and do that. Awesome. So we have Patrick on the call now. Patrick, welcome to Kubernetes Bytes. We are so glad to have you on to talk about Cassandra. Why don't you introduce yourself and tell us what you do.
00:13:26
Speaker
Yeah. Hi. Well, thanks for having me today. Let's see. What do I do? I do lots of things right now, mostly work on open source things. And what are those? So my name is Patrick McFadden. I work at DataStax. We're a database company, mostly cloud now. And
00:13:42
Speaker
we do actually all things data. You'll hear me say "the open data stack" a lot. It just kind of worked out with our name. But I've been working on Apache Cassandra for over 10 years now. I've been part of the project for a long time. Apache Pulsar is another project I have involvement in. Spark, less now, more before. But also lately, a lot of Kubernetes stuff. Right now, I'm in the middle of writing a book with my co-host,
00:14:11
Speaker
co-writer Jeff Carpenter called Managing Cloud Native Data on Kubernetes, which is a ridiculously large topic that we're trying to squeeze into a book form. We're about three quarters of the way through. You can get early access now that's out there. Thanks to Portworx. Thank you. And I think part of that
00:14:36
Speaker
or part of the writing of that will probably be very helpful, some of the things I've learned, for your listeners. Awesome. We'll definitely include a link for the early access, the 75% of the book that you mentioned, so people can download it and start reading it.
00:14:51
Speaker
But you already spoke about the topic for today, right? It's Cassandra. Can you tell our listeners what Cassandra is? How is it different from other data services or databases out there? And why are we talking about Cassandra on Kubernetes at all? Why even talk about that, right?
00:15:12
Speaker
Yeah, Cassandra is one of those databases that affects your daily life. And I love that fact, because I've been working for a long time to make that a reality. But it's a NoSQL database. It came out of the early days of NoSQL, the 2008 to 2010 years, when we were trying to figure out how to scale data.
00:15:40
Speaker
This is one of the solutions, and it's really taken off. It's a very popular database, but essentially what it is is a shared-nothing, fully distributed database. It leans more towards availability and partition tolerance. So one of the useful and great features of Cassandra is that it just
00:15:59
Speaker
holds up to a lot of abuse. So you can lose an entire data center, or you can lose half of your cluster, things like that, and it'll still keep trucking. That's just the way it has been built. These aren't new features; from day one, this is how it works. Working with it is different; it's a different type of database, and I spend a lot of my time helping people build applications. But it's a transactional database, you
00:16:29
Speaker
it's better closer to your users. And why on Kubernetes? Because it has always followed the same ideals as Kubernetes, which is, you know, always on, self-healing. It scales in a linear fashion, so if you need more, you add more. And, you know, it, it
00:16:53
Speaker
It is a really great way.
K8ssandra Project Overview
00:16:55
Speaker
The elasticity, which is becoming more a part of it, is a great way to manage costs. This is a typical problem with Kubernetes and cloud is you always add infrastructure and you forget to remove it.
00:17:09
Speaker
really good for that. And so, and the resilience is a key thing, you know, like, just thinking about like, if you run infrastructure, you know that eventually something's gonna fail and probably a lot of something. And what does it mean to your customers? So Kubernetes, Cassandra, they both follow that same thing of like, well, you should be online as much as possible.
00:17:34
Speaker
That's a great way to put it together. But okay, so how do I get started with Cassandra on Kubernetes? I know there is a project called K8ssandra. How do you say it? Let's start there. How do you say that? Yeah, I know. We got clever inside, you know, and it's funny how this has been the problem, but it's "Kate Sandra": Kate, the name, and Sandra. Now everybody knows. Yeah, everyone knows because of your one or two million listeners. I can't remember. It was a huge number.
00:18:02
Speaker
But, you know, they're all going to be like, now see, when you're at a party and you talk about it, you can sound super cool. And if somebody mispronounces it, you can sound pretentious. Like, I'm sorry. No, it's "Kate." Because, you know, there are people out there listening who do that. Actually, this is how we do it. It'll be one of those "kube-cuttle" and "kube-C-T-L" things. Oh, yeah. Well, this kind of dates me. But when Linux first started to become a thing,
00:18:31
Speaker
there were people that called it "Lie-nux." Oh yeah, that was a faux pas. Yeah, well, it's because everyone tried to make it like Unix. It's "Linux." Get it?
00:18:56
Speaker
But yeah, the K8ssandra project is another open source project. It's not in the CNCF or the Linux Foundation yet, but it's a project that DataStax has sponsored to change the way we deploy Cassandra and just make it completely Kubernetes native.
00:19:17
Speaker
Cassandra is a large distributed system, and to install it, you've got to be ready for that. Wouldn't it be great if you could just do a Helm install of K8ssandra and have a running Cassandra cluster? Wouldn't it be great?
00:19:37
Speaker
Here's the secret, right? So I mean, you know, it being sort of the project to really run Cassandra on Kubernetes, maybe we can talk about some of the challenges of not using a project like K8ssandra. What are some of the challenges of deploying Cassandra on Kubernetes without an operator? And what does it solve?
00:19:59
Speaker
Yeah, well, you know, this is a classic open source problem. The reason K8ssandra exists is because DataStax was solving this problem.
00:20:11
Speaker
And we went down that road. So we have Astra, which is our Cassandra service. And, you know, that's a cool cloud native service. You just click a button, you get a Cassandra cluster. But, you know, when you're building a SaaS product where you click a button and you get a thing, there's like a bazillion things behind it, right?
00:20:32
Speaker
The first incarnation of Astra was not Kubernetes-based, and it was not easy to run. It took a lot more humans than we needed, and it wasn't as robust. It just had every tagline you can think of from KubeCon of "why did we switch to Kubernetes?"
00:20:53
Speaker
We went through the process of converting all our infrastructure to Kubernetes. And in that process, we learned a lot about running large-scale, stateful workloads in Kubernetes to a point where we felt OK putting up an SLA.
00:21:11
Speaker
And all that knowledge, because we're an open source company, we believe should be in the open source. So there's our operator, and then eventually K8ssandra, and there are more innovations coming, but it's taking what we've learned running a cloud service with Cassandra and open sourcing it.
00:21:34
Speaker
Okay, like that's really cool, right? I didn't know Astra was running on Kubernetes. Like Astra basically deployed Cassandra instances on Kubernetes. So that's awesome. And you're taking all of those learnings and putting it out there for everyone to leverage. That's great. But like, okay, how do I like get started, right? Like it's open source, but how do I run it against my Kubernetes cluster? Do I have to create a thousand line YAML file for it? Or how do you see this to get started?
00:22:01
Speaker
Oh, come on now. If you start with Kubernetes, you know there's going to be YAML in your life. Hopefully not much. If you go to our site, K8ssandra.io.
00:22:15
Speaker
Make sure we get that right. We have a getting started guide that will cover any and all environments you can think of. We can do it locally with Minikube. We can do it on any cloud provider. As a matter of fact, all the cloud providers now have Kubernetes as a service. We have a startup guide for that.
00:22:34
Speaker
But the simple getting started steps are getting the Helm chart installed in your local repo. Yes, there is a YAML file. And the bare minimum is pretty minimum. It's like, yeah, this is my image. This is the namespace I'm installing in. And here's how many nodes I need. Walk away.
00:22:57
Speaker
And because I know you're really excited about this, you can go much crazier with that YAML file if you want. But that's enough to get a small three-node cluster running on Minikube.
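To make the "bare minimum" values file a bit more concrete, here is a rough sketch that builds one and the matching Helm command. Everything chart-specific is an assumption recalled from the K8ssandra 1.x Helm chart rather than something stated in the episode: the `k8ssandra/k8ssandra` chart reference and the `cassandra.version`, `cassandra.cassandraLibDirVolume`, and `cassandra.datacenters` keys should be checked against the chart's own documentation. The sketch sets a Cassandra version rather than a full image name, and writes JSON because Helm accepts JSON values files (JSON is valid YAML).

```python
import json


def minimal_values(cassandra_version: str, node_count: int, storage_class: str) -> dict:
    """Build a small values document (key names are assumptions, see lead-in)."""
    return {
        "cassandra": {
            "version": cassandra_version,
            "cassandraLibDirVolume": {"storageClass": storage_class, "size": "5Gi"},
            "datacenters": [{"name": "dc1", "size": node_count}],
        }
    }


def helm_install_args(release: str, namespace: str, values_path: str) -> list:
    """Arguments for `helm install`; the chart reference is an assumption."""
    return [
        "helm", "install", release, "k8ssandra/k8ssandra",
        "--namespace", namespace, "--create-namespace",
        "-f", values_path,
    ]


values = minimal_values("4.0.1", node_count=3, storage_class="standard")
print(json.dumps(values, indent=2))
print(" ".join(helm_install_args("k8ssandra", "k8ssandra", "values.json")))
```

The three inputs mirror what is described above: what to run, which namespace to install into, and how many nodes are needed.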
Deployment Challenges and Solutions in K8ssandra
00:23:09
Speaker
And you get all the smart defaults, like it will store three replicas, so the replication factor will be set to three, and nodes will be configured according to best practices.
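The replication factor of three mentioned here ties back to the availability point made earlier. As a small worked example, Cassandra's QUORUM consistency level needs a majority of replicas, which is the replication factor divided by two (integer division) plus one. The function names below are only for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerable_replica_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 3, 5):
    print(rf, quorum(rf), tolerable_replica_failures(rf))
```

With a replication factor of three, two replicas form a quorum, so one replica of any piece of data can be unavailable without failing QUORUM requests.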
00:23:20
Speaker
Yeah. And I know this is something near and dear to your heart: then you get into what kind of storage you are going to use. As with any stateful workload, storage is a pretty important aspect of that. In Minikube, you know, you're going to use local files. As you evolve your footprint, you're going to be thinking about, all right, what storage? How am I going to store this data?
00:23:46
Speaker
And we really push hard that you've got to make sure you use a good class of storage. Storage is kind of the make-or-break for a good running cluster. So that's the first thing. And then the second thing is making sure you have a good, clean network, because it's a distributed system.
00:24:05
Speaker
You mentioned something before, that you have Astra and K8ssandra both as sort of an offering to deploy Cassandra in various ways. Now, I'm curious, who's the K8ssandra project really tailored to, versus why wouldn't you just go and deploy using Astra? Who comes to one of those services versus the other?
00:24:31
Speaker
Yeah, and I don't think this would be a surprise to anybody. There's a lot of folks out there that are not trustful of handing over their infrastructure to a service provider just yet. As much as Amazon, Google, and Microsoft have told you, oh, just leave the driving to us. EC2 still exists. But I think what it comes down to is there's a lot of companies that are
00:24:59
Speaker
standardizing around Kubernetes and want to own their own infrastructure, their own infrastructure story. That's okay. What's interesting is that, and this is what I've talked to a lot of folks in our industry, especially around data, is that data is the crown jewels. That's the thing that they want to hold onto. Don't hand your data to a Cloud because then you're locked in.
00:25:29
Speaker
And the clouds threw in the towel a while back and they said, yeah, OK, Kubernetes is a thing. And now they all have standard Kubernetes services. So you have a very portable, you know, I mentioned this in my book, shameless plug, that, you know, we went from creating virtual machines and using virtual machines and eventually containers.
00:25:54
Speaker
Now, with that big YAML file that you love so much, we're creating virtual data centers, and we're deploying that in the cloud, and we're just renting the compute, network, and storage as a commodity.
00:26:10
Speaker
And when you know you're going to get this best-in-class database, the thing that runs Netflix and Apple and all these other companies, and you can just deploy it anywhere you want and pay for it as a commodity, those are the companies that are thinking about doing something like K8ssandra in their own Kubernetes. Oh, interesting. Okay, like an add-on service on top of a managed Kubernetes service. Interesting.
00:26:35
Speaker
That is the trend. Yeah, you know, I think we should be done inventing new infrastructure. We have plenty of databases, plenty of streaming, plenty of analytics. We're moving into an era with Kubernetes and cloud where it's more about the architecture, how we assemble the parts, than whether we need to build yet another new database. We should get away from that.
00:27:05
Speaker
Okay, I can get on board with that idea. But okay, so we spoke about how we can get started with K8ssandra. But one of the main reasons I had reached out to you was to talk about the latest release of K8ssandra, which covered that multi-cloud or multi-data center deployment.
K8ssandra's Multi-Cloud Support
00:27:24
Speaker
Can you talk about that?
00:27:26
Speaker
Yeah, well that goes into what I was talking about is, you know, when you're in control of your own destiny, sometimes you want to have choices installed in your architecture. And it's very typical to see any company that's on a cloud journey, you know, the quote unquote, cloud journey. Let's start with one step. Download Bodo.
00:27:54
Speaker
The cloud journey is usually like, oh, we're going to do a little bit in the cloud and a little bit on-prem.
00:28:01
Speaker
Well, Kubernetes in that environment, you just got yourself a multi-cluster problem. Kubernetes doesn't span hybrid like that. So you have to start thinking about that. Another thing that's really turning out to be super common is large companies have their footprint in more than one cloud, two or three sometimes.
00:28:25
Speaker
And yeah, there's transit costs. Yeah, there's other things. But there's also this feeling that you're not stuck in one place. And managing that is tough. So for the K8ssandra project, one of the things that people didn't like about K8ssandra is that it held back the superpower of Cassandra, because it didn't allow for multi-data center, which is kind of a fundamental thing.
00:28:51
Speaker
So with this latest release, 1.2, we have all the mechanisms in there where you can manage Cassandra across multiple Kubernetes clusters with one control plane command. So if you need to expand, contract, do all those things, it will look at the entire Cassandra cluster as one thing, even though it spans multiple data centers and multiple Kubernetes clusters. It's just easier for operators overall.
00:29:21
Speaker
I know I'm like looking at a blog that was published around the site and it's like so simple like I can just have different data centers defined and I can still specify how many Cassandra nodes I want in each data center, what their size and like heap size should be.
00:29:37
Speaker
And everything is just put together for you. So I don't have to worry about even logging into different clusters and deploying those Kubernetes pods or StatefulSets or however you want to deploy your Cassandra instance. So again, this simplifies it so much, it's hard to put in words on a podcast, but yeah, I'm just happy. Well, I'm glad you said that. Thank you for making that point. Because if I made it, it would sound totally derived, but you saying it is so much better.
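To picture the kind of definition being described, with several data centers, a node count for each, and a heap size, here is a rough sketch that assembles a multi-data-center cluster manifest as plain data. The resource kind and field names (`K8ssandraCluster`, `k8sContext`, `serverVersion`, `jvmOptions.heapSize`) are recalled from the K8ssandra operator's custom resource and should be treated as assumptions to verify against its documentation; the context names and sizes are invented, and required pieces such as storage configuration are left out.

```python
import json


def datacenter(name: str, k8s_context: str, size: int, heap_size: str) -> dict:
    """One Cassandra data center, pinned to one Kubernetes cluster context."""
    return {
        "metadata": {"name": name},
        "k8sContext": k8s_context,  # which Kubernetes cluster hosts this DC
        "size": size,               # number of Cassandra nodes in this DC
        "config": {"jvmOptions": {"heapSize": heap_size}},
    }


def k8ssandra_cluster(name: str, server_version: str, datacenters: list) -> dict:
    """Assemble the manifest; field names are assumptions (see lead-in)."""
    return {
        "apiVersion": "k8ssandra.io/v1alpha1",
        "kind": "K8ssandraCluster",
        "metadata": {"name": name},
        "spec": {"cassandra": {"serverVersion": server_version,
                               "datacenters": datacenters}},
    }


def total_nodes(manifest: dict) -> int:
    """Cassandra nodes across every data center in the manifest."""
    return sum(dc["size"] for dc in manifest["spec"]["cassandra"]["datacenters"])


manifest = k8ssandra_cluster("demo", "4.0.1", [
    datacenter("dc1", "cloud-a-cluster", size=3, heap_size="1G"),
    datacenter("dc2", "cloud-b-cluster", size=3, heap_size="1G"),
])
print(json.dumps(manifest, indent=2))
print(total_nodes(manifest))
```

The point of the shape is the one made in the conversation: one document describes the whole Cassandra cluster, and each data center entry only says where it lives and how big it is.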
00:30:03
Speaker
But you know, that's the thing that we're trying to create. I really don't want the next generation of infrastructure people to go through what I went through with infrastructure. We should evolve. But won't they have interesting stories then? Yes, they will. Different stories.
00:30:25
Speaker
that you could do a kubectl command and completely destroy your entire virtual data center in one command. Let's talk about that command.
00:30:43
Speaker
Cool, good to know. And I know that cass-operator, which is also from DataStax, is now in favor of K8ssandra, right? And I think I've used that in the past as well. So it's worth calling out, because part of me wondered, what's the difference? And then I went and looked and it was just like, go use this thing. So I don't know what the status of that is, if you have anything to say about that as well.
00:31:05
Speaker
Well, the K8ssandra project is kind of a collection of operators in a lot of ways, because it's not just Cassandra that it installs; it also installs a backup tool called Medusa. And there's another, hang on to your hat, there's tons of Greek mythology in here,
00:31:25
Speaker
there's another tool called Reaper that does background repair processes. So it is meant to be kind of like the batteries-included thing. When you do a Helm install, you should get all the stuff that someone might otherwise have to go through a few training classes to learn in order to run a Cassandra cluster. And each one of those things has an operator. And we have,
00:31:56
Speaker
you know, we have this situation where we can make it easier, but operators are the ones that are like the robots, right? I love that analogy of the operator being like the robot in your data center that takes the commands and does stuff. Well, again, when we're creating these multi-process architectures, we need more than one robot. And so having more operators is not a bad thing.
00:32:24
Speaker
Awesome. So, okay. Can you talk about a few customer case studies around how people are using Cassandra, or K8ssandra, sorry, for their deployments, like on their DIY clusters? Yeah. I don't know how many names I can share. I mean, it's so funny, you know, infrastructure people are like, well, we don't want to give away our secret sauce. Yeah, exactly. There is,
00:32:52
Speaker
Yeah. If I, if I tell you, then you're going to go out and this is the thing that I love about Netflix. They will totally tell you how they do everything. They got so much swagger, right? Then they can open source everything. They talk about everything like, this is how we built it. Go ahead and try. Yeah. So that's a great analogy of like, yeah, sometimes it's not the tool, it's the people.
00:33:19
Speaker
But yeah, there's a company in the UK that does a lot of really cool stuff around K8ssandra, and what they're using it for specifically is managing their on-prem and multi-cloud presence. What they do is provide an API control plane for all the developers who do above-the-line work for mobile, web, that sort of thing.
00:33:43
Speaker
And they use Cassandra quite a bit for that because Cassandra is just, by default, active-active multi-data center. But moving to K8ssandra, you know, if you look at how people have adopted Kubernetes, they've already had all their stateless infrastructure running on Kubernetes for years now, microservices, that sort of thing.
00:34:10
Speaker
Getting out of the business of having two different deployment mechanisms and two different CI/CD pipelines and unifying them was a big deal. And now for them, I think what they're seeing is that they can move on to do other things. They figured out that the robot running their Cassandra cluster is doing the job for them.
00:34:34
Speaker
And it wasn't as painful as they thought. I think there was a lot of fear around running stateful workloads inside of Kubernetes. And as long as you get the storage right, you're golden. So yeah, I wish we had more names that we could talk about. But yeah, it'll come around. We need a conference. Yes. There may be one coming up soon. I know. Yeah, I heard. Yeah.
00:35:06
Speaker
Nice. So, you know, I think there's always a what's next, right? There's a what's next in Kubernetes, and there's a what's next for DataStax and K8ssandra. Now I'm curious, where do you see us? Are we still at the beginning of people adopting, right? As your example just stated, people are still a little fearful of putting stateful stuff on Kubernetes. Are we still at the beginning there? What's next for K8ssandra, and then maybe, in your view, for Kubernetes?
00:35:50
Speaker
That community has done some really good work in trying to understand who's doing this and how many of us are doing it. There's a recent survey that showed that companies that are moving fast and are making a difference with Kubernetes have already adopted data inside of Kubernetes. It's the biggest open secret.
00:36:04
Speaker
Well, another disclosure, I'm very involved in the data on Kubernetes community, so I'm kind of...
00:36:13
Speaker
It's not so taboo anymore, but what's next and what does that mean is there's this problem that I think we're solving slowly or quickly depending on who you talk to. If you talk to the infrastructure people building it, it's too fast. If it's the business people, it's too slow.
00:36:35
Speaker
Yeah. But we have had a really large, well-established group of data infrastructure software out there that's proven and battle-hardened. And it was never designed to work on or in or around Kubernetes. And that's changing rapidly. Now I can speak specifically about Cassandra.
00:37:02
Speaker
Cassandra is just meant to work in any kind of environment. It's very commodity-based, as we put it, but now we're starting to move Cassandra closer to Kubernetes. At DataStax, we're looking at specific things like: what does Kubernetes provide that is redundant in Cassandra? For instance, consensus and control plane management.
00:37:34
Speaker
that for free. And treating a database more like a microservice instead of a monolithic beast to be installed.
00:37:46
Speaker
I think this is going to change a lot. Again, this is from our running Cassandra at scale in Astra. We have thousands of clusters in there. You learn a few things. So it's this cloud-natifying of the older versions of things. That's really what's next for the industry and definitely what's next for Cassandra and K8ssandra.
00:38:12
Speaker
Interesting. Yeah, I think, you know, to me, it sounds like it could be a huge benefit in making it even more scalable than it already is, right? Being potentially more lightweight, letting Kubernetes do more things.
00:38:25
Speaker
It's interesting to see how Kubernetes will evolve and what role it will actually play for applications. Because you mentioned before that we're now putting together the pieces. We're focusing more on the business value of the application these days, in many ways, rather than just building and exploring Kubernetes. We've definitely adopted it by now. So it's an interesting insight.
00:38:52
Speaker
I know we already covered what's next. But I did want to ask another
Data Protection and Learning Resources
00:38:57
Speaker
question, about data protection. Maybe that should have come earlier. I know we spoke about how K8ssandra (is that K8ssandra? I'll eventually learn how to say it) is one big umbrella that has different operators. I was reading through the 1.2 release notes, and it says
00:39:14
Speaker
Medusa might be missing, or it might not support multi-data-center deployments. Is that something that's coming in the future? How does K8ssandra handle data protection?
00:39:24
Speaker
Yeah, so data protection in Cassandra is a fascinating and fun topic, because just inherently, the way it's designed is to protect your data. So you can get into this mode where you think, maybe I don't need a backup ever, not in a traditional sense. Like, I used to be an Oracle DBA. And if you didn't do your incrementals and back up your database all the time, you were just asking for data loss.
00:39:53
Speaker
But the funny thing about my experience with Oracle, which is the thing about data protection inside of a Cassandra cluster, is I never had to restore a database because of a hardware failure. Ever. I always had to restore databases because some programmer munged the actual data. You know? Equally important. Yeah.
00:40:19
Speaker
Yeah, that's the real problem: it's easy to ruin your data with a command.
00:40:32
Speaker
Yeah, humans, right? Whether it's an operator, dev, admin, you know, I think there are many ways to munge what could be a Cassandra data loss issue, whether it's from a Kubernetes administration point of view, or like you said, from a developer or Cassandra operator, it sounds like.
00:40:51
Speaker
Yeah, so the data protection in that realm, let's reset the problem. It's not because hardware failed. It's not because we lost our cluster. It's because somebody did something. Now we can go further down, like for recovery. So backup, you should always be doing snapshots inside of Cassandra. And that's what Medusa does. And Medusa manages that pretty well.
00:41:16
Speaker
Where you put your snapshots is a very important topic. Do you put them in slower storage? Do you put them in cheaper, bulkier storage? That's important. And then there's the aspect of what happens if somebody does delete your entire cluster: you want your storage configured so that your PVs don't just drop out and delete themselves.
00:41:42
Speaker
That's important because a fast way to get a cluster back online is restoring the nodes and reattaching the PVs. So you can get a cluster running pretty quickly based on that if you lost everything, for instance.
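The point about PVs not deleting themselves maps to the reclaim policy on Kubernetes storage. As a minimal sketch, and not a K8ssandra-specific recipe, the Python below builds a StorageClass manifest with `reclaimPolicy: Retain`, the standard Kubernetes setting that keeps a volume and its data after its claim is deleted. The class name and provisioner string are hypothetical placeholders; substitute the CSI driver your cluster actually uses.

```python
import json

# Reclaim policies Kubernetes accepts on a StorageClass. "Delete" is the
# default for dynamically provisioned volumes; "Retain" keeps the volume.
VALID_RECLAIM_POLICIES = ("Retain", "Delete")


def storage_class_manifest(name, provisioner, reclaim_policy="Retain"):
    """Return a StorageClass manifest as a plain dict."""
    if reclaim_policy not in VALID_RECLAIM_POLICIES:
        raise ValueError(f"unsupported reclaim policy: {reclaim_policy}")
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "reclaimPolicy": reclaim_policy,
        # Delay binding until a pod is scheduled, so the volume is created
        # in the same zone as the node that will use it.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


if __name__ == "__main__":
    manifest = storage_class_manifest(
        name="cassandra-retain",                     # hypothetical name
        provisioner="example.com/hypothetical-csi",  # hypothetical driver
    )
    print(json.dumps(manifest, indent=2))
```

kubectl accepts JSON as well as YAML, so the printed output can be saved to a file and applied with `kubectl apply -f`. The trade-off with Retain is that released volumes stay around until an administrator cleans them up, so deletion becomes a deliberate step rather than a side effect.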
00:42:00
Speaker
Then there's long-term storage for data protection, especially in regulated environments like financial and health: storing a snapshot at a certain time and then putting it in the coldest, cheapest storage you possibly can, because it's got to be held for what, seven years or something like that.
00:42:22
Speaker
These are not all K8ssandra topics, but these are ones that anyone managing a K8ssandra cluster should really be thinking about all the time.
00:42:32
Speaker
Good to know. Okay. We'll add it to our list. But with that, I think I'm ready for my final question. How do we learn more about K8ssandra and Cassandra? How do we get started? I know personally, to get ready for this podcast episode, I did go through the YouTube series that you have on DataStax Developers, the Intro to Cassandra. But apart from that, what other resources are available?
00:42:54
Speaker
Yeah, I mentioned it before, but just reiterating: k8ssandra.io should be your first stop. The docs are really awesome. And if they aren't, we always accept PRs. It's a really cool setup. If you don't like it, you click a button on the page, edit it, and it will turn into a PR automatically. Nice.
00:43:13
Speaker
Yeah. But k8ssandra.io is a great place to get started. And as a matter of fact, if you really don't want to do anything, K8ssandra is available in the Google and Amazon marketplaces as well. So you can just go over there and click a button and it'll do it for you, which is pretty, pretty slick. Yeah, I did.
00:43:35
Speaker
Yeah, Amazon sponsored this fun thing that we did with K8ssandra where we built a thousand-node cluster. Nice. Using K8ssandra, which was cool. A lot of money, but Amazon wanted to show off the marketplace. Like, oh, look, you can just go into the marketplace and spin up thousands of dollars worth of infrastructure in seconds. Isn't this great?
00:44:03
Speaker
Create your own Netflix, if you so choose. Yeah, exactly. I don't even think Netflix, well, Netflix has more than a thousand nodes, but not in one cluster.
00:44:16
Speaker
But yeah, if you want to get started, we have a Discord, if you want to get in and talk with us. It's a pretty active Discord. The links are on k8ssandra.io. We use GitHub Discussions for a lot of things as well. We have Discourse out there. Just, you know, join in the community. We would love to hear from you and talk about your use case and whether we could help make it better. Anything going on at KubeCon Valencia? Will you be there?
Patrick's Involvement in Upcoming Events
00:44:40
Speaker
I'm not going to be there in person. I'm still doing the virtual thing, but I do have a talk that I'm doing. I'll be at Data on Kubernetes Day, which is a couple of days before. Yeah, KubeCon is turning into a festival, right? There are other conferences popping up around KubeCon now. I'll be there. And yeah, so I'll be giving talks at all those places. Yeah.
00:45:09
Speaker
Great. Well, I think we'll definitely include all those links in the show notes for anyone who's listening that wants them. And you can go get started with K8ssandra, learn how to say it, learn how to deploy it, whatever you want to do. And I will say, Patrick, it's been a pleasure having you on the show today. And I know I learned a lot. And so hopefully those listening did as well.
00:45:32
Speaker
Yeah, well, thanks for having me. This has been fun. And I really hope to hear from some of your listeners on Discord. Just give a shout-out: hey, I heard your talk, I want to join in. I would love to have you. Sounds good. All right, thanks a lot.
00:45:48
Speaker
Well, that was a great conversation with Patrick.
Final Reflections and Future of Data Management
00:45:51
Speaker
I have definitely learned new things about Cassandra, K8ssandra, and all the ways you can deploy it and leverage Cassandra as a NoSQL distributed database for your applications. One day, Bhavin, you will say K8ssandra correctly. Yes. I will applaud you now that we know how to say it properly because Patrick gave us the
00:46:13
Speaker
hint. Thank you, Ryan. Why don't you kick us off with the takeaways? Sure. I think the big one for me is that the world of managed services and BYOK, bring your own Kubernetes, is going to exist for the future. The comments around
00:46:32
Speaker
organizations really wanting to still own the stack, even though they're putting Legos together to a certain extent. Now that a lot of these pieces of the architecture are becoming more mature, they can kind of pick and choose which ones they want to build themselves and which ones they want to buy, but ultimately they still want to have their hands on the infrastructure and own their own data,
00:46:51
Speaker
for those that don't want to give that away to these managed services. But there's a world for both of them. And that's what I heard, right? And the BYOK, as I want to coin it, if it's not already coined, BYOK, I like that one.
00:47:08
Speaker
is definitely something we're going to see going forward. And we're only at the beginning, I think, of people building their own company's architecture. So it's really cool that both Astra and K8ssandra can live in the same ecosystem for DataStax.
00:47:22
Speaker
Yeah, and for me, right, just learning that Astra actually, like the managed service that DataStax has, runs on Kubernetes. Like I loved how Patrick laid out the whole journey, right? Like they didn't use to use Kubernetes, they used to run it without Kubernetes and offered that managed service, but then they eventually realized that to run these Cassandra instances at scale, you would really benefit from having an infrastructure like Kubernetes. So that was eye-opening for me.
00:47:51
Speaker
And then, again, I think I repeated this during the episode, but I really wanted to talk about the multi-cloud or multi-data center support that K8ssandra's new operator has. It allows you to leverage Cassandra's distributed architecture, like how you can have different data centers of Cassandra and
00:48:13
Speaker
you can get immediate consistency for read and write operations by using the defaults, like a replication factor of 3, and just having different rings across different regions. But now the newest version of the operator has made this easier to set up for anybody who uses Kubernetes, and that's really great. People should really check it out. We'll definitely include a link to a demo or a blog in the show notes.
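The consistency point above rests on quorum arithmetic. A minimal sketch, assuming reads and writes both use a quorum consistency level (the episode does not say which level is in play): Cassandra's quorum is floor(RF / 2) + 1 replicas, and a read is guaranteed to touch at least one replica holding the latest write when the read and write replica counts together exceed the replication factor. The function names are illustrative and not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int,
                        write_replicas: int,
                        read_replicas: int) -> bool:
    """True when every read set must share a replica with every write set."""
    return write_replicas + read_replicas > replication_factor


if __name__ == "__main__":
    rf = 3  # replication factor mentioned in the episode
    q = quorum(rf)
    print(f"RF={rf}: quorum is {q} replicas")
    print("quorum reads and writes overlap:", read_overlaps_write(rf, q, q))
    print("single-replica reads and writes overlap:",
          read_overlaps_write(rf, 1, 1))
```

With a replication factor of 3 per data center, a quorum is 2 replicas, so one replica can be down without blocking quorum reads or writes. Reading and writing against a single replica is faster but gives up the overlap guarantee.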
00:48:40
Speaker
But yeah, those were the couple of takeaways that I had from this episode. Yeah, I really like that as well. Just, you know, really doubling down on the operator pattern. It's something that Cassandra's already really good at doing, but you have to manually do those things to set it up, and having the operator do that is really, really exciting stuff. And I like where that's going. So with that, you know, all the show links that we mentioned and Patrick mentioned in here will be posted with this episode.
00:49:08
Speaker
So definitely go take a look at those: the k8ssandra.io docs, the YouTube series, those kinds of things. We also encourage you to go check out our other episodes on Apple Podcasts, or if you go to anchor.fm slash Kubernetes Bites, you can take a look at all our episodes there and even send us a message on how we're doing, what you want to hear, or who you want to hear as a guest. That'd be great.
00:49:30
Speaker
And please leave a review if you so choose. So next week, or sorry, I always do that, two weeks from now, we're going to be talking about MySQL, a really exciting episode. So we're kind of sticking with this database theme as we're going forward for the short term. And that's the end of the episode. So, you know, I'm Ryan. I'm Bhavin. And thanks for joining another episode of Kubernetes Bites.
00:49:59
Speaker
Thank you for listening to the Kubernetes Bites Podcast.