Kubernetes Backup and Restore 101

S2 E10 · Kubernetes Bytes

426 plays · 3 years ago

In this episode, Ryan and Bhavin go back to school and talk about Backup and Restore 101 when it comes to Kubernetes and containerized applications. We discuss why backup is important, why you need a new approach for Kubernetes, and how to plan for things like Kubernetes operators and custom resources. We also talk about how backup tools can help protect you from ransomware attacks and look at managed backup solutions that help alleviate administrative tasks when it comes to building and maintaining a data protection solution for Kubernetes!

Show Links:

Kubernetes v1.24 is GA - https://kubernetes.io/blog/2022/05/03/kubernetes-1-24-release-announcement/

Storage Capacity API - https://kubernetes.io/blog/2022/05/06/storage-capacity-ga/

https://github.com/kubernetes/autoscaler/issues/4517

Volume Expansion support - https://kubernetes.io/blog/2022/05/05/volume-expansion-ga/

Cloud-Native PG - http://cloudnative-pg.io/

Quick primer for Kubecon - https://blog.alexellis.io/when-in-spain-a-quick-primer-for-kubecon/

Transcript

Introduction and Setup

00:00:03

Speaker

You are listening to Kubernetes Bytes, a podcast bringing you the latest from the world of cloud native data management. My name is Ryan Wallner and I'm joined by Bhavin Shah, coming to you from Boston, Massachusetts. We'll be sharing our thoughts on recent cloud native news and talking to industry experts about their experiences and challenges managing the wealth of data in today's cloud native ecosystem.

00:00:30

Speaker

Good morning, good afternoon, and good evening wherever you are. We're coming to you from Boston, Massachusetts. Today is May 13th, 2022. I hope everyone is doing well and staying safe. Let's dive into it.

Hiking Adventures

00:00:44

Speaker

Speaking of doing well, Bhavin, how are you doing?

00:00:48

Speaker

I've seen better days. As you can see, I have a bit of a cold going on. My voice isn't the best, so I'm pretty sure Ryan is going to do most of the talking in this episode. But I think there's a reason for not feeling well. Last week, I had just too much fun at Bryce Canyon National Park and Capitol Reef National Park.

00:01:07

Speaker

I was telling you on Slack, we hiked maybe 20 miles, close to 20 miles, in like three days. And one thing I realized is I like hikes that go up first, so that on the way back, when you're tired, you get to come down. Bryce Canyon, not that way, right? It's a canyon, so you hike down, you have fun going down, and then you really hate yourself when you have to hike back up. Like, in an eight-mile hike, the last one and a half miles is like,

00:01:35

Speaker

1500 feet of elevation. I completely agree with you. I've done those types of hikes, even multi peak hikes, if you're doing mountains, like you do two peaks in a row, but you got to go back and basically, unless there's a route around. I spent some time in the Grand Canyon and obviously the same thing, but

00:01:55

Speaker

the, uh, guide that we used, um, we got a bus ride out, which was awesome. So you did all that. You did like five days in the bottom camping and rafting, and then they bus you out. It was the best thing ever. You're like, all right, I'm exhausted after these five days. Yeah. Next time get a bus out, I guess. Thank you. It was good. You know, um, we spent a lot of the time sort of outside hanging around.

00:02:26

Speaker

Um, still doing a lot of yard work. So, you know, boring stuff, I guess. No Bryce, no, uh, national park or anything like that.

00:02:35

Speaker

I need to catch up on my yard work. A week before we left, I think my wife and I went to Home Depot, bought weed killers and new rakes and stuff like that. We just did some raking and then just got bored and tired. We haven't done the weed killer stuff or planted new seeds for the grass. It's just like, okay, things to do. Things we'll eventually get to. The joys of new home ownership.

00:02:59

Speaker

Hey, it's a blessing and a curse, right? All right.

Kubernetes 1.24 Release Highlights

00:03:03

Speaker

Well, today's topic is how backup and restore work at sort of a 101 level for Kubernetes. But before we dive into that and dig into it, it'll be Bhavin and I talking about the topic today. No guests today. We'll dig into various aspects of it. But we do have a bit of news. Bhavin, why don't you kick it off?

00:03:25

Speaker

Yeah, I think one big thing was Kubernetes 1.24 getting GA. I know the release was pushed back by a couple of weeks because of a bug in Golang. But the release is officially out, I think May 3rd, maybe 10 days back. And it has a whole lot of new features, right?

00:03:42

Speaker

One of the important ones is dockershim being removed. And I know we covered that in some detail in one of our previous pods. So again, just a recap, dockershim is no longer available when you're running Kubernetes 1.24. Other features.

00:03:57

Speaker

New beta APIs are off by default. Up to 1.23, beta APIs were on by default, so I think people were getting the impression that, oh, even if it's a beta API, it's still supported, I can run it in my production environment. That's not the case. Beta APIs are named that for a reason. New ones are off by default now. You can still turn them on, but again, just something to keep an eye out for.

00:04:23

Speaker

And I know you have a couple of storage ones. I'll skip those. But then I saw that the migration of the in-tree plugins has begun. I think in this release, they tackled the Azure Disk and OpenStack Cinder drivers. They are now officially migrated out of tree and into their CSI plugins. You'll see more and more of these in-tree plugins being migrated out in future releases.

00:04:45

Speaker

Another interesting feature when it comes to networking in Kubernetes, I didn't know this was an issue, but apparently if you are using static IPs in the same range that you assign Kubernetes to use for service-to-service communication,

00:05:01

Speaker

now there is a feature that you can turn on to reserve certain IPs in that range for manual assignment. So there's no IP collision inside your service CIDR. So something that you can start using if you upgrade to Kubernetes 1.24. But yeah, that's like a quick recap for 1.24. And before we move on to the other things that I had, I do want to cover your storage enhancements.

00:05:27

Speaker

Yeah, so as part of that release, 1.24, the big one is that volume expansion is now a stable feature. So if you have been working with persistence in Kubernetes, you might have already used this feature and didn't know it wasn't stable, because for the most part, it works pretty well, right? And that's, I think, the

00:05:49

Speaker

main point here: it's officially generally available. It was first released, I believe, as alpha in 1.8. So it's been a long time. And beta in 1.11. And now we're here at 1.24 and it's GA. So this gives you the ability to expand the size of the volume, the PVC specifically,

00:06:12

Speaker

on the fly through the YAML and in other automated fashions. So it works really well. Lots of driver support. We'll put the link in the show notes where you can look up how online expansion works versus offline expansion, et cetera. And the other one is

00:06:34

Speaker

really about storage capacity. And storage capacity is also GA in 1.24. So this is really something that I think we've never talked about on the show. And it's something that's definitely going to be more and more interesting as we migrate to use CSI solely, right? This gives CSI drivers the ability to publish the available storage. Now Kubernetes itself as a scheduler is aware of things like memory,

00:07:03

Speaker

CPU and is able to schedule pods based on sort of known capacity, right? So intelligent scheduling to say I can't just

00:07:12

Speaker

try to put this pod on this node that I know is full, it's not going to run. This brings storage into the mix to say, OK, there's enough capacity as far as networking and compute go. Now, how do I know that when I schedule this pod here that there's going to be available storage? Now, whether this is available is based on the CSI driver's capability. But the point being is that they can say,

00:07:41

Speaker

we will have no problem provisioning this volume for this container. Now, there are things that aren't perfect about this solution. We'll put the link in the show notes as well. But there are a couple of problems they lay out that they haven't quite solved yet. Things like what if a single pod has more than one volume? It's not as good at detecting and reconciling the capacity there if, say, one would have no problem

00:08:11

Speaker

being provisioned, but the other wouldn't. And then there's the question of integrations like autoscaling and how this works with them. So there's a connection between this capacity integration and the autoscaler, which does have some limitations as far as tying into CSI. But the autoscaler actually has

00:08:39

Speaker

the ability for you to set a feature flag to get that thing working too. But anyway, really cool stuff I think coming out of the 1.24 release, and these are definitely two that stood out to us in the storage space for sure.
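To make the storage capacity idea concrete, here is a minimal sketch in Python of capacity-aware node filtering. It is an illustration only, not the Kubernetes scheduler's code: the `Node` class, the `fits` and `candidate_nodes` functions, and all the numbers are hypothetical, and it assumes a single free-storage figure per node, whereas the real feature works from capacity information that the CSI driver publishes.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Node:
    name: str
    free_cpu_millicores: int
    free_memory_bytes: int
    free_storage_bytes: int  # hypothetical: capacity a CSI driver reports for this node


def fits(node, cpu_millicores, memory_bytes, volume_bytes):
    """Return True if the node has room for the pod and for all of its volumes."""
    if node.free_cpu_millicores < cpu_millicores:
        return False
    if node.free_memory_bytes < memory_bytes:
        return False
    # Simplification: sum every requested volume against one capacity figure.
    return node.free_storage_bytes >= sum(volume_bytes)


def candidate_nodes(nodes, cpu_millicores, memory_bytes, volume_bytes):
    """Names of the nodes where the pod and its volumes would fit."""
    return [n.name for n in nodes if fits(n, cpu_millicores, memory_bytes, volume_bytes)]


nodes = [
    Node("node-a", 2000, 8 * GIB, 5 * GIB),     # enough CPU and memory, too little storage
    Node("node-b", 500, 8 * GIB, 500 * GIB),    # plenty of storage, too little CPU
    Node("node-c", 4000, 16 * GIB, 200 * GIB),  # room for everything
]

# A pod asking for 1 CPU, 4 GiB of memory, and two volumes (50 GiB and 20 GiB).
print(candidate_nodes(nodes, 1000, 4 * GIB, [50 * GIB, 20 * GIB]))
```

Without the storage check, node-a would look like a valid target even though neither volume could be provisioned there, which is the gap the capacity tracking is meant to close.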

00:08:54

Speaker

And then just to follow up, more around the storage and data management ecosystem in Kubernetes, EDB actually open-sourced their Postgres operator.

Open-Source Postgres Operator

00:09:03

Speaker

So now there is a new operator called CloudNativePG that's available for anyone to use. I think it has the Apache license associated with it, so you can contribute and start using it.

00:09:13

Speaker

One of the benefits, again, since EDB has been working on this and has customers that were already using it, the version that's generally available in open source right now is not 1.0. You get that benefit of customers using this for a year and a half already in production environments.

00:09:31

Speaker

So now, even though it's an open source project, it actually is backed. It came from a vendor who has been working on this. So I think it's 1.15 or something like that. So they definitely have a head start, and it's not something that somebody just put together in a couple of months. This has customers that have been running it for a year and a half. And I think Gabriel, one of our previous guests on the podcast who spoke about Postgres,

00:09:53

Speaker

worked on this, and I think he reached out and said that for anybody who's going to KubeCon and wants to learn more about this, he'll definitely be happy to help. They also have a meet and greet that EDB has organized, so if you are going to KubeCon next week, do check that out. And based on what I see in the show notes, you have something really interesting for people headed to KubeCon.

00:10:16

Speaker

Yeah, yeah. Before I dive into that, I do want to say that, you know, that operator has a lot of great stuff. It has, you know, advanced architectures for disaster recovery or standby clusters or even PgBouncer. And, you know, I think we had a conversation about operators in general recently.

00:10:33

Speaker

This is one of those ones that I would gravitate towards first if I was looking at Postgres just because of its use and sort of how much is built into it. It's a really good example of, I think, what operators can really provide as far as value goes. So really cool stuff. I'm going to definitely look at it myself.

00:10:50

Speaker

Um, yes, this next and last piece of news, I guess you could call it news: Alex Ellis put out a really cool blog called "When in Spain: A Quick Primer for KubeCon." If you are going to KubeCon in Spain, I think many might be wondering, do you have to know Spanish? Well, he lays it out and says, you know, there's obviously a lot of people that learn English in that country.

00:11:16

Speaker

And you may also have Google Translate on your phone, but this whole blog is really about some basics of Spanish, of

00:11:25

Speaker

introducing yourself to people, how to order a coffee, how to order food, or eggs in particular, how to say what day it is, how to ask for directions, how to speak socially, like where you are from and what you like doing. So I think it's just a really cool blog to dig into. For those who may be wondering about this stuff,

00:11:51

Speaker

and are going to Spain, maybe for the first time, maybe have no Spanish background at all, and are a little bit wary, I think, go ahead and read this. I think you'll find it super valuable. Maybe you can even print it out, keep it in your pocket. I know. I'm going to take this article. I have a friend who's traveling to Spain, not for work, just for fun, in July. I'm going to send this article to him, like, okay, make sure you at least know these phrases.

00:12:16

Speaker

There you go. That's super useful. So I had to put that in there. Yeah. All right. So that is the end of the news. And we can dive into today's topic.

Kubernetes Backup and Restore

00:12:26

Speaker

So again, as a reminder, today's topic is how backup and restore works at a 101 level. And we are going to be focusing on backup and restore from a Kubernetes perspective. And when we say that, really, it's how backup and restore solutions or projects work in the Kubernetes space.

00:12:46

Speaker

Yeah, so why don't you get us started? Like, what is backup and restore, at a 101 level? Yeah, that's a good question, right? So what is backup and restore? You know, there is the need to take copies of data often, right? And when we're talking about an application that has state, often we're talking about databases, but not always. There's a lot of applications out there that do

00:13:13

Speaker

provide some level of state and save it down to the disk on the particular node. So a reminder, Kubernetes runs on a set of nodes, and your pods or containers run on those nodes. And they may or may not provide a way to write state to disk.

00:13:31

Speaker

This is where PVCs come into play, where CSI comes into play. We won't recap all that, but the point is, if they're using state, that state typically will be written only to that volume. Now, many storage systems have some sort of

00:13:48

Speaker

data replication, so it might be replicated on that back end. But the point being is that it's written to that one place, there's probably some sort of replication, but there's no real method to move that thing out of that production environment, right? And the key here is that you want to make a copy at a certain

00:14:09

Speaker

point of that data. Maybe you have an infrequently accessed system or a system that's used a lot more during the day than it is during the night, and you want to take backups at a certain time. Well, you want to copy data throughout the day or possibly at certain times and move it somewhere that maybe isn't affected by certain failures, right?

00:14:31

Speaker

And failures can mean a lot of things, right? It can mean physical failures. It can mean physical disaster. It can mean, you know, bad actors, hacking. We'll talk about that a little bit later. And the point being is that you want to move data

00:14:47

Speaker

off to a maybe cheaper type of storage and have the ability to bring that data back in, in case of that disaster, or move it around, right? So really at its core, I think I would describe backup and restore as a method for copying and moving data. When you really boil it down to its core facet, you are just making a copy of certain data

00:15:11

Speaker

and moving it out. Now, backup in Kubernetes is generally different. We've talked a little bit about this on different podcasts, right? In the sense that between backup and restore for VMs and where we've come to for Kubernetes, I think there's a lot that backup systems have had to adapt and change in order to understand Kubernetes. Mostly the fact that applications are no longer contained in a single

00:15:40

Speaker

VM or node. And they have to understand how to capture all the metadata associated with the application within a Kubernetes cluster, as well as the physical data, and move that thing around, right? Not that that's a requirement. You can definitely get away with just making copies of volumes and, you know, managing a whole slew of processes to manage the metadata to bring it back to life. But

00:16:06

Speaker

you know, there's something to be said there. Yep. And like, clearly, backup and restore when it comes to Kubernetes is definitely different. But based on your earlier point, I wanted to highlight that some things don't change, like even with virtualization and with people that were running and managing and backing up virtual machines,

00:16:25

Speaker

everybody knows that snapshots are not backups. If you're just storing something locally, and maybe you have a snapshot of the virtual disk, that's not enough. You need a backup to restore from a failure. If you lose your VMs, the snapshot won't help. So even if you have a snapshot of your persistent volume stored locally on your Kubernetes cluster, what if you lose your cluster, or

00:16:45

Speaker

What if you lose your entire namespace and all you have is just a snapshot of the persistent volume? As Ryan said, you need that entire state. You need the Kubernetes objects and the deployments, the objects that you had. You need all the application data, all the configuration, and the persistent volume. It has to be a whole group of resources rather than just a single entity.
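As a rough illustration of why a volume snapshot alone is not enough, the sketch below checks a backup for the two halves described above: the Kubernetes objects and an off-cluster copy of each volume's data. This is plain Python over dictionaries shaped like Kubernetes manifests; the `backup_gaps` function and the sample object names are hypothetical and do not come from any particular backup tool.

```python
def backup_gaps(manifests, copied_volumes):
    """Describe what a backup is missing.

    manifests: list of dicts shaped like Kubernetes objects (kind, metadata.name).
    copied_volumes: set of PVC names whose data was copied off the cluster.
    """
    gaps = []
    kinds = {m["kind"] for m in manifests}
    # A backup that holds nothing but volume claims cannot recreate the application.
    if not kinds - {"PersistentVolumeClaim"}:
        gaps.append("no application objects captured")
    for m in manifests:
        if m["kind"] == "PersistentVolumeClaim":
            name = m["metadata"]["name"]
            if name not in copied_volumes:
                gaps.append(f"no off-cluster copy of volume data for PVC {name}")
    return gaps


app_backup = [
    {"kind": "Deployment", "metadata": {"name": "orders-db"}},
    {"kind": "Service", "metadata": {"name": "orders-db"}},
    {"kind": "ConfigMap", "metadata": {"name": "orders-db-config"}},
    {"kind": "PersistentVolumeClaim", "metadata": {"name": "orders-db-data"}},
]

# Objects were captured, but the volume only has a local snapshot.
print(backup_gaps(app_backup, copied_volumes=set()))
# Objects and an off-cluster volume copy are both present.
print(backup_gaps(app_backup, copied_volumes={"orders-db-data"}))
```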

00:17:07

Speaker

Yeah, it's a good distinction, right? I mean, if we take another step, a snapshot is a backup when it's moved off its primary location, pretty much, right? Not to say that's a remote location, it can be moved to a local backup, or you could definitely have a local backup, it's just not where it originated.

00:17:24

Speaker

Um, but, uh, you know, there's a whole bunch of, I think we've talked about the golden backup rule before, where you sort of have a local copy and then you do one remote. Yeah, exactly. Um, so there's many aspects to why you'd want to use backup and restore, um, you know, from just recovering from disaster, as we talked about already, to, you know, just compliance reasons, where you have to keep it around for a long time.

00:17:49

Speaker

Yeah, I think one of the things that I use in my presentation is that, again, this might be an older piece of information, but when GDPR came out, they specifically had a condition in it that to be compliant with those regulations, you need to have regular backups and you need to prove that to be able to show that you are actually compliant with those regulations. So even if you don't care about your data, I think to just comply with regulations, you need backup.

00:18:16

Speaker

Yeah, if someone tells you you have to have it. There you go. Good reason. Yeah, I think, you know, this next item we had here, which was, you know, why do we really think differently about backup and restore in Kubernetes, sort of ties into the state in which we have to take a backup. I think, you know, technology itself,

00:18:40

Speaker

has, if we kind of focus on technology first, the technology itself has changed, right? Meaning that you can't necessarily just take what you've used for backup in the past and just apply it to Kubernetes. I mean, you could. There are definitely home-rolled solutions with things like rsync that you could get away with, moving data from here to there. They may not be the most efficient, or a solution that's built for, you know, a backend

00:19:06

Speaker

data system which targets, you know, LUNs and volumes and moves those around.

00:19:13

Speaker

You definitely can't take something built for something specific like KVM or vSphere and apply it to Kubernetes because there's a whole new set of APIs, a whole new set of how applications work, a whole new set of architectures of how applications are deployed. That's the core reason why we say we have to think about it differently as we think about our backup restore and how to do it properly.

00:19:38

Speaker

If you think about the journey, I think, again, this is something that most people might know, but I just like to reiterate. When we moved from just bare metal or physical machines to virtualization, we had to modernize the toolset around that ecosystem. You couldn't just back up your underlying servers and assume that all the virtual machines on top of it, on top of your vSphere host or KVM host were protected.

00:20:01

Speaker

It's the same set of things. You chose a solution that spoke to the vCenter APIs and enumerated all the virtual machines that were running on those hosts and protected those. The same approach, the same shift in mindset is required when we are talking about Kubernetes.

00:20:17

Speaker

A tool that can just talk to your vCenter API, for example, can no longer tell you what pods you have running on your Kubernetes worker node. You need a tool that can talk to the Kubernetes API server, help you list all the different namespaces, all the different Kubernetes objects and volumes and everything that you have running on top of your cluster, and help you protect that.

00:20:39

Speaker

So again, not me specifically. I didn't go through the bare metal to virtualization transition; I started directly in virtualization. But I think we have people in the industry who have gone through the first transition and understand that there are differences and that they need to look at a modern solution that can work with Kubernetes as well. But just to make that point obvious: there is a change. Absolutely. Yep. I agree completely. So.

00:21:10

Speaker

The next question we have is, what types of backups are there? Now, I think this question is really directed at the types of applications and ways in which we back up those applications. I think we start with, what applications are we running? In this podcast, we talk a lot about stateful applications, so there's obvious ones like databases.

00:21:33

Speaker

Then there are stateless web servers and business logic in between that may not be storing state but has some sessions, those kinds of things. And you don't necessarily need to back up all of it, but you may want to take a snapshot of what that entire collection of pods and containers looks like when you're working with a backup scenario.

00:22:01

Speaker

I will say I think the organization and how it uses Kubernetes will probably change how you look at this problem, meaning that if you're a smaller organization, you don't have a lot of Kubernetes clusters, maybe you're actually only working with one or two database types or stateful services. It may make sense to just use a backup tool that's built for that specific application. I think we had a great conversation the other day about

00:22:30

Speaker

MySQL or PostgreSQL in the past, and how each one of those solutions has its own backup tool or tools. In a lot of cases, they work great and they were built for the thing, so why not use them? Well, I think it's really a matter of scale. Kubernetes is built in a way in which it enables you to run many thousands of applications across an organization if you wanted to. If you're at that level of scale,

00:22:59

Speaker

Many teams might be using a slew of different data services and for a DevOps team or infrastructure team to offer a backup service to their internal customer, they may need something more generic versus app specific. Now, I will say that it doesn't mean you shouldn't consider

00:23:20

Speaker

anything that has to do with the application-specific nature of backups, right? I think there are a lot of solutions out there that will provide you with a generic solution which takes sort of a snapshot copy of data at runtime,

00:23:40

Speaker

makes sure that's a consistent and crash-consistent snapshot, takes that thing, and offloads it to a remote backup target. But a lot of those solutions do also consider what application you're using, because I think they know this about the individual teams that are running these applications:

00:24:00

Speaker

they may want to trigger something application specific. We've seen this in a number of different products and solutions out there where you can take a backup snapshot, but before you do that, run some ad hoc commands. If you're running a Cassandra cluster, you can flush data to disk using the Cassandra CLI. Or likewise, the Postgres or MySQL CLI, you can run specific commands to freeze data, flush it to disk,

00:24:28

Speaker

offload the oplog in MongoDB to a certain file, and then snapshot things and offload it, right? So you can definitely take advantage of the application-specific needs, meaning you're not just saying to your developers, oh, we have this one solution, you have to use it this way, and you can't now use any specific tool for your data service or stateful service that you've used. I think this is the industry being aware of those needs.
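The pre-snapshot and post-snapshot commands described here can be sketched as a small Python pattern. This is a generic illustration, not the hook mechanism of any specific backup product: `quiesced` and `backup_with_hooks` are hypothetical names, and the hooks below only record what they would do instead of calling a real database CLI.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(pre_hook, post_hook):
    """Run pre_hook before the body and post_hook afterwards, even if the body fails."""
    pre_hook()
    try:
        yield
    finally:
        post_hook()


def backup_with_hooks(pre_hook, take_snapshot, post_hook, upload):
    """Quiesce the application, snapshot, resume, then upload the snapshot."""
    with quiesced(pre_hook, post_hook):
        snapshot_id = take_snapshot()
    # Upload after the post hook has run, so the app is paused only for the snapshot.
    return upload(snapshot_id)


steps = []
result = backup_with_hooks(
    pre_hook=lambda: steps.append("flush to disk and pause writes"),
    take_snapshot=lambda: steps.append("snapshot volume") or "snap-001",
    post_hook=lambda: steps.append("resume writes"),
    upload=lambda snap: steps.append(f"upload {snap}") or f"backup-of-{snap}",
)
print(steps)
print(result)
```

The `finally` clause is the important design choice: if the snapshot fails, the application still gets unpaused rather than being left frozen.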

00:24:57

Speaker

Yeah. And so the next question I had for you was, I know we have had discussions around operators and custom resources. How do we back that up? Like, how can applications that rely on these custom resources be successfully backed up?

00:25:13

Speaker

Yeah, so that's definitely something I would put in the category of what you need to be aware of. Backup solutions for Kubernetes, whatever way you look at it, are new. There's still a lot of change going on. They're imperfect in a good way. They're designed for the cutting edge.

00:25:35

Speaker

And it just so happens that the cutting edge is Kubernetes, and it comes out with a new version all the time, right? So there's a lot of change going on. That's a good thing. But the point here is that things like operators and CRDs provide you with flexibility at the sort of control level,

00:25:54

Speaker

being able to define an application based on a CRD that you have in your cluster. It's not necessarily going to mean that if you back up the thing that was deployed from that CRD or that operator, that it's just going to run perfectly in a destination or a restore point. And the reason being is because those operators and CRDs act as sort of a control

00:26:24

Speaker

loop, or I guess big brother. I don't know, but they have to be aware of everything that you are doing with those, right? So the point being is that if you have an operator that deploys databases, and you back up only the database in, say, a namespace and then restore it, if you don't have the necessary CRDs or operator in that new cluster,

00:26:50

Speaker

Kubernetes won't know what to do with it, right? Because it is that custom resource. So you have to have the custom resource available. And now, not all backup solutions will necessarily be aware of, hey, this CRD doesn't exist over there. So should I back it up? Should I not back it up? You know, operators, I would say, fall into the category of things you shouldn't back up. I'll just say that. Maybe that's more of my opinion, but an operator is its own sort of

00:27:19

Speaker

control loop that does its own backups and in many cases assumes certain things. So there are solutions out there where you can say, you know, this application is part of a custom resource and this custom resource is XYZ.

00:27:34

Speaker

So please be aware of that. If you are backing it up, maybe take that thing as well and make sure to apply it in the destination cluster. But I'm of the, I guess, opinion that the CRDs, the operators and everything is probably something you should consider as sort of your DevOps model of how you're deploying and getting that infrastructure ready, meaning that

00:28:00

Speaker

You know, there's always some part of a Kubernetes cluster that needs to be initialized. So even if you're restoring, you have to have a Kubernetes cluster that's ready for that stuff to be restored to. Now, should backup and restore handle everything needed for an application to run? I don't think so, right? There's definitely some

00:28:22

Speaker

administrative sort of... Environment-specific things and operator needs, right? Yeah, that should be deployed back into an environment before you run your restore. So this is more around procedure. I don't think, my point being here is that, you know, Kubernetes backup isn't

00:28:41

Speaker

you know, a one-stop shop. You still have to be aware of everything going on and understand how applications run to make sure that your restores work. And I know, you know, I worked outside of the vendor side for quite some time, but I learned a hard lesson: hey, having backup and restore is not good enough. You have to practice them.

00:29:00

Speaker

Make sure your restores work because, hey, in a disaster, are they going to work? You better know the answer to that, right? Saying, hey, you know, we're running the backups so we can restore, don't worry about it. It's not good enough, right? You have to actually do it and kind of run through those, you know, same reason we do fire drills, right? In my opinion, you've got to do this with Kubernetes as well.
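A minimal sketch of the pre-restore check discussed here: before restoring, compare the custom resource kinds in a backup against the CRDs installed in the target cluster. The `BackedUpResource` shape, the partial list of built-in API groups, and the `example.com` `Database` kind are all assumptions for illustration, not any backup tool's real format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackedUpResource:
    """One object captured in a backup (hypothetical, simplified shape)."""
    api_group: str  # e.g. "apps", or "" for the core group
    kind: str       # e.g. "StatefulSet"
    name: str


# Partial list of API groups that ship with Kubernetes. Anything outside it is
# assumed (for this sketch only) to require a CRD in the target cluster.
BUILT_IN_GROUPS = {"", "apps", "batch", "networking.k8s.io", "rbac.authorization.k8s.io"}


def missing_custom_kinds(backup, installed_crds):
    """Return the (group, kind) pairs the backup needs but the target cluster lacks."""
    needed = {
        (r.api_group, r.kind)
        for r in backup
        if r.api_group not in BUILT_IN_GROUPS
    }
    return sorted(needed - set(installed_crds))


backup = [
    BackedUpResource("apps", "StatefulSet", "web"),
    BackedUpResource("", "Secret", "db-credentials"),
    BackedUpResource("example.com", "Database", "orders-db"),  # hypothetical custom resource
]
target_cluster_crds = set()  # fresh cluster: the operator and its CRDs are not installed yet

missing = missing_custom_kinds(backup, target_cluster_crds)
print(missing)
```

A non-empty result is the signal to install the operator and its CRDs before running the restore, which is the kind of step a restore drill is meant to surface.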

00:29:25

Speaker

Yeah, makes sense. And the reason we spend a few minutes on custom resources and operators is because it's the next thing, right? The Kubernetes ecosystem, again, has matured enough where people understand why you need a different solution for Kubernetes.

00:29:42

Speaker

Operators are a bit different. Not all tools will support it. And you definitely need to test, as Ryan said, test and make sure that the solution, whatever you're using, can help you restore your applications. And if they don't, 200%, maybe change them or make notes. Like, okay, in my case, I'll have to go ahead and before I perform that restore operation, I do need to install that operator. So all of that will help you make sure that

00:30:08

Speaker

When a disaster strikes, you're not in the room panicking when there is a no-fault, I don't know, postmortem happening, and you're the person who doesn't feel guilty in the room. Exactly. Yeah. It doesn't just stop at CRDs and operators, right? You know, secrets. Sometimes you don't want these things in backups or maybe they need to be different in a new environment. I know one that bit me early in sort of the days working with these technologies is

00:30:35

Speaker

namespaces and networking and how service discovery works. So a lot of the times you can sort of build into your applications how it communicates with another service. Now that communication may assume certain things like the name of the namespace in the DNS record that it's contacting, meaning that if you know you have a Cassandra instance in

00:30:58

Speaker

um, you know, namespace one, two, three, it might be, you know, Cassandra one dot namespace one, two, three. Now there's a much, there's better ways to do this with things like, you know, service meshes and whatnot, or so you can just contact them through the, just the generic name. Um, but if you're going across namespaces, I remember this was one of those things where, okay, you, you restore to a new environment. Maybe you don't have the same, or you don't want to restore to that same original namespace to test it.
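To make the namespace assumption concrete, here is a small sketch of how an in-cluster Service DNS name embeds the namespace. The service name `cassandra` and the namespaces `namespace123` and `restore-test` are placeholders; `cluster.local` is the default cluster domain and may differ in a given cluster.

```python
def service_dns_name(service, namespace, cluster_domain="cluster.local"):
    """Build the fully qualified in-cluster DNS name for a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


# Value baked into the application's configuration in the original environment.
configured_backend = service_dns_name("cassandra", "namespace123")

# Where the service actually lives after restoring into a different namespace.
restored_backend = service_dns_name("cassandra", "restore-test")

print(configured_backend)
print(configured_backend == restored_backend)
```

The short name `cassandra` on its own only resolves for clients in the same namespace, which is why cross-namespace references are the ones that tend to break after a restore to a new namespace.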

00:31:25

Speaker

and things break and you're like, well, my restore seems good, but it's broken. Well, that's because the YAML itself is just basically being backed up, not transformed and restored. Now there are, there are solutions on both ends of that spectrum in the sense that you can be aware of these things and design your application

00:31:44

Speaker

in a way that's more forgiving when it moves around from cluster to cluster or cloud to cloud, or there are solutions where backup and restore does have transformations, where you can say, I've taken this backup, now I want to apply a transformation on restore because I know of something that needs to change. Not necessarily saying you're changing things as it's sitting,

00:32:08

Speaker

in long-term storage because there's a lot of ramifications there as well. But yeah, there's a couple of different ways you can tackle that. But all the more reason to test the solution.
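A rough sketch of a transformation applied on restore, assuming manifests are held as plain dictionaries. The function name `transform_on_restore`, the `prod` and `restore-test` namespaces, and the `DB_HOST` variable are hypothetical; real backup tools expose their own transformation rules. The stored backup is copied, not modified, in line with the point about not changing data in long-term storage.

```python
import copy


def transform_on_restore(manifest, old_namespace, new_namespace):
    """Return a copy of the manifest retargeted to new_namespace.

    The original (backed-up) manifest is left untouched.
    """
    restored = copy.deepcopy(manifest)
    restored["metadata"]["namespace"] = new_namespace

    # Rewrite service DNS names that embed the old namespace.
    old_suffix = f".{old_namespace}.svc"
    new_suffix = f".{new_namespace}.svc"
    for container in restored.get("spec", {}).get("containers", []):
        for env in container.get("env", []):
            if isinstance(env.get("value"), str):
                env["value"] = env["value"].replace(old_suffix, new_suffix)
    return restored


backed_up_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web", "namespace": "prod"},
    "spec": {
        "containers": [
            {
                "name": "web",
                "image": "example/web:1.0",
                "env": [{"name": "DB_HOST", "value": "db.prod.svc.cluster.local"}],
            }
        ]
    },
}

restored_pod = transform_on_restore(backed_up_pod, "prod", "restore-test")
print(restored_pod["metadata"]["namespace"])
print(restored_pod["spec"]["containers"][0]["env"][0]["value"])
```

This only covers environment variables on a Pod; references hidden in ConfigMaps, Secrets, or application code would need their own handling, which is another reason to test the restore end to end.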

00:32:20

Speaker

And again, like to your example, right? I don't manage a production environment. I just do demos for my day job. But I think I remember using one of your demo applications and we had hard coded the value of the backend database and the namespace had to be demo.

00:32:40

Speaker

I spent so much time just trying to troubleshoot, like, okay, why is this application not working? This was working on a different cluster before. Then I realized, okay, the URL or the endpoint doesn't match. Yes. Yeah, we take shortcuts in demo ware for sure. But yeah, that lesson learned there, take a little more time to design it so you know what's going to break when you restore it. True. Yeah, ran into that one a lot.

00:33:10

Speaker

Okay, so I think all of this might sound complicated, but we see a lot of overhead from a day zero and day two perspective, like, okay, if I have a really big enterprise with tens or hundreds of Kubernetes clusters, how do I protect them?

Managed Backup Services

00:33:25

Speaker

Do I need a backup solution for each cluster? And then managing the lifecycle of the backup solution itself.

00:33:31

Speaker

So we have seen a trend in the industry where multiple vendors are releasing an as-a-service solution where you can connect your clusters to a centralized endpoint and just start those backup jobs and perform those restores. So like Ryan, what are your thoughts? Apart from the savings or the time savings that you might have in installing and configuring such tools, what other things can you think of when it comes to these managed backup solutions?

00:34:00

Speaker

Yeah, I think the managed solutions are really nice, and I would actually argue that there's sort of different types of managed solutions, right? We talked about operators earlier, and there's managed operators, so to speak, right? There's certain database vendors that sort of deploy their own operator software in the cloud and let you use it.

00:34:22

Speaker

Or you could look at an operator as sort of a managed software itself because a lot of operators provide a backup solution and tooling that you don't really have to care about how it works. You just use the thing, right? You say, okay, deploy my database and yeah, take backups, right? Now, this might be running on your infrastructure, so it's not true managed backup. And I would say,

00:34:47

Speaker

for the truly managed service, you know, sort of the SaaS model. The benefit is really that Kubernetes is really abstracting the way in which we can run applications anyway, right? You know, Kubernetes can be deployed in all major clouds and as well as on-prem and having to worry about, you know, how your backup solution

00:35:12

Speaker

from a control standpoint, and just security, sort of role-based access control: how users get access to that, how they connect their clusters, you know, is the right networking set up so they can use both their cloud instance of Kubernetes and their on-prem one.

00:35:28

Speaker

I think that's where a lot of the benefit comes in, right, in the sense that these SaaS services sort of enable you to use Kubernetes in this way across, you know, the multi-cloud sort of architecture. And I sort of see that as a feature model, right, and also a reason why

00:35:47

Speaker

folks are using managed solutions in general. Now, I think there's managed solutions which apply across cloud, and then there's managed solutions per cloud, right? So you have managed databases in AWS. And those are great if you're running things in AWS and you want to connect to something. But you don't necessarily use a third party. You could. And I think that's a big benefit when it comes to things like backup or just

00:36:17

Speaker

I think managed services in general, when you think about SaaS, just the way in which we're moving from an ecosystem into multi-cloud, which really enables some pretty interesting use cases. Even when you think about more advanced architectures, when you think about compute at the edge and Kubernetes at the edge, and how do you do backup in those situations, well, managed solutions, I think would really help you.

00:36:45

Speaker

Okay, so like last question, how do we protect against any bad actors?

Protecting Against Ransomware

00:36:50

Speaker

Or I know ransomware has been a huge buzzword and like concern for organizations and enterprises for the past year, I guess. Does that apply for Kubernetes? And how do we protect against those?

00:37:02

Speaker

Yeah, you know, ransomware has definitely been super buzz wordy, but notably for a good reason, right? We've seen a lot of ransomware attacks. It's just malware. But yeah, ransomware being, you know, they take your data, they say, I won't give it back until you give me money and

00:37:19

Speaker

Sometimes you give them money and they don't give it back, and there you go. There's a good waste of money, right? A terrible way to even lose your data. But I think in general, the right backup and restore solution, whether that's Kubernetes or not, in this case, we're talking about Kubernetes, needs to enable you to protect yourselves from attacks like this, right?

00:37:41

Speaker

Whether that's an internal bad actor who is clearly messing with an environment and you need to shut that thing down and bring it back up in a fresh environment without certain access, or a full-blown ransomware attack from an external bad actor,

00:37:56

Speaker

that you might wanna make sure that you have data that hasn't been affected, right? Ransomware is notorious for getting in, spreading and touching all bits of data throughout the stack and really kind of holding it hostage. So backup solutions with things like object lock in object storage for targets really enable this sort of use case where even if a ransomware or malware

00:38:26

Speaker

code gets in there and starts manipulating data, you're aware that your backups that are object locked, basically in either governance or compliance mode,

00:38:39

Speaker

can't be touched or altered. So you know that data is going to be good because even yourself who took the backup can't manipulate it. And that's really enabling us to, say, protect ourselves in a way that lets us move faster from attacks like these to say, yes, we've seen that we're under a ransomware attack, but we can kind of shut things down and bring data back in an intelligent way that we also know

00:39:09

Speaker

shouldn't have been affected by these attacks and gives us the ability to restore over and over again in case we get it wrong. So those are incredibly important. And I think backups are a huge part of this. We talk about compliance a lot, we talk about disaster a lot, natural disaster, but more and more we're seeing reasons around security and hacking events.
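As a toy model of the object lock behavior described here, the sketch below captures the difference between governance and compliance retention modes. It is not the API of any object store; `LockedBackup`, `can_delete`, the keys, and the dates are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class LockedBackup:
    """A backup object with a write-once-read-many retention lock (toy model)."""
    key: str
    mode: str               # "GOVERNANCE" or "COMPLIANCE"
    retain_until: datetime  # lock expires at this time


def can_delete(backup, now, has_bypass_permission=False):
    """Decide whether a delete or overwrite of the locked object is allowed."""
    if now >= backup.retain_until:
        return True  # retention period is over, lock no longer applies
    if backup.mode == "GOVERNANCE":
        # Governance mode: only identities with an explicit bypass permission.
        return has_bypass_permission
    # Compliance mode: nobody, including whoever took the backup.
    return False


now = datetime(2022, 5, 1, tzinfo=timezone.utc)
nightly = LockedBackup("backups/nightly.tar", "COMPLIANCE", now + timedelta(days=30))
weekly = LockedBackup("backups/weekly.tar", "GOVERNANCE", now + timedelta(days=30))

print(can_delete(nightly, now, has_bypass_permission=True))
print(can_delete(weekly, now, has_bypass_permission=True))
print(can_delete(weekly, now))
print(can_delete(nightly, now + timedelta(days=31)))
```

The design point is that under compliance mode even a compromised administrator credential cannot alter the backup before the retention date, so a clean restore point survives the attack.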

00:39:34

Speaker

Yep. And in this case, like prevention is better than cure, right? Like you need to make sure that you have those backups stored in something that has like an object lock where it's write once, read many, and you can't even mess it up if you want to. Rather than having like a ransomware insurance policy, I think I've seen like TV advertisements or YouTube ads, not TV, but yeah, YouTube ads definitely, for companies who will offer you insurance if you get attacked by ransomware. And I think even

00:40:01

Speaker

the people that are attacking your organization, like those hackers, do know that you have a policy. So they'll obviously ask you for more money. Interesting. Yeah, to them, it's great. Yeah, people that will pay regardless. It's really a question of whether you get your data back once you have made that payment, right? That's not always a guarantee. And then how useful is your insurance then, right? If your data is still gone,

00:40:28

Speaker

You just have angry customers. You may go under as a company in general. Yeah, you might become one of those statistics line items like, okay, X number of enterprises never recover from a ransomware attack. At the end of the day, we're all just trying not to be a statistic, right? Let's use backup and restore to not be one. Good point. Awesome, man. I think that's a perfect end to the episode. Let's do a summary. I don't want to talk about anything else now.

00:40:56

Speaker

All right, let's jump into takeaways. So I'll go first in terms of takeaways. I think the big ones for me are backup restore is essential, is the first one. You have to think about it from the perspective of Kubernetes, meaning don't try to just apply what you've been already using for backup restore.

00:41:18

Speaker

Look to see if that vendor, that tool you're using for backup and restore, has something that specifically looks at Kubernetes. I'll even say that application-specific tools only get you part of the way there. They back up the data from the application itself, but not from a Kubernetes perspective. So even if you use something like mysqldump and you get all your data, you still have a whole bunch of things to restore that to a Kubernetes cluster in general. There's a lot of YAML metadata, secrets, all that stuff we talked about earlier.

00:41:47

Speaker

Look into that Kubernetes-specific sort of way of doing things. And then I think the last one for me is definitely the conversation around being aware of all the moving pieces, all the things that backup solutions may not work perfectly with, like CRDs, or things you might have to make sure are in place in your restore environment. And again, test your restores, test your backups. It's not good enough just to put them on a schedule and know that they're showing up in your target. Go through the scenario.

00:42:17

Speaker

Yeah, I think that was my one key takeaway as well. Like, test, test, test, like even trust, but verify, right? Like, okay, you took that backup, but at least make sure you can restore from it once every two months, once every six months, make sure that whatever workflow you have actually works. So test, test, test. Absolutely. Well, that brings us to the end of today's episode. Uh, as always, you can find all of our episodes on anchor.com and send us a message there. There's also a whole bunch of links on there.

00:42:45

Speaker

that can, of course, help you find where to listen to this podcast. Also, if you are listening to podcasts at a place and we're not there, let us know. Send us a message. We'll get on there. We're on most of them. But definitely go take a gander at that. And next,

00:43:03

Speaker

episode will be KubeCon recap. You know, so we'll be talking all things KubeCon. I'm sure there's going to be tons of announcements. There always is. This is going to be a big event. Um, I want to say I want to see post COVID, but yeah.

00:43:20

Speaker

I don't know if I can say that yet, but it'll be a fun show. If you are at KubeCon, do talk about our podcast. Let everybody know that the Kubernetes Bytes podcast is the podcast to listen to and spread the word. We count on you to get us more listeners and being in person, being at the trade show over a beer, over a coffee, or maybe at your lunch table, go ahead and talk to people about Kubernetes Bytes.

00:43:44

Speaker

Absolutely. Well, that's a good way to end today's episode then. Well, I'm Ryan. I'm Bhavin. And thanks for joining another episode of Kubernetes Bytes. Thank you for listening to the Kubernetes Bytes podcast.