MySQL on Kubernetes

S2 E9 · Kubernetes Bytes

In this episode, Ryan and Bhavin interview Andrew Sillifant, a solution architect at Pure Storage working on all things databases and solutions. The discussion starts by talking about Andrew's early days with the LAMP stack, and how he sees the community evolve around containers and Kubernetes. We dive into the MySQL Operator and how it helps Kubernetes operators or developers deploy MySQL InnoDB clusters using simple YAML files. We talk about the benefits and challenges associated with running MySQL on Kubernetes and talk about the different ways users can protect their MySQL databases.

Show links:

Andrew Sillifant - https://www.linkedin.com/in/andrew-sillifant/

Oracle MySQL Operator - https://github.com/mysql/mysql-operator

Percona MySQL Operator - https://www.percona.com/doc/kubernetes-operator-for-pxc/index.html

Bitpoke MySQL Operator - https://www.bitpoke.io/docs/mysql-operator/#

Pokemon Go - https://cloud.google.com/blog/products/containers-kubernetes/bringing-pokemon-go-to-life-on-google-cloud

ARMO raised $30M to build Kubescape - https://www.prnewswire.com/news-releases/armo-raises-30m-for-the-first-open-source-kubernetes-security-platform-301533986.html

Platform9 named a leader in Managed Kubernetes solution platform radar from GigaOM - https://research.gigaom.com/reprint/gigaom-radar-for-evaluating-managed-kubernetes-solutions-platform-993544/

NetApp Astra adds support for private AKS clusters and VMware Tanzu - https://cloud.netapp.com/blog/astra-blg-astra-enhances-azure-integration-application-handling-hybrid-cloud-and-adds-vmware-tanzu-support?linkId=100000122358216&spr=100003061438458

Low-Ops Kubernetes storage with MicroK8s and OpenEBS - https://community.cncf.io/events/details/cncf-cncf-online-programs-presents-cncf-on-demand-webinar-low-ops-kubernetes-storage-with-microk8s-and-openebs/

Why and how should you reboot Kubernetes nodes - https://elastisys.com/why-and-how-should-you-reboot-kubernetes-nodes/


Transcript

Intro to Kubernetes Bytes

00:00:03
Speaker
You are listening to Kubernetes Bytes, a podcast bringing you the latest from the world of cloud native data management. My name is Ryan Wallner and I'm joined by Bhavin Shah, coming to you from Boston, Massachusetts. We'll be sharing our thoughts on recent cloud native news and talking to industry experts about their experiences and challenges managing the wealth of data in today's cloud native ecosystem.

Hosts' Recent Activities

00:00:30
Speaker
Good morning, good afternoon, and good evening wherever you are. We're coming to you from Boston, Massachusetts. Today is April 28th, 2022. I hope everyone is doing well and staying safe. Let's dive into it. Bhavin, how have you been?
00:00:47
Speaker
I'm doing good. I'm doing good. I came back from the AWS Summit in San Francisco last week. It felt weird that as I was traveling, the mask mandates were off. So at the airport and in flight, like 50% to 75% of people were not wearing a mask. And then I think at the conference, I gave away those masks as well. So I feel we're getting back to normal.
00:01:10
Speaker
Yeah, it's definitely a little flip-floppy for sure. Even with KubeCon,

Mask Mandates at Events

00:01:15
Speaker
right? It was like, I think one day an email came out that said, no masks, don't worry about it. And then the next day it was like, just kidding. You need them now still at KubeCon. I think that's sort of the norm. You know, people want to get back to normal, but they're trying to do the right thing, make sure they're comfortable, have a good time.
00:01:31
Speaker
I'm still excited for KubeCon, no matter what. I'll wear whatever they want me to. It'll be nice to see people in person and really get on the show floor for the first time in a while. I don't think I was even at the last KubeCon you were at.
00:01:46
Speaker
Yeah, and I'm excited about spring, like the weather is getting better. I've already planned a few trips. So next week I'll be headed out to Bryce Canyon National Park. So I'm excited about that. Just trying to squeeze some, I don't know, nature between all the conferences that we have to attend in the June timeframe.
00:02:04
Speaker
As you should. For those listening to the podcast who don't know, Bhavin is like a serial hiker. Like, he'll just go on these awesome trips and just be like, yeah, I went to Acadia and did this mountain, and you're like, oh, okay. So lots of cool things, hopefully, for you in the future. That's good. Yeah, I'm excited for the weather too. You know, can't get outside enough these days.
00:02:26
Speaker
That's good to know. Well, today's show is an exciting one.

Database Focus: MySQL with Andrew Sillifant

00:02:30
Speaker
We're going to talk about databases again. I feel like we can do that a lot, but I think that's fine. We're going to talk about MySQL today. We do have a guest on the show from Pure Storage, Andrew Sillifant. He's a solutions architect for the portfolio solutions, working really on all things databases. So we're going to talk about MySQL today, but he's had his hands on many, many databases.
00:02:53
Speaker
So I'm really interested to see what he has to say around MySQL and Kubernetes today.

ARMO's $30M Fundraising for Kubescape

00:02:59
Speaker
But before that, we're going to dive into a little bit of cloud native news. Why don't you kick it off, Bhavin? Yeah, so I'll start with a funding round. ARMO raised $30 million. They didn't specify whether it was seed, Series A or Series B. They just said they raised $30 million to help
00:03:17
Speaker
enhance the Kubescape open source project. Kubescape is the first end-to-end open source Kubernetes security solution. Again, that's what they say. I don't know which other ones are available in the ecosystem, but Kubescape is one of them. It helps organizations scan configuration files, your YAML files or Helm charts that you might have. It helps you scan
00:03:38
Speaker
Kubernetes clusters and the worker nodes that are part of that cluster for any misconfigurations or known vulnerabilities that are published in either the NSA guide that we like to talk about a lot or the other repositories where CVEs are listed, and make sure that you get notified and alerted whenever you have those vulnerabilities in your Kubernetes environment. So if you're looking for an open source solution, Kubescape is one, and ARMO is working on making it better, I guess.

GigaOM Report on Platform9

00:04:05
Speaker
In the second part of the news, I wanted to talk about a new analyst report from GigaOM. I know GigaOM does a lot around Kubernetes. They do one around container storage or Kubernetes storage as well, but this time they are talking about managed Kubernetes solutions. I just found it interesting because when you think about managed, you look at public cloud offerings like EKS, AKS or GKE.
00:04:32
Speaker
This report lists Platform9 as the leader. I had to dive into the report to understand why that is, and figured out that Platform9 provides a solution that is easy to deploy and easy to use, and you can have worker nodes that are running on-prem, on bare-metal machines or VMs, or even in the public cloud. They add another layer of abstraction where you can
00:04:59
Speaker
go that multi-cloud route or use multiple platforms and still have that unified control plane. So that makes sense. So if people want to learn more about all the different vendors that are listed there and what are the strengths and challenges, go and read that report that we link in the show notes.
00:05:12
Speaker
Yeah, that's a great report to dig into. I mean, GigaOM has a bunch of them, even on Kubernetes-specific spaces, that we suggest taking a look at. But, you know, they'll dive into key criteria like, you know, does it have hybrid cloud, what the pricing model is, multi-zone, application lifecycle, security, all sorts of stuff. Really insightful, especially if you're looking at getting into using a managed cloud for, you know, business purposes.

NetApp Astra's New Features

00:05:38
Speaker
And then the last thing that I wanted to cover was an update or the next update for NetApp Astra. I think this came out a couple of days back, but they added support for private AKS clusters and AKS clusters that use Azure Active Directory or Azure AD for authentication for your AKS cluster. So I think they are just going down the route of adding new and new features or more and more features when it comes to these Kubernetes clusters.
00:06:08
Speaker
For their on-prem version, they also added support for VMware Tanzu. I think that was missing from their solution. So now, with support for Tanzu Kubernetes Grid, TKG and TKGI are both supported if you're using Astra Control Center.
00:06:26
Speaker
Remembering your acronyms. That's like a quick recap of the news that I saw for the past couple of weeks. How about you? Yeah, just a few here for me that I wanted to point out.

Rebooting Kubernetes Nodes

00:06:37
Speaker
One was a blog post from Elastisys. And it caught my attention because it dives into why and how you should reboot Kubernetes nodes.
00:06:44
Speaker
I think a lot of the time we sort of forget about these types of operations. We're so excited to be like, get this giant stack up with your cloud native storage, your security, your monitoring, and you're like, poof, it's all running. But the reality is you run these things in production, and especially with modern-day ransomware and malware attacks, or even just in healthcare, you need to make sure you're in compliance, up to date, patched.
00:07:08
Speaker
No matter how good Kubernetes is, you will often reboot nodes. I really like this article because it paints a nice picture of the layers of an onion and the attack vectors: the hypervisor, the hardware, the Linux kernel, Kubernetes itself,
00:07:28
Speaker
as we've seen a lot of CVEs come out for Kubernetes lately, the container runtimes themselves, containerd, Docker, those kinds of things. And then the individual containers: what you put in your Dockerfile may have its own vulnerabilities, right? Just the way any application would. So I really like the way that it dives in here. One thing I think it doesn't touch on, which is the reason I brought it up, is the storage component of this, right? How you reboot your storage
00:07:56
Speaker
infrastructure, and it's kind of its own attack vector. I guess you could lump it in with the Kubernetes space and container space, right? Because a lot of CNS runs as containers, but you want to make sure you have the right things there. You want to understand how rebooting your Kubernetes nodes affects not only the application but your storage infrastructure, because you do want to,
00:08:18
Speaker
at least in many cases, kind of roll through those updates and make sure things are, you know, coming back, your replicas are healthy, all that stuff. So really cool article that I just wanted to point out.

CNCF Webinar on Kubernetes Storage

00:08:30
Speaker
that we often sort of forget about digging into. The second one is an upcoming CNCF on-demand webinar, really cool topic. It's called Low-Ops Kubernetes Storage with MicroK8s and OpenEBS. So again, talking about running MicroK8s on your laptop, but with CNS through OpenEBS, really cool topic to enable you to get started. I think it uses MicroK8s 1.24, it says in the description, and it really gets, you know,
00:08:59
Speaker
hands-on with distributed storage on your laptop, which is, I think, just something that's so cool, to be able to do these things and experiment in such a developer-friendly way. So definitely go check that out. All right. I think that is it for the news this week. And without further ado, let's talk about MySQL with Andrew Sillifant. Andrew, glad to have you on Kubernetes Bytes.

Guest Expertise in Databases

00:09:23
Speaker
Welcome to the show. Tell us about you and what you do.
00:09:27
Speaker
Thank you very much, Ryan. So I'm a solutions architect. A solutions architect in Pure Storage takes the products, and we kind of ignore them for a little bit. And we go and we focus on applications. And we say, you've got a bunch of problems in this application. You've got database. This database wants to do this, this, and this. And then we bring the products and we say, OK, this is where the product might solve the problem. And we go and we prove it out. And we give customers advice on how to do that. Very, very simple and easy.
00:09:55
Speaker
Nice. Even though it's not one of our stated values, customer first actually makes sense for a solutions architect. Awesome. I know talking to you on our internal Slack, you are the database person that I reach out to for any questions. Can you tell us how long you have been working with databases, and specifically MySQL, since that's the episode for today?
00:10:19
Speaker
So it all started when I was an undergrad. They were like, oh, we need to go and do web development because Web 2.0 is going to be your entire future. We actually had this big thing where they were like, oh, Oracle, the only database you need to know is Oracle. It'll be fine. However, Oracle is really difficult to deploy. It is big. It is an enterprise,
00:10:37
Speaker
business-critical database. You cannot ask an undergrad who knows how to write a few scripts to do that. So when we were doing web development, it was, okay, we're going to focus on PHP, we're going to go focus on essentially what is the LAMP stack. So I had a MySQL database, I had PHP and I had everything going and everything was wonderful.
00:10:57
Speaker
And so I was doing this web development. I was like, why the bloody hell is it so slow? And so then I had to go and figure out how the database worked in the first place. So that would be my first real encounter with MySQL. Um, and it was definitely an interesting jump from the, we're just going to look at SQL to here's your own database. Let's go and develop a web application environment with a pretty front end.
00:11:21
Speaker
Yeah, that makes sense. I started with the LAMP stack too. I know today if we ask undergrads, they might have a completely different answer that includes containers and Kubernetes. But I think a few years back, the LAMP stack was the way to go. So what made you move to containers, or what are the benefits of running MySQL in containers?

Running MySQL in Containers

00:11:43
Speaker
So, okay, we're talking about the LAMP stack because that's a really good example of this. Let's say I'm Facebook, and I'm Mark Zuckerberg coding up the front end of Facebook. I've got PHP (he's told the whole story publicly), I've got MySQL on the back end, and everything's working. What happens when you go from a thousand people using my database to 50,000 people using it, really quickly? You've got serious problems with scale.
00:12:07
Speaker
And I think that's one of the really good things that containers help solve is the, okay, I need to very quickly take the small thing I wrote and scale it, even though it maybe wasn't the first thing I even remotely considered. So that was the first, why should you run it in containers?
00:12:24
Speaker
It's much more portable. You then stop thinking about, oh, there's a database and the database has problems. And you start thinking about everything as you've got an application stack with a bunch of components that are defined in the same place. So when we're talking about LAMP, take the Linux out of it, and you're suddenly thinking about the WordPress style of scenario. Because as opposed to deploying MySQL now, you're deploying WordPress as an application stack.
00:12:47
Speaker
MySQL is still in there, but you can ignore pieces of it in favor of a bigger story. And that's where containers really do shine. Yeah, and I want to say you mentioned earlier that you had to go figure out the performance of MySQL when you were working with the LAMP stack. Why is it so slow? And I know one of the things that a lot of people question when getting into containers is, oh, this is another layer of virtualization. What does it do to my database performance?
00:13:16
Speaker
They're apprehensive about putting their databases in containers because they're like, well, it works so well on this giant beefy machine outside of any virtualization, right? So why would I put it next to my applications? And, you know, what experience did you have there, I guess, working with it all, you know, the performance outside of containers versus inside containers?
00:13:37
Speaker
What I've found in that regard... so, you mentioned virtualization there, which was quite interesting, because five years ago we went from "VMs will never be as fast as bare metal" to "oh, VMs are fine, they work okay." And then containers is now the exact same argument again, which is more of a resistance from those who are set in their ways.
00:13:56
Speaker
And instead, I now look at it from the perspective that a database will run as fast as you tell it to run and as fast as the resources you give it allow. So in virtualization, you're still going to be giving it CPU allocations. If you put 50,000 things onto something that can only handle one thing, it's going to run slower just because of the way you do it. Containers and containerization and the orchestration layers do actually help you spread that a lot more evenly, whereas virtual machines allow you to do crazy over-provisioning,
00:14:24
Speaker
where you've also got to provision for operating systems and things like that. Take the operating system provisioning out of the equation and suddenly you have, ah, we are only provisioning for the application. I have a MySQL database. I'm going to put four of those onto, let's say, two or three nodes.
00:14:40
Speaker
That's doable. All you then have to do is take a look at: do the nodes have enough physical grunt to be able to do what I want? And the nicer thing about that is the node is the stateless part of the story. So the node exists in AWS. It exists in Azure. It's a bare metal server. It's even a VM.
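The "do the nodes have enough physical grunt" check described above can be sketched in a few lines of Python. This is only an illustration, not how the Kubernetes scheduler actually works: the `Node` class, the `place` function, the node sizes, and the per-instance CPU and memory requests are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical worker node with its remaining free capacity."""
    name: str
    cpu: float          # free CPU cores
    memory_gib: float   # free memory in GiB
    pods: list = field(default_factory=list)


def place(instances, nodes, cpu_request, mem_request):
    """Greedily place each instance on the fitting node with the most free CPU.

    Raises RuntimeError if an instance does not fit anywhere, which is the
    "not enough grunt" case.
    """
    for inst in instances:
        candidates = [
            n for n in nodes
            if n.cpu >= cpu_request and n.memory_gib >= mem_request
        ]
        if not candidates:
            raise RuntimeError(f"no node has room for {inst}")
        target = max(candidates, key=lambda n: (n.cpu, n.memory_gib))
        target.cpu -= cpu_request
        target.memory_gib -= mem_request
        target.pods.append(inst)
    return {n.name: list(n.pods) for n in nodes}


if __name__ == "__main__":
    # Assumed sizes: three nodes with 8 cores / 32 GiB each,
    # four MySQL instances requesting 2 cores / 8 GiB each.
    nodes = [Node("node-a", 8, 32), Node("node-b", 8, 32), Node("node-c", 8, 32)]
    layout = place([f"mysql-{i}" for i in range(4)], nodes, cpu_request=2, mem_request=8)
    for name, pods in layout.items():
        print(name, pods)
```

The point of the sketch is that the only sizing question left is per-application requests against node capacity; there is no per-VM operating system overhead in the calculation.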
00:15:00
Speaker
So the arguments we use, and we've already solved this, which is even funnier of, well, VMs aren't that bad. Well, if you can run containers in all of these environments, you can't really make the same argument for, well, it's going to slow it down. Instead, the containerization of it is just a new way in which to address the same thing.
00:15:18
Speaker
Yeah, it's a really good way of putting it, right? And this is, I think, what we're starting to see and with everyone who we talk to is people are kind of getting over that initial hump of, you know, should I be doing it? And now it's like, well, it's been proven. You can do this. You can put it on bare metal. It really doesn't make a difference. You're provisioning resources for the application. So really what you have to do is run MySQL, right? Run your application stack well. And I think that's the adoption we're seeing. And that kind of leads into my next question is like,
00:15:47
Speaker
Once you've explored putting a database into a container, or even your application stack into a container, you naturally progress into how do I run this thing at scale, right? And that's where, where does MySQL make sense on Kubernetes? And that's my question to you: you know, why Kubernetes instead of just, you know, running it in a container on a node somewhere?
00:16:10
Speaker
Okay, so I'll take you back to LAMP because for the first time in a very long time, I had a Google last night and I was like, how the hell do you get WordPress to scale? Because I haven't touched this in years. So then I said, all right, Google and how does WordPress scale? And I went through this very, very complex blog. I was like, I just do not want to do this. I don't even want to touch it.
00:16:31
Speaker
So here's the thing, I'm going to build my business operations in a certain way with a certain expectation that I'm going to scale for, let's say 20% growth

Rapid Scaling in Kubernetes

00:16:40
Speaker
a year. So then you're architecting your environment for that. What happens if your growth and your user base suddenly goes through the roof because you're much more popular than you think? To give you examples of applications that encounter this, I'm showing relative age of youngness here of
00:16:56
Speaker
five or six years ago with Snapchat. Everybody suddenly started using it overnight, and the only thing you can think of is: how in the world does anybody get that scale? So, all right, you've got the world of non-containerization. I want to scale. Okay, it's a VM: you add another VM, you install another operating system, you copy your database over, you copy your configuration files, you somehow get them all to talk to one another. That is a lot of work.
00:17:20
Speaker
And frankly, people do not want to do all that work on a continuous basis. So what we've done is we've innovated ourselves into laziness, which is, okay, I've got these environments. I'm going to set them up this way for this scale. But if I want to add new things onto the software components of that environment, that should not be difficult.
00:17:39
Speaker
So say I've got a MySQL database and I need to add more database load to it, let's say read traffic. As opposed to doing all the copying, it is a change in the YAML file, which is a single line. And that makes it a hell of a lot easier for your administrator. So, oh, I need another node to be able to handle this? Your Kubernetes environment should already be over-provisioned to be able to handle that load.
00:18:01
Speaker
So it's okay, we're going to very quickly increase our overall database capacity, our overall application and business capacity, and that could be done very, very quickly. And that's the goal we want to do, because you can innovate faster from the application perspective, as opposed to worrying about, well, is Bob down in IT going to have my new server up tomorrow? You don't want to worry about those things anymore.
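As a rough illustration of the "single line" change described above, the Python sketch below builds a manifest shaped like the `InnoDBCluster` resource used by the Oracle MySQL Operator and then scales it. The field names (`secretName`, `tlsUseSelfSigned`, `instances`, `router.instances`) follow that operator's sample manifest as I understand it and should be checked against the operator documentation; the cluster name, secret name, and helper functions are assumptions made up for this example.

```python
import copy
import json


def innodb_cluster_manifest(name, instances, router_instances=1):
    """Build a manifest dict modeled on the MySQL Operator's InnoDBCluster sample."""
    return {
        "apiVersion": "mysql.oracle.com/v2",
        "kind": "InnoDBCluster",
        "metadata": {"name": name},
        "spec": {
            "secretName": "mypwds",      # assumed name of a pre-created Secret
            "tlsUseSelfSigned": True,
            "instances": instances,      # number of MySQL server instances
            "router": {"instances": router_instances},
        },
    }


def scale(manifest, instances):
    """Return a copy of the manifest with a different instance count."""
    updated = copy.deepcopy(manifest)
    updated["spec"]["instances"] = instances
    return updated


def changed_fields(old, new, prefix=""):
    """List (path, old_value, new_value) for every field that differs."""
    diffs = []
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}.{key}" if prefix else key
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            diffs.extend(changed_fields(a, b, path))
        elif a != b:
            diffs.append((path, a, b))
    return diffs


if __name__ == "__main__":
    base = innodb_cluster_manifest("mycluster", instances=3)
    scaled = scale(base, instances=5)
    print(changed_fields(base, scaled))
    # JSON is also valid YAML, so this output could be saved to a file
    # and applied to a cluster in the usual way.
    print(json.dumps(scaled, indent=2))
```

Comparing the two manifests shows that only `spec.instances` differs, which is the whole scaling operation from the administrator's point of view; the operator is then responsible for creating and joining the new instances.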
00:18:24
Speaker
Yeah, that's a good point. And you bring up Snapchat, but there's maybe another good example, which is Pokemon Go, right? We'll date ourselves there, too. But actually, it was, I think, one of the largest deployments on Kubernetes back in 2016. I just did a quick Google here. And to your point, they estimated sort of a 5x worst case traffic estimate, and they'd have to scale for that. What they got was 50x, right?
00:18:51
Speaker
And running on Kubernetes, running on Google at the time, that enabled this sort of
00:18:58
Speaker
their developers to focus on the application, rolling new features, not really worrying about the scale. I think to your point, that's really the power that I think drives this, why put your things on Kubernetes? I think why co-locating them is a different question. Bring your data close to your application, but definitely one of the reasons.
00:19:24
Speaker
Before you asked the previous question that you said, people are asking like, should I be running this on Kubernetes? And we have changed the question now, how should I be running this on Kubernetes? I just wanted to, like, okay, one of the challenges that Kubernetes solves, Andrew clearly highlighted that, like, it's easy to scale. If you want to add replicas, you just
00:19:43
Speaker
update your YAML file and apply it against your Kubernetes cluster. But then Andrew, are there any other challenges that Kubernetes helps solve? Things are better because it runs on Kubernetes or the other way around as well. Are there any things that are new challenges because we are running it on Kubernetes? I want both perspectives.
00:20:01
Speaker
So our newer challenges are we're taking existing code bases and we're containerizing them. So then your newer challenge becomes, how do we bundle all of this up to be able to make it a shippable product? When I say shippable, I'm not talking about giving it access to customers. We're taking environments and we're trying to very, very quickly debug those.
00:20:24
Speaker
So if you think about it, I've got a bug. I don't want to have to wait for Bob down in IT to build me a new test/dev environment and copy this all the way over there. So the newer problems are: we are solving the application stack in production, and we've got the ability to copy it to a test/dev environment.
00:20:44
Speaker
Now what we're looking at is how do we make these two things part of the exact same pipeline, which is DevOps, which is: I'm going to have my day zero. We spend two years developing an application. Then we're going to pipeline the application once it's released into the wild. Oh, we've got a problem. We then need to worry about getting the application to another place to be able to go and find the problem, which is probably a typo in the code after all,
00:21:09
Speaker
and then taking that and making it available into production. So to slightly change your question away from what new problems are we creating, we are creating new innovation routes and the problems within that are we're moving away from the traditional application space and moving to a higher plane of thinking of orchestrating business operations as opposed to orchestrating applications.
00:21:33
Speaker
And when you can orchestrate at the business level, everybody's happier. You make accountants happier, lawyers happier, you name it. So if we were to say, what new problems are we creating?

Challenges in Orchestrating Business Ops

00:21:44
Speaker
I'd say we're creating too much time to spend on
00:21:51
Speaker
How do I articulate this? We're spending more time higher in the stack, whereas the problems might be deeper. So we need a very clear way to get deeper into the application stack. Actually, I don't think we're creating new problems. I really don't see it, because I was following a line of thought around, okay, so we can copy data everywhere. This is good. We can make the business people happy. We're good.
00:22:17
Speaker
DevOps is the only real problem you can articulate there, which is already a solution to an existing problem. Nice. Awesome. That's a great way to put it.
00:22:28
Speaker
I know, Bhavin, we had a conversation the other day about the challenge of putting things into containers. Databases have translated fairly well, apart from those bigger monolithic Oracle ones, although they still have those. And it raises the question of certain applications: there's a bigger challenge if you want to take an existing application and break that down. That's actually creating new problems. Whereas starting sort of green,
00:22:56
Speaker
you're innovating in the way I look at it. But I think database technology to our conversation the other day about Cassandra, which is what's the future of that too? Do we eventually break down the individual databases with what they do internally and run them separately inside Kubernetes versus the whole database just running as a container? Different conversation, but I think there's a lot to be said within that to Andrew's point.
00:23:25
Speaker
Yeah, so, okay, next question. How do we get started? I know there is a MySQL operator. Is that the best way to deploy an InnoDB cluster? And what does the architecture look like? Okay, so I actually had some notes about this, which was, let's take a look at primary and secondary stuff. Yeah, all right, you've got the MySQL operator. I believe as of today, correct me, this could change at a moment's notice, the MySQL operator is still in beta access.
00:23:53
Speaker
It is a really good way to show how we've packaged up all of the different technologies. So let's take the operator and what it does. It is InnoDB Cluster. InnoDB Cluster is a solution made up of multiple components. The first component is Group Replication, which is how the different databases talk to one another and the modes they sit in. Then you've got MySQL Shell, which sits on top of that and coordinates it,
00:24:17
Speaker
and then you've got Router on top of all of that, saying this is how we're going to get to the underlying databases: this is where writes are going to go, this is where reads are going to go, and so on. So if I can just, for people listening: that was "router" as well, like, in a different accent.
00:24:42
Speaker
South Africans also like to say database as opposed to database. That's all right. So you've got these three components. If I'm doing a non-containerized deployment of it, you have to worry about each of those components. When you're dealing with the operator, you stop caring about the depths of the components; they're already there. And then the entire thing you're worrying about is what's sitting in the YAML file. What are we configuring for? And you go and do it.
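The routing idea described above (writes go to the primary, reads are spread across the secondaries) can be sketched as a toy Python class. This is not how MySQL Router is implemented or configured; the `ReadWriteSplitter` class, the keyword list, and the instance names are hypothetical and exist only to show the decision being made.

```python
import itertools

# Statements treated as read-only in this toy example (an assumption,
# not an exhaustive or authoritative list).
READ_ONLY = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


class ReadWriteSplitter:
    """Send writes to the primary and round-robin reads over secondaries."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self._secondaries = list(secondaries)
        # With no secondaries, every statement falls back to the primary.
        self._cycle = itertools.cycle(self._secondaries) if self._secondaries else None

    def route(self, statement):
        words = statement.split()
        first_word = words[0].upper() if words else ""
        if first_word in READ_ONLY and self._cycle is not None:
            return next(self._cycle)
        return self.primary


if __name__ == "__main__":
    router = ReadWriteSplitter("mysql-0", ["mysql-1", "mysql-2"])
    for sql in [
        "SELECT title FROM posts",
        "SELECT body FROM posts WHERE id = 7",
        "UPDATE posts SET title = 'x' WHERE id = 7",
        "SELECT 1",
    ]:
        print(f"{router.route(sql):8s} <- {sql}")
```

Adding a secondary to the list increases read capacity without touching the write path, which is why a read-heavy workload scales well under a single-primary layout.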
00:25:10
Speaker
So is it the best way to do it? That is the single hardest question you can ask MySQL, simply because MySQL can be deployed in too many different ways. To give you an example, that's the MySQL operator. That is just it. That's the community edition. Oracle loves it, goes and does it. For the vast majority of cases, it would probably work.
00:25:29
Speaker
It gets even more complicated. You've got Galera Cluster as a replication option. I'm pretty sure you can make Galera Cluster work in a containerized environment, but there might not be an operator that does it. I'm sure there is. You've also got the variants of MySQL. So I'm not sure if it's PlanetScale, or there are a few others out there, which have taken MySQL's code, customized it for what they want to do, and then gone out there, which is why that is a really, really hard question to answer.
00:25:57
Speaker
The operator, as I said, is a very good starting place. However, there's nothing stopping you from saying, I'm just going to deploy MySQL and scale it myself, because the way the operator works is maybe not what I want to do. So then you have the ability to say, all right, we're going to have one primary and many replicas with a StatefulSet. And then you're essentially telling Kubernetes to give you just a common IP to work with. You can configure that yourself.
00:26:24
Speaker
So it entirely depends what your application is really doing. Do you want a supported, likely to have continuous development through its life solution? MySQL operator, fantastic. Do you want to do something that the operator doesn't necessarily meet your requirements for and you want to innovate further? You can take MySQL and you can go and do even more things with that. So that is a really hard question to answer simply because of how dynamic the MySQL community is.
00:26:51
Speaker
Yeah, and I would like to say that because the operator pattern is definitely one of the newer concepts within Kubernetes as far as dating the entire thing, I think there's this sort of mass rush to create operators for things. MySQL being one of them where, like you said,
00:27:13
Speaker
if you start googling how to run MySQL on Kubernetes and look for operators, you're going to find a whole bunch, from companies who have customized things to the community editions, and you're just like, what is going on? You have to do your own research on what this operator does versus that one, versus licensing, payment, all those things. And I think that is, I hope, going to normalize over time into
00:27:40
Speaker
the standard ways to run community versus if you want to go with vendors. I think operator patterns are definitely going to be the way to do it, although you're always going to have people that are going to design their own way of deploying databases. That's never going to stop. But I do think over time, these operators will get very robust in how they handle things, how they allow you to do customizations. Right now, they're very opinionated, in my opinion.
00:28:08
Speaker
And they don't let you kind of go outside the scope of what the operator can do. I'd really like to see more customization and say, oh, if you want to use this part of the operator, but not this part, go for it, right? Which leads me to the next question, right? Operators do a great job of sort of handling
00:28:25
Speaker
deployment, you know, day two operations, those kind of things. And we can go into that a little bit more. But I think at its core, one of the values that I think we hear over and over again is handling failure on Kubernetes, right? Could you talk a little bit to how this would, you know, generally look like without something like Kubernetes and sort of maybe the benefits you get with MySQL and sort of a, you know, primary secondary sort of way when you are running it on Kubernetes?

Managing MySQL Failures in Kubernetes

00:28:53
Speaker
Okay, so if we take the Kubernetes side out of it and just focus on the database, you've got primaries, you've got secondaries, which is a replication-style environment. On day one, I've decided, okay, I'm going to implement my MySQL database for my PHP web service and everybody's happy.
00:29:11
Speaker
You then find that you have Wikipedia levels of growth and everybody's suddenly very interested in what you want. So, oh, I've got loads of reads. Everybody keeps updating this and everybody keeps reading it. I've got like 90,000 people reading it and I can only do one update a day. What do you do? Your ability to scale is either vertical or horizontal. Horizontal scaling is what we're going to focus on here, because vertical scaling is just: we're going to give it loads of power and hope it doesn't fall over.
00:29:36
Speaker
So your horizontal scaling then says, okay, let's go and copy the database. Let's keep it in sync with your primary database and go from there. And then what we'll do is send read-style operations to your secondary. This allows me to scale in a better way. And weirdly enough, that's how you design a website. You should, in theory, get a hell of a lot more reads on your website, because that's what drives traffic, than updates to it. Unless you're Facebook, and that's something else entirely because that's constantly changing.
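The read/write split described here can be sketched in a few lines of Python. This is a minimal illustration, not something shown in the episode: the `ReadWriteRouter` class and the `mysql-0`, `mysql-1`, `mysql-2` endpoint names are hypothetical, and the check for a read is deliberately naive (a real proxy would also have to handle transactions and statements such as `SELECT ... FOR UPDATE`).

```python
import itertools


class ReadWriteRouter:
    """Route writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Naive classification: only statements starting with SELECT are reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.primary


# Hypothetical endpoints: one primary and two read replicas.
router = ReadWriteRouter("mysql-0", ["mysql-1", "mysql-2"])
targets = [router.endpoint_for(q) for q in (
    "SELECT title FROM pages WHERE id = 1",
    "UPDATE pages SET title = 'x' WHERE id = 1",
    "SELECT title FROM pages WHERE id = 2",
)]
print(targets)  # ['mysql-1', 'mysql-0', 'mysql-2']
```

Adding replicas only grows the pool that reads rotate through; every write still lands on the single primary, which is why this pattern suits read-heavy sites.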
00:30:06
Speaker
Okay, so you have this kind of environment of how you're going to look at making things more highly available. That was the original style, and that's MySQL replication. Then we started building in the, well, what if the primary goes offline? How screwed are we? And you are very screwed if you're using standard replication styles. So that's where the more complex technologies like Galera Cluster, Group Replication,
00:30:30
Speaker
and those kinds of things came along. Because then you started to look at multi-primary or multi-write style replication environments, which is a much more difficult programmatic problem to solve. So kudos to the developers who thought those up. Because think about it, if I've got 16 replicas and I'm doing synchronous-style replication, every single one of those replicas has to say, yes, I'll accept this change, before you can do anything. That's going to slow down your whole website.
00:31:00
Speaker
What they did was they looked at this near-synchronous style of replication, or eventual consistency over time. Really, really cool things started to come out of that: we could scale these databases, and with your synchronous style, let's say you've got 16 nodes, we'd worry about three of them getting it and then the rest can bugger off and we'll worry about them later.
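The difference between waiting for every replica and waiting for only a few can be shown with a small sketch. It assumes a hypothetical `commit_acknowledged` helper and made-up replica names; it models only the counting of acknowledgements, not how any particular MySQL replication plugin implements it.

```python
def commit_acknowledged(acks, required_acks):
    """Return True once enough replicas have confirmed the write.

    acks maps a replica name to True (confirmed) or False (not yet).
    """
    confirmed = sum(1 for ok in acks.values() if ok)
    return confirmed >= required_acks


# 16 hypothetical replicas, of which only the first three have confirmed.
acks = {f"replica-{i}": i < 3 for i in range(16)}

# Fully synchronous: all 16 must confirm, so the write is still blocked.
fully_sync = commit_acknowledged(acks, required_acks=len(acks))
# Relaxed: three confirmations are enough, the rest catch up later.
partial = commit_acknowledged(acks, required_acks=3)
print(fully_sync, partial)  # False True
```

The lower threshold trades some consistency on the lagging replicas for much lower write latency, which is the trade-off described above.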
00:31:24
Speaker
So what does this mean in the context of Kubernetes? Okay. I've got 16 nodes to worry about. That is a lot of stuff to go and then orchestrate, and it makes things a hell of a lot more complex. So here's the thing that I'd love to know.
00:31:41
Speaker
If my secondary systems go out of sync, they're going to need to come back into sync. How do we ensure that when they're out of sync, there is a mechanism to bring them back into sync very quickly? Kubernetes actually enables this, and I haven't seen it properly exploited. I've seen everybody touching at the edges of it,
00:31:58
Speaker
of, okay, this thing is two days out of sync, as opposed to trying to bring it in sync, we're just going to wipe it out, copy the main one and start again. And that's the kind of behavior you're starting to enable in programmatic environments like this, because Kubernetes is a programmatic orchestrator.
00:32:14
Speaker
So that's fantastic. If I did that without the programmatic orchestration of Kubernetes, OpenShift, all of those things, what do I have to do? I have to go and write a script that says, oh, what was the last binlog entry? If the binlog entry was this far out, I'm going to wipe you out. I'm going to do you. I don't want to worry about that stuff. I want those kinds of mechanisms to be built in already.
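The kind of script being described might look roughly like the following sketch. Everything in it is an assumption made for illustration: the `ReplicaStatus` type, the `plan_recovery` function, the one-day threshold, and the idea that lag is available as a number of seconds. It only decides what to do; actually wiping and re-cloning a replica is left out.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStatus:
    name: str
    seconds_behind: int  # replication lag as reported by monitoring


def plan_recovery(replica, reclone_after_seconds=24 * 3600):
    """Pick a recovery action for a lagging replica."""
    if replica.seconds_behind == 0:
        return "none"
    if replica.seconds_behind < reclone_after_seconds:
        # Small gap: let normal binlog replay catch the replica up.
        return "catch-up"
    # Large gap: cheaper to discard the data and copy the primary again.
    return "reclone"


# Hypothetical replicas: in sync, ten minutes behind, two days behind.
plans = {r.name: plan_recovery(r) for r in (
    ReplicaStatus("mysql-1", 0),
    ReplicaStatus("mysql-2", 600),
    ReplicaStatus("mysql-3", 2 * 24 * 3600),
)}
print(plans)  # {'mysql-1': 'none', 'mysql-2': 'catch-up', 'mysql-3': 'reclone'}
```

In an orchestrated environment this decision would sit inside an operator's reconcile loop rather than a hand-run script, which is the built-in mechanism being asked for here.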
00:32:36
Speaker
That was a very long-form way of answering your question, but I thought it was worth building the bricks first. I think it was great. I think it was great. And a lot of those built-ins, just like, you know, how these individual replicas are identified, how they get IP addresses, how services are discovered, right? All these things that are built in, I think,
00:32:54
Speaker
is the real value. And it's one of the first times I've heard someone focusing on just wipe out the thing that's out of sync and copy it. And it begs the question of, since this is also a podcast that focuses on storage, are there benefits to doing this with or without a cloud native storage?

Cloud-native Storage Benefits with MySQL

00:33:16
Speaker
So if you do this without it, MySQL does everything at the application layer.
00:33:22
Speaker
I presume. I'm no MySQL expert. But with cloud native storage, I assume there are benefits to doing things like copying data or bringing things back into sync if you have some sort of persistence under there.
00:33:37
Speaker
There is definite value in doing it. So if you think about it from the MySQL perspective, all MySQL cares about is, am I pointing towards a directory that has everything working and am I happy with it? That is its mode. What MySQL will try and do in those replication environments is it's going to say, what was the last binlog entry? Do I need to copy this? It's very lazy. It's kind of like a very slow moving trolley. And it says, okay, we're going to lift it over here.
00:34:02
Speaker
In storage, we have a hell of a lot more capabilities. Think about it. MySQL is pointing at a directory. If that's all it's pointing at, all we have to do is stop it pointing at it, stop it in its entirety, replace the directory itself with whatever's up to date, and then move on. Very, very easy and straightforward. You can do that more easily at the storage layer because of complex technologies that we have simplified like snapshotting.
00:34:25
Speaker
um, copy data management, pointing to existing data. Those are the kinds of beautiful things that are coming out of storage technologies now. And you're finding storage is getting uplifted into the application layer because of conversations like this. Nice. So, okay. The snapshots, right? Like what are the other tools that can help you like protect your database instances? Like how can I back it up? How can I restore it? Like how do I plan for all of those disaster events or like just weird events?
00:34:56
Speaker
Oh, so let's focus on MySQL with that. MySQL backup and recovery and data copy has been a really fun area for many years because they did things and then they were like, oops, this doesn't work. We need to do something else. So you have a whole bunch of tools now. Let's start at the beginning. So mysqldump, the first, foremost, and probably one of the more commonly used ones. I've got a small 10 gig database. I'm going to dump the logical data out of it into a bunch of files. Everything's good. No one worries too much about it.
00:35:24
Speaker
mysqldump becomes absolutely terrible when you're talking about terabyte-size databases. Because think about it, you have to logically read every row and every table in the whole database and make sure you've got the data definitions for it, the data manipulation, and that just gets old real quick. So then they came up with, ah, okay, so we're going to, a company called Percona was very clever and they were like, we're going to develop something called XtraBackup.
00:35:48
Speaker
And they were like, oh yeah, we're the top dogs with this. And they were for quite a while, where they combined freezing the database, taking backups of the physical state of it, moving it somewhere else, and ensuring it didn't interrupt any actual running operations. That's kind of how most data backup tools work these days.
00:36:08
Speaker
MySQL and MariaDB got real clever. And they were like, OK, we're going to take your code and we're going to own it. It's open source. So we're going to take it. We're going to copy it. We're going to call it something else and then improve it. So now what you have is you've got the option of using mysqldump. You've got MySQL Backup and Mariabackup.
00:36:26
Speaker
So MySQL Backup is XtraBackup with extra steps and a new name and a new lick of paint. And it's frankly pretty good. I mean, both tools are very good in their own thing. And what's happening is MySQL Backup is innovating. So we're starting to get interesting things like backing up directly to S3 and things like that. And to slightly rewind it, you're talking about operators, one of the roundings of operators. So we built the house. We're now going to send it down to a really nice machine.
00:36:54
Speaker
is, OK, let's start looking at how these backup tools are implemented within containerized environments. Is it OK, the operator itself is going to go and talk to all of the different things, or do we do it at a more logical layer? And those are the kinds of to even go back to what problems I'll recreate.

MySQL Backup Tools Overview

00:37:11
Speaker
We've solved one problem. Is the problem we've solved previously still applicable for the new thing, such as backup and recovery, which we're starting to see interesting tools come out of that.
00:37:23
Speaker
Yeah. And why would, you know, why would someone choose to use the MySQL-specific tools over say a more generic solution at the storage layer? Since we talked about snapshots and things like that. Okay. Most of the time you do things with the specific tools like mysqldump
00:37:41
Speaker
because you're talking about changes in architecture. So for example, off the top of my head, I think MySQL supports PowerPC. You cannot take the data from a PowerPC instance of MySQL and just shove it into an x86 instance. So you'd have to dump it out and move it there.
00:37:59
Speaker
That's also one of the ways in which we're doing a little bit of advice around, well, how would you move data from a non-infrastructure as a service style deployment through to managed service deployments for databases? And you do that in the same way. So you'd use them for that reason. MySQL Backup is also very good where you want to go to a specific endpoint like S3,
00:38:23
Speaker
and existing tools just don't have that technology yet. So would I say we're going to see these kinds of tools around for a while? We probably will. However, I do see lower-level Kubernetes backup superseding all of that in the exact same way we saw Veeam take over with VM-level backups, as opposed to backing up databases directly from within VMs. Yeah. I don't know, from talking with a number of our guests, one thing we're starting to hear is that
00:38:53
Speaker
not relying on one of those things is actually not a bad thing, meaning that go ahead and take both, right? Shove them into S3, take your logical backup as well, put that in a snapshot, throw that somewhere else. It's kind of like throwing the boat at the whole situation and just like, we have what we need if we need it down the road, right?
00:39:11
Speaker
But I do see a lot of that, especially in the operators, right? That definitely needs to, I guess, you know, be more succinct in sort of the suggested way of like, here's the right way. And I do find that people using specific databases do want, obviously, to use those specific tools for obvious reasons that you just said.
00:39:34
Speaker
Well, I mean, this, I think is a good point to switch gears a little bit and say, you know, in your sort of, uh, journey to working with MySQL on Kubernetes, where would you suggest people go get started,

Starting with MySQL on Kubernetes

00:39:48
Speaker
right? If they were to, um, want to move past all the frustration of which operator to use to have a good experience or just to get their hands on it, where would you suggest they go?
00:39:59
Speaker
I definitely recommend going towards the community editions. They're very good. They're very well supported. Because the MySQL Operator is quite complex as a starting point, I'd say don't even bother with the operators. Get the database working. One of the greatest challenges for me when first encountering containers was, why can I not get an IP address to get to this thing from outside?
00:40:20
Speaker
And that required me to marry up knowledge I didn't have around Kubernetes, which is actually quite Google-able, to my existing knowledge around databases and MySQL. And once you start solving those problems, take what you are doing today,
00:40:35
Speaker
make that interact with only a database in the containerized environment and slowly start moving towards it. That way your skill set will marry up much more easily between what you currently know and what is the newer way of doing things with the container orchestrators.
00:40:53
Speaker
Yeah, that's a good way to put it. And I find those who are getting into Kubernetes, knowing Kubernetes, you do have to kind of branch out because, especially in a DevOps culture organization, you're probably gonna touch some things that maybe you normally wouldn't. I mean, maybe some organizations don't necessarily expose you to certain aspects and they do the deployment and everything for you. But I think it's so vital to know sort of at least the basics,
00:41:20
Speaker
a sense of how all these things kind of are working under the covers, right? Because if something does go wrong, and if you are on the hook as an SRE or as a part of a DevOps team, you kind of need to know those things. So definitely go check out all of those good resources. We'll actually put links to the community editions, both the operator, non-operator. We'll put some links in there. Andrew, if you have any, we'll post those as well. But I think this is a good point to stop.
00:41:48
Speaker
I appreciate you being on the show. I think there's a lot of good information here. We can only cover so much in about half an hour. There's so much to talk about when it comes to certain databases and how to run them, what the challenges are, where the future of Kubernetes is going with them. I think you did a really good job of helping myself and hopefully our listeners as well. Thank you very much for being on the podcast.
00:42:14
Speaker
You're welcome, gents. Hopefully you'll have me back to talk about new crazy things. Of course. We'd love to have you back. All right. That was a great conversation. I know, um, with each database we talk about, Bhavin, I feel like I learn how much I don't know. Um, and, and it's nice to have these guests such as Andrew on here really digging into,
00:42:38
Speaker
you know, the different back ends for MySQL or the challenges before it was on Kubernetes. And I just think that, you know, the takeaways for me were, you know, the first one I want to talk about is just the concept of being able to put your databases in

Debate: Databases in Containers vs VMs

00:42:56
Speaker
a container. And maybe some of you listening to this are new to this, or maybe you're already running databases in containers or just stateful things, but, you know,
00:43:05
Speaker
like VMs, as we talked about in this conversation, there was apprehensiveness to even put certain things in VMs when VMs were new.
00:43:14
Speaker
And I think we're getting over that hump, right? We've crossed that chasm as we've talked about multiple times on the show before and it's here to stay, right? I think the most recent survey says like, you know, 93% of people are using it in production and over 70% are running stateful things. So definitely something that I think, you know, is worth kind of showing that MySQL is just a process on a server.
00:43:43
Speaker
Kubernetes enables it to deploy and you give resources to it. You have to make sure you're doing the same things for a bare metal environment, VM environment, or Kubernetes environment with containers. So definitely worth tying into that. That was a good point for me. And then the second one was that certain aspects of how you operate and maintain MySQL in a non-container or Kubernetes infrastructure
00:44:09
Speaker
Like how do you manage when a secondary node is out of sync for quite a long time? You know, Andrew talked about, you know, there may be the case where you as an operator would say, well, just, you know, remove that thing entirely, take a copy of the primary and get that thing going again.
00:44:24
Speaker
That's the type of, I think, logic that is finding its way into things like StatefulSets and operators. I think that probably is more appropriate for an operator. It's very specific to MySQL rather than an abstraction like a StatefulSet. But I think that's where I think the maturity of these operators is really going to show in the next few years of really what
00:44:47
Speaker
robustness can be built into those with those specific types of concepts. Really interesting stuff. How about you? For me, I think when we started the episode with him, he started with the LAMP stack and brought in or started with an application-first perspective and that was really helpful.

Deploying MySQL in Production

00:45:04
Speaker
Databases are great, but then you should think about your whole application on an end-to-end basis. If you're running your applications on Kubernetes, now you can also run your databases on Kubernetes and think about it as a whole unit
00:45:16
Speaker
to get started. There are so many options out there, like even with the operators or without operators, deployment models are different. The ways you can back it up, he listed like three off the top of his head on how you can protect MySQL database instances. So all of these, like community does a great job of providing options. But if you are adopting this as part of your organization,
00:45:37
Speaker
If you're trying to run this in production, make sure you test it first. Look at the options. We'll have links in the show notes for the few options that we see available in the ecosystem. Test it with your application, see how it works, and then push it to production. Because as Ryan said, each operator handles failures or handles these replications differently. You might want to choose one over the other. So make sure you know what works for your environment and then go ahead with it and run it in production.
00:46:07
Speaker
Absolutely. Well, as always, we will put all the show links we talked about in the podcast itself under the show links sort of text. We'll make sure to put everything in the news we put in there. We'll put in the community editions that Andrew talked about, how to deploy it, where you can find the operator, the Pokemon Go link that I was referring to early in the show. I think really, really interesting article.
00:46:33
Speaker
if you haven't seen that one yet. And as always, go ahead and leave us reviews wherever you can. Apple Podcasts, Anchor, send us a message, and wherever you want to. We'd love to hear from you and what you want to hear on the show.
00:46:52
Speaker
what you like, what you don't like, send it and everything. Um, in two weeks, we have a really, uh, interesting guest on the show. We're talking about another database. This time it's going to be MongoDB, uh, with Michael Lynn.
00:47:04
Speaker
Really excited about that episode and what we'll find out about Mongo. I know. He does a really great job with his own podcast, the MongoDB Podcast. I'm excited to have him on our podcast and share his insights around MongoDB and around MongoDB on Kubernetes. Absolutely. Well, this brings us to the end of today's episode. I'm Ryan. I'm Bhavin. Thanks for joining another episode of Kubernetes Bytes.
00:47:32
Speaker
Thank you for listening to the Kubernetes Bytes Podcast.