Databases on Kubernetes, Why Database-as-a-Service matters

S1 E5 · Kubernetes Bytes

In this episode, Bhavin and Ryan interview Umair Mufti, the product owner of Portworx Data Services, the new database-as-a-service for Kubernetes. They discuss Umair's experience at DreamWorks Animation creating a database-as-a-service platform, why he felt it was necessary to build it, and some of the lessons learned along the way.

Transcript

Introduction to Kubernetes Bytes Podcast

00:00:03
Speaker
You are listening to Kubernetes Bytes, a podcast bringing you the latest from the world of cloud native data management. My name is Ryan Wallner and I'm joined by Bhavin Shah, coming to you from Boston, Massachusetts. We'll be sharing our thoughts on recent cloud native news and talking to industry experts about their experiences and challenges managing the wealth of data in today's cloud native ecosystem.
00:00:27
Speaker
Good morning, good afternoon, and good evening wherever you are. We're coming to you from Boston, Massachusetts. Today is September 29th, 2021. Happy fall for everyone that experiences fall. I know in the Northeast it's quite beautiful and maybe in the Midwest, but I haven't experienced fall out in Colorado. Maybe it's still going, like it's already 50 today.
00:00:53
Speaker
I know, it's getting there. It's coming in fast. I hope everyone's staying safe, and we're going to dive into today's episode.

Guest Introduction: Umair Mufti

00:01:05
Speaker
But before we dive into the news and topic, we are going to be having another guest, which we did tease in the last episode. Umair Mufti is one of the product owners within Pure working on a database as a service
00:01:21
Speaker
that was actually recently announced. So we're going to definitely talk about that a little bit. He's one of those guys that's worked at various different data service operations jobs. One of the ones that we know he came from was DreamWorks Animation, where he led the adoption of site reliability engineering and developed and built a database as a service platform. So
00:01:46
Speaker
We're going to ask him a lot of questions, I'm sure, about his experience with that. He's also an avid Dodgers fan. So, you know, how you feel about that is up to you. But yeah, we're really excited to have him on the show in just a bit here. But I think there's some cool news going on in the world of cloud native storage, Bhavin.
00:02:07
Speaker
Yeah, so one of the things that I saw Google announce this week was early access for Backup for GKE. I think they're just getting ready for the cloud event next month. So this is just one of the things that they announced. Now, again, still in early access, but customers who are using GKE can use a data protection solution from Google that allows them to protect not just their applications, but also the cluster state for GKE
00:02:34
Speaker
into an object store bucket and then restore it to the same or a different GKE cluster, in the same or a different Google Cloud region. So if you have all of your workloads or Kubernetes clusters running in Google Cloud, this might be a solution for you.
00:02:49
Speaker
Yeah, I think, you know, if you have sort of a tool that's built into GKE, I mean, I know every time I spin up a GKE cluster, it's just stupid simple. So I think they've done some really good tooling out there. Wasn't part of that announcement also the file system support that replicates across three availability zones in a region? Yeah, that was, I think, yeah.
00:03:14
Speaker
I think that was more generic to Google Cloud. It obviously works with GKE, but I think it also was focused around the NAS offering. Yeah, and I think for Kubernetes, if you consume that, you get some local high availability, but also with the backup product, you get some backup built into your Kubernetes

Umair Mufti's Journey with Kubernetes

00:03:32
Speaker
workload. So yeah, really cool stuff. What else has been out there?
00:03:37
Speaker
I know vSAN 7 Update 3 came out, which I only saw this morning, so I know we're still diving into the details there. Probably would be great to have someone on post-VMworld to talk about what's going on in that space.
00:03:55
Speaker
even in the Tanzu space. I know you've been doing a lot of work there, but I think they've added some, you know, officially the file service capabilities, the stretch clusters for vSAN. So really some cool stuff there. Yeah, I'm looking forward to like VMworld sessions, like deep dive sessions from some of the PMEs there to like, to learn more about these topics and see what's new in vSAN 7 update 3.
00:04:20
Speaker
Yeah, exactly. And then we wouldn't be doing our job if we didn't talk about the Portworx Data Services announcement from Pure. And that's near and dear to our hearts. But that's, you know, one of the most exciting things that we've been up to lately, really working on this sort of database as a service for Kubernetes. And we're actually going to have Umair,
00:04:43
Speaker
as I mentioned, the product owner of PDS, come on and tell us about his experience: why he started developing Portworx Data Services, or even before that, database as a service at other companies, and what drove him to go down that road.
00:05:00
Speaker
So, I'm super excited to talk more about this topic, since it's... Yeah, and don't worry, guys. This is not going to be a sponsored episode. We are not going to spend the whole episode talking about Portworx Data Services, but it is a pretty cool announcement.
00:05:14
Speaker
Yeah, exactly. Like I said, we have to do it. We have to say it. And if you want to try it, go for it. There's an early access you can sign up for. But Umair has a lot of experience beyond just his short tenure here at Pure. So we really want to dig into that. And that's going to be more generally about data services on Kubernetes and why you'd try to do this for an organization
00:05:41
Speaker
as sort of an as-a-service, and why it matters. So I think that's a good intro. Should we have him jump on? Let's do it. Let's bring him on. So welcome, Umair, welcome to the show. Welcome to Kubernetes Bytes. I think the audience is really excited about this topic. We're excited to have you on, especially since you're an LA Dodgers fan. You know, we're over here in Boston on the other coast. But yeah, please tell the audience a little bit about yourself and your background with Kubernetes.
00:06:10
Speaker
Sure. Yeah. Excited to be on the show. Thanks for having me. Yeah. So as you mentioned, I'll actually talk about the Dodgers a little bit because as of this recording, Dodgers are looking good and hopefully will come out on top for the ninth consecutive year. Again, thank you for Mookie, for that Mookie trade.
00:06:33
Speaker
So just a little bit about my background and specifically around Kubernetes. So my Kubernetes story or journey actually kind of began in my last job, which was at DreamWorks Animation. Interestingly enough, I actually kind of go back a little bit before Kubernetes. So when I joined DreamWorks, I joined them in about 2014 or actually exactly 2014.
00:06:58
Speaker
And I was hired there to be a database administrator for Cassandra. And previous to DreamWorks, I had a startup and we had kind of hacked at Cassandra to make it into a real-time analytics database. So I had a fair amount of experience with Cassandra. And right around that time, DreamWorks was beginning a journey of moving into the microservices world, I think as a lot of companies were at that time, and
00:07:28
Speaker
and starting to think about databases outside of just Oracle. Up to that point, they were just sticking everything inside of Oracle schemas and not thinking about it too much. But the guy that ended up hiring me there had a really forward-looking vision about the way that databases should be
00:07:50
Speaker
kind of thought of and more NoSQL and eventual consistency models and that sort of thing. So he was like, let's start with Cassandra. So they brought me on board. And within my first, and this is January of 2014, right off the bat, my first project there was like, let's figure out how to get Cassandra to run inside of a container.
00:08:12
Speaker
And at that time containers didn't even mean Docker. So this is like LXC days. This is like, got it. This is like really super early. So obviously before even Kubernetes existed, right?
00:08:22
Speaker
So there was no Kubernetes. There was barely Docker. I think maybe Solomon had mentioned it or something at a previous event. Yeah, he was doing early demos, I guess, at PyCon. Yeah, exactly. And so about a month later, my boss, Ellie, and I went to the SCALE expo, I don't know if you know it, here in Los Angeles. It's a Linux expo.
00:08:49
Speaker
We just sat down over lunch and we're like, you know what, we should build a database as a service. We should think of Cassandra as a service to our team internally within DreamWorks. And so that kind of set me off on this like eight

Challenges and Solutions with Kubernetes

00:09:04
Speaker
year journey of trying to make this happen. But early on, again, this was pre-Kubernetes. We were experimenting with first LXC containers, then Docker containers. Docker, when it first came out, it didn't
00:09:17
Speaker
even Linux containers at the time didn't really have any notion of network namespaces or a lot of these new primitives that we have to think about in terms of isolating workloads inside of containers or just isolating those workloads generally. And so we had to build a lot of that stuff from the ground up. And then
00:09:41
Speaker
Fast forward to the Kubernetes world, maybe like a year later, 2015, I started seeing Brendan Burns. I think he was still at Google at the time and he had his project for Kubernetes, and we were literally like, oh, this is great. I can have somebody manage the orchestration of the containers for me.
00:10:04
Speaker
Up to then, we had implemented what we called meatware, which was just a bunch of humans going in there and being human schedulers and saying, okay, this Docker container is going to run on this machine. And if that container goes down, then I'm going to
00:10:20
Speaker
start a Docker container on another machine and change IP addresses and manually copy the data over to this other physical machine. And so the notion that there might be something that could kind of orchestrate some of that work and take it away from humans was really appealing to us back in 2015.
00:10:41
Speaker
And so I literally built a system to pull down source code from Git and compile it, build it into RPMs, so that we could deploy our own Kubernetes clusters. And literally by the time that the job would end up building,
00:11:04
Speaker
There would be more commits on the source code and things would be broken. It was almost like it was this endless loop for about two months of just trying to compile and get a stable build of Kubernetes. Finally, when that happened, by the time I got some stable builds, I realized that
00:11:25
Speaker
It completely was missing any sort of notion of, I'll just say state, but, you know, it's like, okay, this is great if I want to move a pod around, but what about if I need to save that data to, you know, somewhere? So I was like, so we basically completely scrapped Kubernetes for years and continued forth on our own. We built our own platform and it wasn't until
00:11:55
Speaker
The Portworx guys actually came to us a couple of years later. And this is actually kind of interesting. So the Portworx guys had come to us and said, hey, we're building this thing called software defined storage. We think it could help with your Docker containers. Again, this is before Kubernetes had stateful sets or any of this stuff. So we were still doing things in just raw Docker.
00:12:19
Speaker
And so yeah, so we started looking at, oh, software defined storage. We can actually use this thing called Portworx to manage the data, this data storage behind our Docker containers outside of Kubernetes, just like on our bare metal machines. And that kind of,
00:12:43
Speaker
started us down the path of as we kept upgrading Portworx and started realizing that the Portworx development effort itself was starting to move away from Docker and into the Kubernetes space. And every, all of the other Portworx customers, by the way, we were like customer number two for Portworx, we were like super early. But we started seeing the, you know, the development effort move away from Docker and kind of
00:13:11
Speaker
sort of congeal around Kubernetes. And we're like, okay, maybe we need to look back into Kubernetes. And by then stateful sets had, you know, come out, and that started us down what we thought was going to be
00:13:27
Speaker
you know, an easy path to making databases run on Kubernetes, but that's a whole other story. Anyway, I mean, I think because at that time, right, Docker Swarm and even Mesosphere seemed like the front runners. And then once Kubernetes really matured, it was pretty much lights out, I feel like. Yeah, for sure. And even Docker Swarm, like for us, Docker Swarm never made a lot of sense. It seemed like it was more work than it was worth.
00:13:55
Speaker
Mesos, we had done some POCs with Mesos. I probably shouldn't say this publicly, but we did some work on POCing DC/OS, comparing it with Kubernetes. There were a number of issues that came up with
00:14:17
Speaker
running Cassandra in DC/OS, that we actually basically determined that our own solution, our own platform that we had developed that was basically just raw Docker, was actually superior to Mesos. And we started telling them, the Mesosphere guys, we'd meet with them and be like, you guys have no notion of network namespacing at all. So if I need to run two Cassandra clusters on one
00:14:44
Speaker
On one node, one server, I have to expose different ports and have some sort of port mapping. And I'm like, this is not going to scale if I have more than like two databases.
00:14:55
Speaker
Yeah, I mean, when I was at athenahealth, we originally went with Mesosphere because of the maturity level it was at at the time. And the Universe packages were, you know, appealing; they had Kafka Universe packages, they had, you know, all those things.
00:15:15
Speaker
And then I think we realized down the road as well that the Universe packages were very, very opinionated. So it was hard to change things. You still had to pin certain things to nodes, and that sort of defeated a lot of the purpose. And then, you know, our eyes wandered towards the Kubernetes in the room. They had actually asked us to start contributing our stuff; basically, they wanted us to become the provider of some of these
00:15:44
Speaker
frameworks, whatever. Anyway, so it was like you said, I mean, I think by then a lot of the industry was already starting to see that Kubernetes was going to be the path forward. And obviously, look at where we are a few years later.
00:16:06
Speaker
Before Portworx came to you at DreamWorks, or to that team that was building this, did you try to manage any of the state components yourself? There were some primitives around at the time, and like you said, Portworx was early, but did you try any other solutions, open source, or pinning to host paths? I've heard of all of those. It was a lot of that.
00:16:32
Speaker
There was no solution from a vendor or anything that we were doing. We were just trying to figure out how to do it on our own. Basically, the solution ended up being a lot of host-mounted volumes.
00:16:44
Speaker
and then doing things around labeling of where those volumes were. So on top of just the Docker, we used config management to deploy out those containers. So I'm going to get into a little bit of the weeds. Sorry. It's OK. This is great. Yeah, we have that. So the containers themselves were one thing. We'd have our own container images.
00:17:13
Speaker
But even those we wrapped in unit files. We were a Red Hat shop at DreamWorks, so everything was systemd. So we had our own unit files that we would effectively create on top of a host. So there was like a unit file.
00:17:33
Speaker
And then we had config management that would not only go

Role of Operators in Kubernetes

00:17:36
Speaker
and create that unit file there on that host, but also create a bunch of logical volumes and then configure the unit file to use those logical volumes so that when you did, for example, service foo or whatever it is, it would automatically mount those volumes and it would be kind of pinned to that.
00:17:57
Speaker
to that. And we'd also inject IP addresses in there. I don't know how much I should say about their environment, but basically there's a flat network. And so the idea of being that any Docker container that we had started anywhere in the studio would have a dedicated IP address, so then that container could be routed to from anywhere else in the studio.
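To make the pattern above concrete, here is a minimal Python sketch of the kind of systemd unit such config management might render: a service that mounts a dedicated logical volume and starts a database container with an injected address. Every name, path, and image reference is hypothetical; this is not the actual DreamWorks tooling.

```python
# Hypothetical reconstruction of the pattern described above, not the actual
# DreamWorks tooling: config management renders a systemd unit that mounts a
# dedicated logical volume and starts a database container with an injected IP.
UNIT_TEMPLATE = """\
[Unit]
Description=Containerized Cassandra node {name}
After=docker.service
Requires=docker.service

[Service]
# Mount the logical volume carved out for this node before starting it.
ExecStartPre=/usr/bin/mount /dev/vg_data/{name} /data/{name}
ExecStart=/usr/bin/docker run --rm --name {name} \\
  --env LISTEN_ADDRESS={ip} \\
  --volume /data/{name}:/var/lib/cassandra \\
  registry.example.com/stella/cassandra:3.11
ExecStop=/usr/bin/docker stop {name}
Restart=always

[Install]
WantedBy=multi-user.target
"""


def render_unit(name: str, ip: str) -> str:
    """Render the unit file text for one containerized database node."""
    return UNIT_TEMPLATE.format(name=name, ip=ip)


if __name__ == "__main__":
    # Config management would write this to /etc/systemd/system/<name>.service
    # on the chosen host and then enable and start it.
    print(render_unit("analytics-cass-01", "10.20.30.41"))
```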
00:18:22
Speaker
And some of that thinking has gone into later things; like the Calico guys, or who turned into the Calico guys, had come to us again very early on, and what they were showing us that they were thinking about in terms of
00:18:40
Speaker
just the project and sort of their BGP solution, it was a real easy fit for us, because we were like, we do exactly this already, except we don't use BGP. We use a different protocol, but other than that, it's the exact same thing. And we just set up our own Quagga routers on every single one of our Docker servers that manages the networking layer. So.
00:19:09
Speaker
Anyway, that was, sorry, I went on a networking tangent. We ran almost the exact same thing, but with BGP, at athenahealth. So yeah, we managed all the Quagga configuration, everything, with Ansible, and I would say it worked great when it worked, but it definitely needed constant maintenance. Yeah. I mean, for us, it actually worked great. The only problem ever was just
00:19:35
Speaker
tracking the allocation of subnets and making sure that, you know, I am standing up a new Quagga router over here, let's make sure that the network that I'm assigning to it is unique inside of, you know, we had like a /16 or something that managed all the IPs for all the containers. But anyway, on the storage side, the answer to your question is no, we never looked at sort of an automated or vendor solution.
00:20:05
Speaker
Maybe we were naive or maybe we just never found it or didn't look hard enough, but it was always, we're just going to host-mount volumes. And that was pretty common. I feel like part of that also was that we were doing everything on bare metal, so we felt like we needed sort of that line-level
00:20:31
Speaker
speed, these were databases that were running inside of these containers. And we had maybe a misconception that if we had any sort of software layer in there, that it might affect the performance. And then the Portworx guys actually kind of cleared that up for us. They would show us demos. And the great thing about some of the early stuff that
00:21:00
Speaker
you know, Gou and the other guys at Portworx were showing us was around, like, you know,
00:21:07
Speaker
This can actually increase your performance. So first of all, you can tune the I/O profiles and you can say, hey, I want to bundle up my fsyncs and send them at this rate, which was eye-opening to us. But then also, parallelizing read requests across multiple nodes could actually make your read requests faster than what we were doing by just hitting a single volume on that host.
00:21:32
Speaker
We're like, whoa, OK, there's something here. So in this journey of going to as-a-service, you obviously were responsible for Cassandra. Were there other data services or databases that you were looking at too, doing all of this in parallel, so you had to figure out how to run all of those on the same bare metal servers? Or was this like, let's try Cassandra first; if we figure that out, we can extrapolate and bring in other apps as well?

Vision for Database as a Service

00:22:00
Speaker
Yeah, very good question. So
00:22:04
Speaker
Yes, there were many data services there. So we started with Cassandra. We figured if we could prove out that Cassandra ran in a container and sort of on demand, then we could probably do that for other things. Working backwards, as I mentioned, we kind of had a platform to automate this stuff with config management and unit files and stuff. So what we wanted to do, or what we thought was the right way,
00:22:33
Speaker
was the thing that's ultimately getting deployed, the thing that's containerized. Let's treat that as a commodity and not care whether that thing happens to be Cassandra or happens to be Elasticsearch or happens to be Mongo or, you know, in total, there were like over 10 different data services that we ended up doing this for. So it's like, let's not care
00:22:59
Speaker
about what it is that we're deploying, if we all commonly agree that Elasticsearch is going to behave the same way or Cassandra is going to behave the same way, then all those things can basically be deployed using the same automation. So that same template that we had for the unit file could be used, the same config management tool that we had that would
00:23:26
Speaker
create the files and set up the volumes, and all that other higher level automation we could just plug into. And so then the core of our work started turning into, okay, what is that common interface, if you will, or what is the specification for that application in order for it to be
00:23:51
Speaker
deployed using the other tooling. So there was an internal project at DreamWorks called Stella, and we basically created this Stella image specification. And so we knew if the Elasticsearch image met that specification, then it could use the rest of the tooling. Or if Cassandra met that specification, it could use the rest of the tooling. So yeah, we ended up blowing out beyond just Cassandra into many, many different data services.
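The Stella specification itself is internal to DreamWorks, but the idea of a common image contract can be sketched roughly as follows, with invented field names: if every data-service image declares the same small set of hooks, one set of deployment tooling can treat Cassandra, Elasticsearch, Mongo, and the rest identically.

```python
from dataclasses import dataclass, field

# Hypothetical sketch in the spirit of the image specification described above
# (the real Stella spec is internal to DreamWorks): if every data-service image
# declares the same small contract, one set of tooling can deploy any of them.
# Field names and commands are illustrative.
@dataclass
class DataServiceImageSpec:
    name: str                                             # e.g. "cassandra"
    image: str                                            # container image reference
    data_path: str                                        # where the service persists data
    ports: dict = field(default_factory=dict)             # logical name -> port
    health_cmd: list = field(default_factory=list)        # reports node health
    join_cmd: list = field(default_factory=list)          # joins an existing cluster
    decommission_cmd: list = field(default_factory=list)  # leaves the cluster cleanly


cassandra = DataServiceImageSpec(
    name="cassandra",
    image="registry.example.com/stella/cassandra:3.11",
    data_path="/var/lib/cassandra",
    ports={"cql": 9042, "gossip": 7000},
    health_cmd=["nodetool", "status"],
    join_cmd=["nodetool", "join"],
    decommission_cmd=["nodetool", "decommission"],
)
```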
00:24:19
Speaker
I think a natural question I have is, you know, database as a service or getting a production database in Kubernetes.
00:24:31
Speaker
takes a lot of moving parts, right? You've gone into networking, persistent storage, configuration, right? There's a lot of things to do here. And I think we're getting a good sense of why building a service like this mattered to your team at DreamWorks. But I think the question here is, you know,
00:24:52
Speaker
if you were thinking about running production-grade databases on your Kubernetes clusters, say, fast forward to today, a lot of

Reflections and Key Takeaways

00:25:00
Speaker
organizations are running Kubernetes today. A lot of them are running stateless components and stateless services, and they want to add databases. Do they go the route of
00:25:09
Speaker
building their own database as a service, right? Figuring out all those layers and what works for them, or do they look for sort of a solution like you built at DreamWorks? I don't know how long it took you to go from conception, it sounds like around 2014 or so, to a workable Stella in production, but I'd love to hear your opinion there. Yeah, for sure.
00:25:37
Speaker
I only touched on the tip of the iceberg; what we ultimately found out is even harder than just figuring out networking and storage, to be honest with you. Some of that stuff, in the end, ends up being the easy part. And most of that journey that I was talking about was even before we got to Kubernetes. So that was just in Docker.
00:26:04
Speaker
And there are some added complications we should also get to in Kubernetes. But there's also tooling and primitives and that sort of stuff that made it, I don't want to say easier, but made it possible to accomplish in Kubernetes, to automate some of the things that we were doing, again, as meatware at DreamWorks. And so the first thing we should really talk about when we're talking about databases
00:26:31
Speaker
And we thought early on, OK, there's this notion of a stateful set that Kubernetes has now. And so that's going to solve all of our problems with state. And what we quickly realized is that was not true. And Portworx had come along and done a great job with software-defined storage and did answer the problem of state where I'm going to save my data.
00:26:56
Speaker
But what our team discovered is that that's not the hard part about running a database in Kubernetes. The hard part is that most of these database applications are actually distributed applications. And even to this day, if you look at when people are talking about pods or talking about containers or images, it's always on this notion that the image that I'm building
00:27:21
Speaker
fully encapsulates an entire application. It's like, OK, if I want to run nginx, I just deploy this nginx container. And I have a fully running nginx pod or container that's running here. And that's great. But databases aren't like that. Databases, for the most part, are distributed systems.
00:27:46
Speaker
When I'm talking about Cassandra, I'm not talking about a Cassandra container or a Cassandra pod. I'm talking about three of them running, or seven of them running. And how do they communicate with one another? And how do they know, when a pod starts, and this is where Kubernetes makes things hard, right, this notion of the ephemeral nature of your pods, where when we were in our old raw Docker system,
00:28:14
Speaker
We didn't have to worry about IPs changing or things moving from one server to another or any of that stuff, but we'll come into that. But in any case, so if I have a three node Cassandra cluster and I want to scale it to five nodes, how do those two new nodes know where the other three nodes live? How do the other three nodes that already were there, how do they know that the nodes that are joining that they should be part of the cluster or
00:28:41
Speaker
In a more likely scenario, let's say I had a five-node Cassandra cluster, one pod crashes, Kubernetes reschedules it and brings it back up. How do I know that that new node that's joining is the same node that crashed earlier and not just a new node that's... So this whole notion of cluster membership and failure detection for distributed applications, those questions were really hard to begin with. As you know, the two hardest problems and
00:29:10
Speaker
in computer science are naming variables and distributed systems, right? So it's like these are not easy things to solve outside of Kubernetes and then trying to containerize these applications and figure out how do I deploy them in a robust manner inside of Kubernetes is really where the challenge is. And this is where you have to get in at the application level and understand what the applications are doing.
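For context, one common way this discovery problem is approached on Kubernetes today, not necessarily what the Stella team did, is to lean on the stable network identities a StatefulSet provides: each pod gets a predictable ordinal hostname, and a headless Service gives those hostnames stable DNS entries, so a seed list can be computed rather than tracked by hand. A small sketch with made-up resource names:

```python
# One common way the discovery problem is handled on Kubernetes (not
# necessarily what DreamWorks did): a StatefulSet gives each pod a stable
# ordinal hostname, and a headless Service gives those hostnames stable DNS
# entries, so a seed list can be computed instead of tracked by hand.
# Resource names below are invented for illustration.
def cassandra_seeds(statefulset: str, headless_svc: str, namespace: str,
                    seed_count: int = 3) -> str:
    """Build a Cassandra seed list from the StatefulSet's stable pod DNS names."""
    return ",".join(
        f"{statefulset}-{i}.{headless_svc}.{namespace}.svc.cluster.local"
        for i in range(seed_count)
    )


if __name__ == "__main__":
    # e.g. "cass-0.cass-hl.data.svc.cluster.local,cass-1.cass-hl...,cass-2..."
    print(cassandra_seeds("cass", "cass-hl", "data"))
```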
00:29:40
Speaker
What I was just describing with the Cassandra cluster membership, those questions are answered completely differently for Elasticsearch or for Postgres or for Couchbase, or for Kafka, which has its own system.
00:30:00
Speaker
There has to be an application level understanding for each one of these things and whatever system you build and whatever images you build have to be aware of what is specific to Cassandra and how do I re-replicate the data in a Cassandra cluster, how do I re-partition data in Kafka and those types of things.
00:30:19
Speaker
I forget what the question was. The question was, do you want to handle this stuff yourself? It's a lot to think about and if you maybe just have one database that you care about and you have the expertise in-house to both know at the application level what it means to repartition your Kafka data and also at the Kubernetes level where you know how to
00:30:47
Speaker
create that automation, whether it be as an operator or just building the images and all that stuff, then I would say you are in the minority. Only a few big organizations have those kinds of resources to dedicate to that many different data services.
00:31:08
Speaker
It's a Venn diagram, right? And the overlap of the Kubernetes expertise and the database expertise, that overlap, unfortunately, I don't think is that great in volume.
00:31:23
Speaker
Yeah, not very large. I mean, that's definitely, I think, what we've seen in our day jobs: organizations are using Kubernetes as a de facto container orchestrator, but many of those are still in the early days of considering, okay, if I want to run my stateful things on there, like the Cassandra and Kafka and Elasticsearch that you're talking about,
00:31:49
Speaker
that's a whole new set of concerns, right? And the question was really around what other things besides persistent storage you would need to consider to run a production database, and you've answered a lot of these. And I think
00:32:05
Speaker
one of the important ones I heard was that stateful sets didn't fully solve the problem, because it was a Kubernetes-level abstraction and you need that application-level integration. I think a follow-up to that would be, now that operators are around, does that help in this story? Or are there still things that you would build into a database as a service that even operators don't really target?
00:32:33
Speaker
For sure. Yeah. So for sure, operators help. But they're also required because of the added complications that Kubernetes has. Like I was saying, with that ephemeral nature of a pod, it can be reassigned a new IP at any time. So now you have to code for that. And the only place to code for that and have that logic would be in an operator.
00:32:58
Speaker
And we'll come to that in a second. I just want to point out, there are other complications besides just, you know, the cluster membership and failure detection we talked about. And I don't need to go into detail about them, but the other really big thing that's complicated with these distributed applications, and again, it is because they're distributed, not because they're stateful, is this notion of ingress.
00:33:24
Speaker
And I use ingress with a lowercase i, not a capital I, not like the Ingress object in Kubernetes. Sure. But just the notion of directing traffic into your pod. And the reason I bring it up is because, again, these things are distributed systems. So imagine, let's talk about a Postgres cluster, for example. So maybe you've got
00:33:49
Speaker
three pods that are running that are a Postgres cluster. One of them is a primary and the two of them are replicas. So from a client perspective, when you're writing data or when you're using that Postgres cluster, you need to send your write requests to the primary. You can't send your insert
00:34:11
Speaker
values into your secondary or your replica because it'll fail. So you have to know how to find that primary. Likewise, if it's a read request, you could maybe send it to any of those three, but most of the time you would want to purposely send it to a replica, because you want to offload the traffic from the primary. So there's this notion
00:34:37
Speaker
of being able to direct traffic into specific nodes in your distributed application. And that's something that the native Kubernetes primitives don't allow for right now. You've got a notion of a Service, which is backed by many pods. But what it's really going to do is load balance the traffic between your three Postgres pods, which is really completely useless
00:35:05
Speaker
for Postgres or for any database or for any distributed application.
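One widely used workaround, taken by several Postgres operators and shown here purely as an illustration rather than as the guest's implementation, is to label each pod with its current role and keep separate Services whose selectors match only one role, so writes reach the primary and reads reach the replicas. A sketch with invented names and labels:

```python
import json

# A common pattern used by several Postgres operators (offered only as an
# illustration, not as the guest's implementation): the operator labels each
# pod with its current role and keeps two Services whose selectors match only
# one role, so writes reach the primary and reads reach replicas.
# Label keys and names are invented.
def role_service(name: str, cluster: str, role: str) -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # Only pods currently labeled with this role receive traffic;
            # the controller flips the label during failover.
            "selector": {"app": cluster, "role": role},
            "ports": [{"name": "postgres", "port": 5432, "targetPort": 5432}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(role_service("pg-primary", "pg", "primary"), indent=2))
    print(json.dumps(role_service("pg-replicas", "pg", "replica"), indent=2))
```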
00:35:10
Speaker
And so the way you get around that is by creating your own custom resources or your own objects in Kubernetes to handle that logic, but to create the primitive, the custom resource itself, and then also the logic of how this thing should behave. So that's the operator, right? So you've got your CRD and you've got your controller that's handling all the logic.
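As a rough sketch of that CRD-plus-controller shape, here is a minimal operator skeleton using kopf, a Python operator framework, against a hypothetical PostgresCluster custom resource in an invented API group; it only logs what a real controller would reconcile.

```python
import kopf  # Python operator framework; run with `kopf run this_file.py`

# Minimal sketch of the CRD-plus-controller idea against a hypothetical
# PostgresCluster resource in the invented API group example.com. It only logs
# what a real controller would reconcile; it is not a production operator.

@kopf.on.create('example.com', 'v1', 'postgresclusters')
def on_create(spec, name, namespace, logger, **kwargs):
    replicas = spec.get('replicas', 3)
    logger.info(f"Would create a {replicas}-node Postgres cluster '{name}' in '{namespace}'")


@kopf.on.field('example.com', 'v1', 'postgresclusters', field='spec.replicas')
def on_scale(old, new, name, logger, **kwargs):
    logger.info(f"Would scale '{name}' from {old} to {new} replicas, "
                f"reconfiguring replication and updating the role Services")
```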
00:35:35
Speaker
So for sure, you need at a minimum, if you were going to create this system by yourself, you would at a minimum need to use operators to solve some of these problems. And to their credit, a lot of these database vendors, a lot of these ISVs realize that also and have started
00:35:57
Speaker
creating their own operators. You know, certainly Crunchy's got one for Postgres. You know, DataStax just moved from Helm to an operator for their K8ssandra as well. Yeah. Side note, I was very involved in the community with DataStax and the Cassandra operator development, and working with that team for some time as well,
00:36:27
Speaker
honestly offering a lot of the lessons learned that we had at DreamWorks from the Stella development and the Stella operators that we had built, and kind of helping guide how you handle some of these things. Yeah, which I imagine could be an entire podcast episode in itself, some of those lessons learned.
00:36:49
Speaker
You know, because we're getting towards the end of our time here, I wanted to sort of ask one more question, which is, you know, it sounds like there's a lot of components that go into database as a service. There's, you know, the deployment, the administrative tasks, the how do you have clients, the ingress, the managing distributed systems and members. And, you know, it is a very complex, you know, orchestration of things. And so as a team,
00:37:17
Speaker
And maybe this relates back to your original decision of why you chose to go on this eight or nine year journey that you've been on: why did you decide that this was a good fit for your teams at DreamWorks, and why would an organization or a team gravitate towards a database as a service technology rather than trying to figure out all these Kubernetes primitives or using operators?
00:37:42
Speaker
Yeah, for sure. So at DreamWorks, the goal really was to provide a platform for our developers, right? We didn't expect our application developers, the ones developing out the microservices.
00:37:55
Speaker
to either know anything about databases or to care where they were running, right? It should just have been abstracted from them. All they care about at the end of the day is a connection string, right? I'm developing a new microservice. It's got a Couchbase backend. Just tell me what I can connect to and where I can insert my data.
00:38:14
Speaker
The ops team, the DBAs, you guys figure out how to keep it running and make sure it scales and do all that stuff. So that was the vision at DreamWorks, is just to provide a connection string, complete abstraction for developers. Here's a platform, and you know that you can spin it up on demand with an API, whatever it is.
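From the developer's side, that contract can be as small as the sketch below, assuming a Postgres-backed service and a hypothetical DATABASE_URL environment variable injected from a Secret; the application never learns where or how the database actually runs.

```python
import os

import psycopg2  # driver choice is illustrative; assumes a Postgres-backed service

# All the developer is handed is a connection string, here injected as an
# environment variable (for example from a Kubernetes Secret).
dsn = os.environ["DATABASE_URL"]  # e.g. postgresql://app:***@pg-primary.data.svc:5432/appdb

with psycopg2.connect(dsn) as conn:
    with conn.cursor() as cur:
        # "events" is a hypothetical table; the point is only that the app
        # inserts data without knowing anything about the database's placement.
        cur.execute("INSERT INTO events (payload) VALUES (%s)", ("hello",))
```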
00:38:33
Speaker
Or go to a UI, and you've got a database, and here's your connection string, and that's it. It's the easy button for them, right? And, you know, I've had a great relationship with the Portworx guys, as I mentioned, over the years, and
00:38:56
Speaker
having shown them what we had built at DreamWorks, I think they immediately got that there was an application for all this hard work that we had done. And so my work here is continuing: solving all those hard problems and making that available here.
00:39:15
Speaker
We think all developers, regardless of where you work or whatever, whatever even the size of your businesses, if you're a startup and you're a developer, if you're developing an iPhone app and all you care about is a database endpoint, or if you're in a big multinational telco and you're an application developer and all you want to focus on is your
00:39:39
Speaker
is your application and you just need somewhere to insert data. That's a common thing across organizations of different levels of sophistication, market size and firmographic segmentation or whatever. It's a common need. And so really the idea is like, okay, let's let developers
00:40:04
Speaker
just focus on their applications. In the same way, as I went through all that complicated stuff, it becomes obvious: if you want to solve those problems yourself, sure, you might be able to come up with an implementation. It might be naive. It might be sophisticated. But keeping that implementation, having it evolve over time, making sure that when version 1.22 of Kubernetes comes out, it doesn't break, or when Cassandra version 4.0 comes out,
00:40:34
Speaker
as it did, the changes that they made in their architecture don't break your automation. In the same way, hopefully people know that these are hard problems and you need a dedicated team that's constantly answering these questions and responding to them in the same way. Sure, I could build my own password authentication system for my application. Why not?
00:41:02
Speaker
It sounds easy, right? Or, you know, I can understand that when I start keeping passwords and storing them and I started getting all sorts of compliance issues and I have to like, there's so many things that I need to worry about. I'm like, oh crap, maybe I should just let somebody else, you know, do the heavy lifting here.
00:41:24
Speaker
Yeah, I think those are words of wisdom to end on. I think the takeaway here is let someone else worry about it. You know, I think it's somebody else's front office. Exactly. So, I mean, I think it really shows that these systems for running stateful components on Kubernetes
00:41:44
Speaker
are complex, hard to manage, and you have to continue to maintain them. And if you're running a single database or those kinds of things, sure, but once you start scaling and you're running distributed systems, having that easy button is so important. So I've learned a lot today, Umair, and it's been such a pleasure to have you on. Yeah, these details are really important. And you can only talk about these if you have gone through the process, right? As you said, it took you like seven years
00:42:12
Speaker
to build a database as a service platform that you felt was easy enough to use by developers who can just use a UI, use a CLI, ask for a connection string, and that's all they care about. It takes time for every enterprise to reach this point.
00:42:27
Speaker
Yeah. Amen. Well, Umair, thanks for being on the show. Really appreciate it. And until next time. Sure. Let me know if you want me to come back for that deep dive on lessons learned. I think we might have to have you on here. Yeah. It's really great talking to you guys. Thank you so much. Thanks, Umair. Take care. All right.
00:42:48
Speaker
Well, that was, I think, fantastic. I don't know about you, Bhavin, but I really enjoyed that conversation with Umair. As we have more guests on the show, whether we work with them for long periods of time, short periods of time, or not at all, I think it's really eye-opening to hear their background and why they did things, especially with Umair's experience at DreamWorks, conceptualizing the whole thing and why he wanted to do it. That was really interesting. What about you?
00:43:16
Speaker
Oh, yeah. I know the episode ended up being a longer one, but there were so many great details. The level of detail that he was able to go into and discuss his seven-year-long journey of building such a platform... It's all about identifying your risk-reward ratio. If you are going to bet $1,000 and the reward that you're getting is also $1,000,
00:43:39
Speaker
you might not be willing to put everything at risk. But if by just placing a bet of $1,000 you're getting like $10,000 in return, that's where you should be investing your money. So the same thing applies to building a platform. If it's going to take you seven years, it might take you too long to earn the rewards. But if you can just rely on a vendor solution, somebody who has gone through all the process, you can get to those rewards
00:44:07
Speaker
really quickly and start adopting these new and modern solutions right now instead of waiting for seven years. So that was one key takeaway for me: instead of spending all the time and resources, I should just look at people who have done this before, gone through the same journey, and know what the best practices are for building or operating such an as-a-service platform.
00:44:31
Speaker
Yeah, I would absolutely agree with that. Figuring that out as an organization is key. I think, for me, what struck me is just the complexity of the whole thing. Obviously, seven years, it's not an easy thing to do. So taking the approach, we often look at a problem, especially on this podcast, from the storage perspective, the data management perspective.
00:44:56
Speaker
We've really honed in on those pieces and those data services. But when you really think about it, a data service is really just another application. Any application could be a custom-built application that isn't a database. It's still going into production. It needs a lot of things, right? It needs the monitoring capabilities. It needs user friendliness. It needs the networking components and all this other complexity that gets wrapped into this easy button,
00:45:25
Speaker
right, that Umair was really driving towards. So I think it really showcases just how hard some of these problems are. And Kubernetes, to the point he made around stateful sets and operators, those help with some of the problems you're trying to solve. But Kubernetes doesn't just solve things for you, right?
00:45:49
Speaker
It's a platform for building things; it can give you an orchestration layer, but then you still have to figure out all the additional details
00:45:57
Speaker
around any application that you might want to deploy, data service or not. Exactly. So full circle: don't build your own database as a service. Go and use one. There's a really cool one that just came out. I'll say that one last time. Go take a look at it. But if you can, and want to, please review this podcast on
00:46:21
Speaker
Apple Podcasts or wherever you can review podcasts. We are going to have another guest on the show talking all about how you can use Kubernetes as a team of one for really managing and scaling if you don't have a whole team of engineers, which I think may be near and dear to many of our listeners. So that'll be exciting.
00:46:45
Speaker
It sure will be. It's not just about people who are getting started and are just that single person on the team. But if you are the pioneer, if you want your organization to adopt Kubernetes, how can you get started with Kubernetes on your own and then bring in the larger part of your team? So it will be a learning experience for everyone. Yep. All right. Well, Bhavin, until next time. Yeah, that's it.
00:47:12
Speaker
Thank you for listening to the Kubernetes Bytes Podcast.