Introduction to Kubernetes Bytes
00:00:03
Speaker
You are listening to Kubernetes Bytes, a podcast bringing you the latest from the world of cloud native data management. My name is Ryan Wallner, and I'm joined by Bhavin Shah, coming to you from Boston, Massachusetts.
Cloud-Native Insights and Industry Challenges
00:00:14
Speaker
We'll be sharing our thoughts on recent cloud native news and talking to industry experts about their experiences and challenges managing the wealth of data in today's cloud native ecosystem.
00:00:28
Speaker
Good morning, good afternoon, and good evening wherever you are. We're coming to you from Boston, Massachusetts. Today is March 16, 2022. I hope everyone is doing well and staying safe. Let's dive into it.
Weather in Boston and Personal Updates
00:00:43
Speaker
Bhavin, it's starting to become beautiful in Boston. I am super excited. I took a little walk today before recording this podcast because
00:00:53
Speaker
I just have to start getting outside. What are you up to? Nice. I'm excited too. I'm loving those 6:36, 6:45 sunsets. Daylight saving is officially done. I don't remember if it started or ended, but whatever this is, keep it as is. I also saw that the US Senate passed a bill so we don't have to mess with daylight saving anymore. And I was like, yes. I saw that. I was just going to bring that up. That gets me so excited, actually.
00:01:22
Speaker
No longer do we need to flip back and forth.
00:01:26
Speaker
I mean, I think a lot of people will be excited about that. It comes up in our Slack all the time: FYI, now you have this other one-hour difference for a short period of time with your friends in the EU. I still have to explain to my parents back in India that usually it's one and a half hours' worth of difference in the time zone, like day and night, and then when it flips over, it's two and a half hours. So we have to coordinate that timing. So
00:01:53
Speaker
If it's one thing, I can explain it once and just let it go. I think it would make life easier. That's obviously our opinion, of course.
Main Topic Introduction: Postgres on Kubernetes
00:02:03
Speaker
So today's topic is Postgres on Kubernetes, which is an exciting topic, and we have a great guest today. But before we dive into that, we have a little bit of news to talk about.
VMware CSI Driver and DataStax Updates
00:02:12
Speaker
Why don't you kick it off?
00:02:13
Speaker
Yeah, sure. I had a few things to talk about. The first one: if you are a VMware customer, or if you have been using the vSphere CSI driver, there's a new version out, version 2.5. The good news is that it now supports CSI snapshots for block volumes. So if you are using VMware storage policy based management (SPBM) and provisioning persistent volumes on your vSAN or VMFS datastores, now you can create
00:02:43
Speaker
snapshots for those persistent volumes from inside Kubernetes. With this new version, they added a couple of components. One is a snapshot controller that is responsible for the creation and deletion of snapshots and for binding each VolumeSnapshot to the VolumeSnapshotContent that backs it. The second component is a new sidecar in the driver's controller pod called csi-snapshotter, which actually triggers the snapshot operation. So now, finally, we have CSI snapshots in the VMware ecosystem.
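As a concrete illustration of the objects mentioned above, here is a minimal Python sketch that emits a VolumeSnapshotClass and a VolumeSnapshot as a JSON list that `kubectl apply -f -` should accept. It assumes the GA snapshot API (`snapshot.storage.k8s.io/v1`) and the vSphere CSI driver name `csi.vsphere.vmware.com`; the class name `vsphere-snapclass`, the snapshot name, and the PVC `pgdata-0` are hypothetical placeholders.

```python
import json


def snapshot_class(name: str) -> dict:
    """VolumeSnapshotClass that routes snapshot requests to the vSphere CSI driver."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshotClass",
        "metadata": {"name": name},
        "driver": "csi.vsphere.vmware.com",
        # Remove the backing snapshot when its VolumeSnapshotContent is deleted.
        "deletionPolicy": "Delete",
    }


def volume_snapshot(name: str, pvc: str, class_name: str, namespace: str = "default") -> dict:
    """VolumeSnapshot of an existing PVC. The snapshot controller creates and binds
    a VolumeSnapshotContent; the csi-snapshotter sidecar asks the driver to cut it."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": class_name,
            "source": {"persistentVolumeClaimName": pvc},
        },
    }


if __name__ == "__main__":
    # Hypothetical names; adjust to your cluster before applying.
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            snapshot_class("vsphere-snapclass"),
            volume_snapshot("pgdata-0-snap", "pgdata-0", "vsphere-snapclass"),
        ],
    }
    print(json.dumps(manifests, indent=2))
```

Piping the output to `kubectl apply -f -` would request a snapshot of `pgdata-0`, assuming that PVC exists and the snapshot CRDs and controller are installed.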
00:03:14
Speaker
The second thing I wanted to talk about was the new operator that DataStax introduced. It's not completely new, but an enhancement to their existing Cassandra operator, K8ssandra, which now allows users to provision Cassandra clusters across multiple Kubernetes clusters. So that's quite interesting. And I think, Ryan, we already have
00:03:38
Speaker
a session set up with Patrick from DataStax to come on this podcast and talk about that in detail, so I won't go into too much detail, but that's another interesting announcement from the past week.
00:03:50
Speaker
Yes, we do. And then lastly was around security. There was a new CVE, CVE-2022-0847, called Dirty Pipe. It's basically a vulnerability that allows users on a Linux system to overwrite the contents of files, including files in container images, that they can only read but shouldn't
00:04:15
Speaker
be able to write to. So if you only had read-only access, with this vulnerability you can actually modify things. For example, if somebody gets access to one of the pods running on your Kubernetes cluster, and other pods share the same container image, they can make changes to that image and affect all your other workloads. There is a fix available, and we'll share a link to Aqua Security's blog about it. So make sure you have your environments updated. But yeah, that's it for me.
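To triage nodes quickly, here is a rough sketch of a version check for CVE-2022-0847. It assumes upstream kernel numbering: the bug arrived in Linux 5.8 and was fixed in 5.16.11, 5.15.25, and 5.10.102, with 5.17 shipping the fix. Distribution kernels often backport fixes without changing the upstream version string, so treat a `True` result as "check your vendor advisory", not as proof of exposure.

```python
import re

INTRODUCED_IN = (5, 8, 0)
FIRST_FIXED_MAINLINE = (5, 17, 0)
# First fixed upstream release on each stable branch that received the patch.
FIXED_IN = {(5, 10): (5, 10, 102), (5, 15): (5, 15, 25), (5, 16): (5, 16, 11)}


def parse_kernel_version(release: str) -> tuple[int, int, int]:
    """Turn strings like '5.15.0-1034-aws' or '5.10.102+' into (major, minor, patch)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def maybe_vulnerable(release: str) -> bool:
    """True if the upstream version number falls in the Dirty Pipe range."""
    version = parse_kernel_version(release)
    if version < INTRODUCED_IN or version >= FIRST_FIXED_MAINLINE:
        return False
    branch = version[:2]
    if branch in FIXED_IN:
        return version < FIXED_IN[branch]
    # 5.8 to 5.16 branches with no upstream fixed release (end of life).
    return True
```

On a node, `maybe_vulnerable(platform.release())` gives a first answer before you consult your distribution's advisory.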
00:04:47
Speaker
Dang, that's not a good one. Definitely want to go patch that one up. A few news items from me.
KubeCon EU and Upcoming Webinars
00:04:56
Speaker
The first one is the KubeCon EU schedule is officially out. We are going to plug ourselves. There is a day zero event on the 17th. We have a Kubernetes data workshop.
00:05:08
Speaker
We'll link to it in the show notes. It's all about understanding how cloud native data works on Kubernetes. We'll be using Portworx and doing some labs to understand some of the things you have to think about when deploying stateful
00:05:26
Speaker
applications on Kubernetes. So a lot of fun there. The other one is the Secrets Store CSI Driver. There's a webinar link that we'll provide, but this is a non-traditional CSI driver. If you think about CSI drivers in terms of storage, this one's all about secrets. It is a sort of standard way of working with
00:05:52
Speaker
many other secrets providers to be able to use a single CSI driver to interact with them, which is actually a pretty interesting concept of treating external secrets just like a piece of storage. It really is storing something, just storing something that's secure. Oh, I need to register for that.
00:06:12
Speaker
Yeah, definitely a good one. And the last one is just a link to a YouTube video titled "What is a Kubernetes controller?" I think this is an interesting video. It's 35 seconds long, and it's
00:06:27
Speaker
really just someone holding a physical box and explaining what a Kubernetes controller does. We did some episodes on operators, and at a very basic level, this video does a really good job. It's a little silly, but at the same time... It's perfect for the people in our audience who use TikTok and Instagram Reels a lot. This is the video for them to understand how Kubernetes operators work. Listen, power to the person who did this. I actually think it's kind of interesting, and it explains it at a very basic level. So we'll throw that link in there as well.
00:06:56
Speaker
I'm just jealous of how you keep finding these interesting videos and blogs. I still laugh from time to time about that. It's such a fun thing to say. I still love saying it. Oh man.
Special Guest: Gabriele Bartolini
00:07:11
Speaker
All right. Well, let's get into our topic then. Today's topic is Postgres on Kubernetes, and we have Gabriele Bartolini. I hope I'm saying that right, Gabriele.
00:07:23
Speaker
He is a Postgres and Kubernetes enthusiast. He's VP of Cloud Native at EDB, a co-founder of PostgreSQL Europe, and a founding member of Barman, a backup tool for Postgres. He was previously the head of global support and a co-founder of 2ndQuadrant, where he consistently contributed to the growth of that organization, which is now part of EDB.
00:07:50
Speaker
He's also all about DevOps culture and Kubernetes, so we're really excited to have him on the show. Let's get him on here. Welcome to the show, Gabriele. It's nice to have you here on Kubernetes Bytes. We ask all our guests the same question when we first dive into it.
Gabriele's Journey with Postgres and Open Source
00:08:07
Speaker
Welcome, and tell us about yourself and what you do.
00:08:11
Speaker
Hi, Ryan, thank you. So I work for EDB. EDB is the largest contributing company to the open source PostgreSQL project. I'm the VP and CTO for cloud native at EDB, and my goal is primarily to foster the adoption of Postgres in Kubernetes with a DevOps mindset. Previously, I co-founded 2ndQuadrant,
00:08:38
Speaker
which was a well-known Postgres company operating from 2008 to 2020, when it was acquired by EDB. At that time, EDB was our major competitor. While at 2ndQuadrant, I covered several roles, including, for example, head of global support and infrastructure.
00:09:02
Speaker
As for my background, it's computer programming, statistics, and data warehousing. I also studied business management, entrepreneurship, and strategic leadership, and this led me to fall in love with DevOps and the DevOps culture.
00:09:25
Speaker
And I pretty much founded my entire career on teamwork as a way to address and innovate in our complex world.
00:09:38
Speaker
I was listening to one of the Data on Kubernetes talks that you did with Bart, and I learned that you have been working with Postgres almost since its inception. That just blew my mind. I don't know anyone else who has been involved in a specific community for more than 20 years. So tell us how you started working with Postgres and where we are now. Yeah. Okay. So,
00:10:06
Speaker
Basically, I fell in love with open source. It was the 1990s. I fell in love with Linux, and it was the time of the internet. The internet opened up these amazing and unprecedented learning opportunities. I actually started with MySQL; I don't want to go into details. I was working in the search engine and link checking area.
00:10:36
Speaker
And I remember that I tried to use MySQL's new InnoDB engine for foreign key support, and I actually ended up losing all the data. At the time, my friends at the local Linux user group, one of them Marco, who is a colleague of mine, kept telling me: just switch to Postgres, just switch to Postgres.
00:11:04
Speaker
I loved PostgreSQL; it was fascinating. So that gave me the opportunity to move to PostgreSQL for good. So yeah, sorry. No, no, go ahead. From there, I started to promote PostgreSQL in Italy. Together with other friends from the local Linux user group
00:11:29
Speaker
in my home city, Prato, we organized the first Postgres event in Europe, in 2007. On that occasion, more than 200 people came to my city from all over the world. That's how I got in touch with the
00:11:53
Speaker
community, and that's how we founded, for example, the European association for Postgres, called PostgreSQL Europe, which now organizes the major Postgres conferences in Europe. Awesome. How was the migration from MySQL to Postgres? How big was the installation? What did you do? What was it running on?
00:12:20
Speaker
Okay, so I was developing this link checker called ht://Check. I was the main developer. It was a spider that would crawl a website, follow all the links, and store the information in MySQL.
00:12:44
Speaker
Basically, it was probably my mistake when I moved to the InnoDB engine, but Postgres already had support for views, foreign keys, and indexes, and that was the event that triggered my definitive move. Makes sense.
00:13:08
Speaker
So, since we're Kubernetes Bytes here, the obvious question I want to ask is this: you've been involved with Postgres for quite a while, and containers have been sort of mainstream for years, leading up to where we are now with Kubernetes. So where did that journey start, from containerized Postgres to Postgres on Kubernetes, or cloud native Postgres? Okay. So,
00:13:37
Speaker
Basically, after the community involvement, I was lucky to start a company called 2ndQuadrant with Simon Riggs.
00:14:00
Speaker
With 2ndQuadrant, I got to experience an unprecedented volume and scale of databases. I thought I had been managing large databases until I started to work with 2ndQuadrant.
00:14:14
Speaker
With Simon and the rest of the team, we were two or three people at the start, and we ended up with more than 120 people in 12 years before EDB acquired us. We did a lot of things. We improved Postgres, and I also started an open source tool for backup and recovery called Barman, and so on. So I think we have lived through
00:14:43
Speaker
a lot of Postgres history. One of the most important capabilities of Postgres has always been its capacity to adapt to the world around it, and I think that's how the Kubernetes thing happened. Let me give you some examples.
00:15:06
Speaker
Around 2000, XML came out, and everyone thought it was the ultimate solution for managing data. What Postgres did was introduce native support for XML and XPath. Same thing with NoSQL: everyone thought that NoSQL would kill SQL.
00:15:26
Speaker
But what Postgres did was actually learn from the need for unstructured data, and it introduced native support for JSON, enabling multi-model databases that could hold both structured and unstructured data and be queried via SQL. Then, when I started, Postgres was primarily used in bare metal
00:15:51
Speaker
situations where we used to cram a single physical server with multiple Postgres servers listening on different network interfaces or TCP ports, each of them working on separate volumes mapped to different disk spindles behind hardware RAID controllers. That's how it used to be. Then VMs came out
00:16:16
Speaker
and we started creating multiple VMs with similar installations, one instance per VM. Around 2015 is when I started to work more closely with containers. And I remember that the first times I talked about running Postgres in containers, people thought I was crazy.
00:16:45
Speaker
Stateful apps were not as popular back then. And then we know the story of Kubernetes, the standardization role that it played. But I think that it was in 2019 when we saw that local persistent volume support was introduced
00:17:10
Speaker
and probably the general adoption of the operator pattern. That's when we thought, OK, this is probably the right moment for us to jump in.
Running Postgres on Kubernetes: Challenges and Performance
00:17:20
Speaker
So at 2ndQuadrant, in August 2019 I think, we started an exploratory initiative to understand, with a fail-fast approach, whether this was feasible.
00:17:34
Speaker
The first attempt we made was to run Postgres clusters on bare-metal Kubernetes with a shared-nothing architecture. Think about having three physical nodes with local disks, each node dedicated to running a single Postgres instance.
00:17:58
Speaker
Many thought that even this was kind of an anti-pattern in Kubernetes: you have Kubernetes, so why would you want to dedicate a node? But we actually obtained impressive results, and I remember I published a blog post about this that became quite popular. We used OpenEBS at the time, and we discovered that we were able to go as fast as bare metal.
00:18:27
Speaker
That's when we understood that Kubernetes was not only feasible but, for us, the way to go. I had never seen Postgres high availability done the way it's done in Kubernetes. That's my opinion. And you pretty much have the full range, from a shared environment where you share the node with other workloads
00:18:55
Speaker
and share the storage, to a high-performing, dedicated, even bare-metal installation running Postgres.
00:19:07
Speaker
So that's how our operator started. Gotcha. High availability is definitely one of the benefits. But based on your experience running Postgres on Kubernetes since 2019, what are some of the other benefits that DBAs out there can get if they switch their databases to run on Kubernetes?
00:19:28
Speaker
So yeah, I would say that the main benefit is to actually run Postgres inside Kubernetes, not just on Kubernetes, but
00:19:39
Speaker
inside Kubernetes. To put it simply, it means being cloud native and taking advantage of all the DevOps principles and capabilities when you build microservice-based applications, extending them to the database as well. This includes, for example, automated pipelines for continuous integration and delivery, security, and so on.
00:20:07
Speaker
So the main benefit for me is that application developers end up owning their application database and can, for example, track schema changes through migrations. And when I talk about cloud native, this is what I've been thinking for the last few years: it's pretty much three things. First,
00:20:31
Speaker
an organizational culture that is founded on DevOps principles. That's the kind of mindset that then originates the second one, the requirement for microservice architectures based on containers. And the third one is that these containers need to be managed
00:20:57
Speaker
by a container orchestrator, and today the de facto solution is Kubernetes. So I think it's more a philosophical or organizational reason. What about the challenges? While working with customers or talking to people in the community, what are some of the challenges that people face when they try such a migration?
00:21:24
Speaker
Yeah, in my opinion there are two challenges. When you run Postgres on Kubernetes, you need to know both Postgres and Kubernetes, so those skills are required. And I think the skills required by Kubernetes are sometimes underestimated. We think it's an autopilot kind of system.
00:21:51
Speaker
I mean, there are a lot of benefits, but you still need those skills. So if you can't cover both, in my opinion, and this probably depends on the Postgres operator you select, it makes more sense to invest in the bottom layers: for example, understanding Kubernetes and what's underneath Kubernetes. This can vary from organization to organization. It could be public cloud,
00:22:20
Speaker
private cloud, self-managed Kubernetes, provider-managed Kubernetes, OpenShift, Rancher, whatever. So I would concentrate on getting the skills there. The next challenge is, of course, running Postgres. From what I can see, the more we move forward, the more similarities I see with bare metal and virtualized environments. So in my opinion,
00:22:49
Speaker
again, storage is the most critical component, and we cannot do without benchmarking in the capacity planning and decision-making process. So those are the biggest challenges. And of course there are professional Postgres organizations that can help with benchmarking, as well as other organizations for Kubernetes and so on.
00:23:19
Speaker
Got it. Yeah, that makes sense. I think to fully realize the benefits of Postgres on Kubernetes, you have to take on those additional complexities of Kubernetes, as you're saying, to really understand the DevOps workflows and what they do to the organization. That being said, if an organization isn't bought in on those principles, why not run Postgres somewhere else? That might be a question to ask yourself.
00:23:48
Speaker
Now, some of the benefits you describe come from combining these two technologies, the Kubernetes stack and Postgres itself. If you don't happen to fully understand that Kubernetes stack, is there an easy button? Do operators get us there? Are there other projects you've worked with that
00:24:14
Speaker
help DevOps teams that want to deploy Postgres themselves get further along without fully grasping Kubernetes? Well, my personal view is that we still need to understand Kubernetes, because if things go smoothly, of course, you don't need that. But I think Kubernetes...
00:24:41
Speaker
And I always suggest, for example, encouraging people to take the CKA exam. Not because I believe in certifications, but because I think it's important to have a common vocabulary within the organization, so that everyone has the same level of understanding of the concepts. Makes sense. Yeah, I would invest in Kubernetes. But of course, operators are what can actually help
00:25:09
Speaker
mitigate these complexities, as you were saying. Yeah, they definitely do help. We've done a few episodes on operators, and I would love to get your opinion on where operators are in their own life cycle, because they are generally newer, and they're definitely being adopted by
Best Practices for Postgres Management in Kubernetes
00:25:27
Speaker
companies like EDB and others as sort of the standard way to deliver or deploy. So what are your thoughts?
00:25:37
Speaker
Yeah, I can mainly talk about Postgres operators, and without making too much publicity for our operator, I can talk about the concepts behind it, what we have been following, and again the DevOps mindset that led us to develop a new operator.
00:26:06
Speaker
Which, by the way, we are considering open sourcing in the future, so I'm really looking forward to that. Anyway, I think the operator pattern, as we know, is probably the best way at the moment to
00:26:28
Speaker
automate the steps that a human operator would manually do in order to react to unexpected events or control planned ones. Postgres, for example, is a complex application, especially if you want to manage it with high availability and under business continuity requirements, which is pretty much the norm for Postgres today.
00:26:55
Speaker
If you think about it, it's all about reconciling an object or a group of objects, like a Postgres cluster where you've got a primary and zero or more standbys, to a desired state when the current state diverges. That is what I found fascinating about Kubernetes. So what we did was try to
00:27:24
Speaker
apply all the concepts that we had built across 20 years of experience: all the manual steps we used to do when, for example, Postgres was going down. We had to restart it and change the IPs, if we were using virtual IPs or other techniques; we had to do all of that. And we said, Kubernetes has got everything.
00:27:46
Speaker
This is the source of truth for everything. We don't have to deal with DNS and all the things we used to do when something went bad. It's all about the state and the concept of the state reconciliation loop. It was a gut feeling that this was probably the ultimate way to achieve business continuity with Postgres, and I still believe that.
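The reconciliation loop described above can be sketched in a few lines of Python. This is a toy model, not EDB's operator: the `Instance` and `ClusterSpec` types, the action names, and the `replayed_lsn` field used to pick a failover candidate are all hypothetical. Each pass compares the observed instances with the desired size, replaces failed ones, promotes the most advanced standby if no primary is healthy, and creates standbys until the spec is met.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    role: str          # "primary" or "standby"
    healthy: bool
    replayed_lsn: int  # hypothetical: how far this instance has replayed WAL


@dataclass
class ClusterSpec:
    instances: int     # desired size: one primary plus (instances - 1) standbys


def reconcile(spec: ClusterSpec, observed: list[Instance]) -> list[tuple[str, str]]:
    """Compare observed state with the desired spec and return corrective actions."""
    # Failed instances are deleted so they can be replaced.
    actions = [("delete", i.name) for i in observed if not i.healthy]
    healthy = [i for i in observed if i.healthy]
    if not healthy:
        # Nothing left to fail over to; a real operator would recover from backup.
        return actions + [("recover-from-backup", "cluster")]
    if not any(i.role == "primary" for i in healthy):
        # Failover: promote the healthy standby that has replayed the most WAL.
        best = max(healthy, key=lambda i: i.replayed_lsn)
        actions.append(("promote", best.name))
    # Scale up: add standbys until the healthy count matches the spec.
    for n in range(spec.instances - len(healthy)):
        actions.append(("create-standby", f"new-{n + 1}"))
    return actions
```

A real operator would run this on every relevant event and carry out each action through the Kubernetes API; scaling down and fencing the old primary are left out of this sketch.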
00:28:12
Speaker
Gotcha. So my next question is around running Postgres on Kubernetes. We spoke about that, but talking to people in the community or customers, what kind of deployments do you see? Do people have just one big Postgres instance on a Kubernetes cluster, creating multiple databases and sharing it across teams, or do they have different Postgres instances on the same or multiple clusters? What does that topology look like? How are people doing it?
00:28:40
Speaker
Yeah, I think the reason lies in the way PostgreSQL implements replication, which is the foundation of crash recovery, continuous backup, point-in-time recovery, hot standby through physical streaming replication, and even logical replication. I mean, Postgres replication is fascinating. So,
00:29:09
Speaker
yeah, because of the way it's done, it works at the instance level. So if you have 10 databases in one instance, for example, they all share a single transaction log. So again, following the DevOps mindset and the microservice architecture,
00:29:32
Speaker
our advice is to separate the instances, so that each instance has one single database, the application database. The developers own that instance. They can decide when to upgrade Postgres to a new major release, for example, or when to back it up. If there's a problem because somebody deleted a table or ran
00:29:58
Speaker
a wrong update or delete, you can restore to the point in time just before that operation. So my preferred way is, again, the microservice approach: I have several Postgres clusters,
00:30:14
Speaker
each holding one database, for example with a primary and two standbys spread across nodes. And my recommended approach for better predictability
00:30:34
Speaker
is to have dedicated nodes for Postgres. In my opinion, the database must be seen as the most important asset for your application, and in some cases for your company. So having dedicated nodes is the recommended approach.
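Here is a sketch of the placement rule behind "one database per cluster, instances spread across dedicated nodes", modeled in plain Python. In Kubernetes this is usually expressed with node labels or taints plus pod anti-affinity; here the hypothetical `dedicated_to_postgres` flag stands in for that label, and the function greedily picks the least-loaded dedicated nodes that do not already host an instance of the same cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    dedicated_to_postgres: bool  # hypothetical stand-in for a node label or taint
    clusters: set[str] = field(default_factory=set)  # Postgres clusters already placed here


def place_cluster(cluster: str, instances: int, nodes: list[Node]) -> list[str]:
    """Pick one distinct dedicated node per instance (anti-affinity within a cluster)."""
    candidates = [n for n in nodes if n.dedicated_to_postgres and cluster not in n.clusters]
    # Prefer the least loaded nodes so clusters spread evenly; ties break by name.
    candidates.sort(key=lambda n: (len(n.clusters), n.name))
    if len(candidates) < instances:
        raise RuntimeError(
            f"need {instances} dedicated nodes for {cluster}, found {len(candidates)}"
        )
    chosen = candidates[:instances]
    for node in chosen:
        node.clusters.add(cluster)
    return [n.name for n in chosen]
```

Spreading the primary and its standbys this way means that losing a single node takes out at most one instance of each cluster.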
00:31:00
Speaker
Yeah, thank you for those suggestions. Following the microservices pattern and the cloud native approach helps reduce the blast radius, but it also gives everybody that sense of ownership: you can deploy it on your own and manage it on your own. Okay. And think about integrating the database into a CI/CD pipeline, so that developers can take advantage of infrastructure abstraction, have everything in the pipeline, run the tests,
00:31:30
Speaker
and then do continuous delivery and continuous deployment if everything goes fine with the tests. Yeah, makes sense. I know a big part of what you're talking about is business continuity: how you do things like backups and how you protect what's actually running in that Kubernetes cluster. I know EDB has its own backup services within an operator, but
00:31:58
Speaker
there are other solutions out there as well for taking backups of containerized applications like Postgres. Where do you find the sweet spot for providing those types of data protection and backup services for something like Postgres?
00:32:15
Speaker
Specifically, if you're only running Postgres in your shop, obviously you'd use tools that are specific to Postgres, but many of these shops, as you just talked about, run microservices, and they may have Postgres and a bunch of other databases. So I'm just looking for your opinion on that topic. Yeah, so of course,
00:32:41
Speaker
knowing Postgres very well and trusting Postgres a lot, to me the prime directive is to protect data. Data is the most important asset. That's also reflected in the way we have developed our operator: for example, it doesn't use StatefulSets;
00:33:01
Speaker
we actually manage persistent volumes our own way, and we rely on the Kubernetes API server to keep the status of the cluster. So this concept of the data volume being the most important thing is, I think, reflected in the way we suggest doing backups and even DR.
00:33:27
Speaker
The most common approach in Kubernetes is to rely on the storage layer for replication and for backups and snapshotting and DR. And that's fine, but I think because of the level and the quality and the robustness of the replication system, the native replication system that Postgres has,
00:33:56
Speaker
which is shared by crash recovery, as I was saying before, continuous backup, synchronous, asynchronous, and cascading replication, and even logical replication, my advice is to stick to application-level replication and disaster recovery, meaning to use the native PostgreSQL replication mechanism.
00:34:25
Speaker
In terms of data protection: I'm European, and since 2018, GDPR has at least made us aware of the problem, which is one of its good aspects. We can criticize GDPR, but I'm really happy about that. Data protection is a fundamental right for every European citizen.
00:34:55
Speaker
Unfortunately, Postgres does not implement transparent data encryption yet, even though you can use some software from vendors, and there is talk about implementing it in the future. But in most cases, with a mix of encryption in transit, encryption at rest, and a sensible use of GRANT and REVOKE,
00:35:24
Speaker
I think you can get a very, very good level of protection; Postgres also supports column-level permissions. So in my opinion, if you use encryption in transit with, for example, TLS certificates and client authentication, which Postgres supports, and for encryption at rest you delegate to the storage class, that is, to storage solutions like,
00:35:53
Speaker
for example, Portworx, a solution that is used a lot with our operator for these kinds of problems, I think you are fine. And when it comes to backup and disaster recovery, I would normally recommend using the Postgres way of doing this.
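Point-in-time recovery starts from the newest base backup that finished before the target time and then replays archived WAL up to that target. The sketch below, assuming PostgreSQL 12 or later, picks the backup and renders the two recovery settings `recovery_target_time` and `recovery_target_action`; the backup ids and timestamps are made up, and the `restore_command` and `recovery.signal` file that an actual restore also needs are left out.

```python
from datetime import datetime, timezone


def choose_base_backup(backups: dict[str, datetime], target: datetime) -> str:
    """Return the id of the newest base backup that finished at or before `target`."""
    eligible = {bid: done for bid, done in backups.items() if done <= target}
    if not eligible:
        raise ValueError("no base backup finished before the recovery target")
    return max(eligible, key=eligible.get)


def recovery_settings(target: datetime) -> str:
    """postgresql.conf lines that stop WAL replay at `target` and then promote."""
    return (
        f"recovery_target_time = '{target.isoformat(sep=' ')}'\n"
        "recovery_target_action = 'promote'\n"
    )


if __name__ == "__main__":
    utc = timezone.utc
    # Hypothetical nightly backups; the bad DELETE happened at 12:00 UTC on the 15th.
    backups = {
        "b1": datetime(2022, 3, 14, 1, 0, tzinfo=utc),
        "b2": datetime(2022, 3, 15, 1, 0, tzinfo=utc),
        "b3": datetime(2022, 3, 16, 1, 0, tzinfo=utc),
    }
    target = datetime(2022, 3, 15, 11, 59, tzinfo=utc)
    print(choose_base_backup(backups, target))
    print(recovery_settings(target), end="")
```

Picking the newest eligible backup keeps the amount of WAL to replay, and therefore the RTO, as small as possible.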
00:36:17
Speaker
That way you can control RPO and RTO, and use point-in-time recovery to restore to a timestamp or a transaction before the disaster. Gotcha. Thank you. It makes sense, right? If Postgres is the only database you're running, taking that application-first approach
00:36:40
Speaker
makes perfect sense. We have spoken about the history piece, but I also wanted to know what you see next. What's happening in the Postgres and Kubernetes ecosystem? I know we have KubeCon coming up in a couple of months.
Future of Postgres Community and EDB Plans
00:36:54
Speaker
I think it's more of a social kind of evolution. I'm really looking forward to more involvement from the Postgres community, to participation and
00:37:10
Speaker
more cooperation between these two communities. I'm also interested in the reaction DBAs will have, because I think there will be a kind of reshaping of the DBA role as Kubernetes becomes more and more adopted.
00:37:37
Speaker
So I think this is one topic, and I'm looking forward to more adoption of Postgres within Kubernetes. Then I can tell you about the plans for our operator, but I don't know if they are interesting. Sure, we'd love to hear about it. Yeah, okay, so
00:38:03
Speaker
basically, with our operator, we're planning to make things easier. We were talking about complexity, also for backups. Bhavin, I think you were mentioning that if you use just Postgres in Kubernetes, it might make sense. The idea is to have an operator that simplifies the underlying complexity
00:38:33
Speaker
and, with a clear interface and clear configuration, enables you to perform even the most complex point-in-time recovery. So we have been working on the replica cluster concept. The replica cluster is another cluster that is in continuous replication.
00:39:00
Speaker
It can run, for example, in another Kubernetes cluster. And the fascinating thing is that PostgreSQL can support replication even without a direct connection between the primary and the standby, just by relying on the WAL files that are archived.
00:39:24
Speaker
So the idea is to have, for example, one Kubernetes cluster in one data center with three nodes in three different availability zones, backing up to a local object store in the same region, and then another Postgres cluster in another region that uses the WAL files from that same bucket to be in continuous recovery.
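The replica-cluster idea above, a standby that never connects to the primary and only consumes WAL files archived to a shared bucket, can be modeled in a few lines. In this Python sketch the `ObjectStore`, `Primary`, and `ReplicaCluster` classes are hypothetical stand-ins for an object-storage bucket and two Postgres clusters, and a "segment" is just a short list of key/value writes. It shows the data flow under those assumptions, not the EDB operator's implementation.

```python
from typing import Optional

Segment = list[tuple[str, str]]  # toy WAL segment: ordered (key, value) writes


class ObjectStore:
    """Stand-in for a bucket: archived WAL segments keyed by sequence number."""
    def __init__(self) -> None:
        self.segments: dict[int, Segment] = {}

    def put(self, seq: int, records: Segment) -> None:
        self.segments[seq] = list(records)

    def get(self, seq: int) -> Optional[Segment]:
        return self.segments.get(seq)  # None if not archived yet


class Primary:
    """Writes go into the open WAL segment; full segments are archived."""
    def __init__(self, store: ObjectStore, records_per_segment: int) -> None:
        self.store = store
        self.records_per_segment = records_per_segment
        self.data: dict[str, str] = {}
        self.current: Segment = []
        self.next_seq = 0

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.current.append((key, value))
        if len(self.current) == self.records_per_segment:
            self.switch_segment()

    def switch_segment(self) -> None:
        """Close and archive the open segment (also usable on a timeout)."""
        if self.current:
            self.store.put(self.next_seq, self.current)
            self.next_seq += 1
            self.current = []


class ReplicaCluster:
    """Continuous recovery: replays archived segments in order and never
    talks to the primary directly."""
    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self.data: dict[str, str] = {}
        self.next_seq = 0

    def poll(self) -> int:
        """Apply every newly archived segment; return how many were applied."""
        applied = 0
        while (segment := self.store.get(self.next_seq)) is not None:
            for key, value in segment:
                self.data[key] = value
            self.next_seq += 1
            applied += 1
        return applied


bucket = ObjectStore()
primary = Primary(bucket, records_per_segment=2)
replica = ReplicaCluster(bucket)

primary.write("a", "1")
primary.write("b", "2")  # segment 0 is full, so it is archived
primary.write("c", "3")  # still sitting in the open segment 1
replica.poll()
print(replica.data)      # {'a': '1', 'b': '2'}  ('c' is the RPO gap)

primary.switch_segment()  # e.g. forced by a timeout
replica.poll()
print(replica.data == primary.data)  # True
```

The only thing the two sides share is the bucket, which is what allows the replica cluster to live in another region or another Kubernetes cluster.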
00:39:51
Speaker
This works because WAL files are archived every time they are filled; normally each one holds 16 megabytes of transaction log, though you can change that. If they don't fill up, you can, for example, close and archive them every five minutes, so you have a predictable RPO. So this kind of solution can rely on file-shipping
00:40:20
Speaker
technologies and be, let's say, five to ten minutes behind at most. That's a great idea, right? I think last week DataStax enhanced their operator to allow a single Cassandra cluster to be deployed across multiple Kubernetes clusters, so I'm
00:40:38
Speaker
excited that Postgres is thinking about that too. Again, there will be an RPO of five to ten minutes, but that's still good enough for cross-region or cross-country replication. And with no direct connection, you can have, for example, three or four clusters in different regions with the same symmetric architecture, so that in case of disaster any of them can become the primary cluster.
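The "predictable RPO" argument above can be checked with simple arithmetic. A WAL segment is archived when it fills up (16 MB by default) or when a time limit forces a segment switch; in PostgreSQL that limit is the `archive_timeout` setting. So the longest a committed change can wait before being archived is roughly the smaller of the segment fill time and the timeout. The sketch below computes that; the write rates and the extra shipping lag are illustrative assumptions, not figures from the episode.

```python
# PostgreSQL's default WAL segment size (configurable at initdb time).
SEGMENT_BYTES = 16 * 1024 * 1024


def worst_case_archive_gap_s(write_rate_bytes_per_s: float,
                             archive_timeout_s: float) -> float:
    """Longest a committed record can sit in an unarchived segment: the
    segment is closed when it fills up or when the timeout forces a
    switch, whichever happens first."""
    if write_rate_bytes_per_s <= 0:
        # No steady write load: only the timeout can close the segment.
        return archive_timeout_s
    fill_time_s = SEGMENT_BYTES / write_rate_bytes_per_s
    return min(fill_time_s, archive_timeout_s)


def estimated_rpo_s(write_rate_bytes_per_s: float, archive_timeout_s: float,
                    shipping_lag_s: float) -> float:
    """Archive gap plus an assumed delay for uploading the segment and for
    the replica cluster to fetch it from the bucket."""
    return worst_case_archive_gap_s(write_rate_bytes_per_s,
                                    archive_timeout_s) + shipping_lag_s


# Busy primary (1 MiB/s): segments fill every 16 s, the timeout never fires.
print(worst_case_archive_gap_s(1024 * 1024, 300))  # 16.0
# Quiet primary (10 KiB/s): a segment would take ~27 min to fill, so the
# 5-minute timeout bounds the gap instead.
print(worst_case_archive_gap_s(10 * 1024, 300))    # 300
# Adding an assumed 60 s of shipping lag gives an RPO of about 6 minutes.
print(estimated_rpo_s(10 * 1024, 300, 60) / 60)    # 6.0
```

A busy database actually gets a tighter RPO from segment filling alone; the timeout matters for quiet databases, which is why it makes the worst case predictable.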
00:41:06
Speaker
So we're working on these. We're also hoping to improve, for example, the scalability of Postgres within one node and the log integration with the most common Kubernetes logging tools, and we're looking forward to the federation problem being solved in Kubernetes once and for all. For multi-cluster's sake, right?
00:41:35
Speaker
It's already difficult for me to stay focused on Postgres, but that's the good thing about Kubernetes: if we follow standards, each of us can focus on our own topic and the whole community will benefit.
00:41:52
Speaker
Agreed, absolutely. Well, we're at about the 30-minute mark, so we want to give you the opportunity to tell us where people can get started with Postgres on Kubernetes and where they can find out more. Anything you have here would be super helpful. Yeah, okay. For the Kubernetes part, I think you guys are the place to start, and the Kubernetes
00:42:18
Speaker
documentation, and there are plenty of resources. The CKA is my recommendation. For Postgres, there's the documentation at postgresql.org. It's very well written and very exhaustive. Exhaustive, yeah. And another good source is Planet PostgreSQL, which is a blog aggregator available from postgresql.org
00:42:44
Speaker
that collects articles from the best Postgres experts in the world. I also suggest a few books about Postgres; the most recent one was published by my colleague and friend Simon Riggs, and I contributed to the second edition of that book. And then there will be Postgres events, COVID permitting, so I encourage everyone to attend.
00:43:13
Speaker
We are also talking about Postgres in and on Kubernetes in the Data on Kubernetes community. The URL is dok.community. I encourage everyone to participate in that community, which promotes the adoption of data workloads in Kubernetes and is affiliated with the CNCF.
00:43:40
Speaker
Data on Kubernetes is such an active community. They publish so much content with experts from all different verticals that it's hard for me personally to keep up. Every day there's a new YouTube stream, and I'm like, okay, I need to add this to Watch Later, and then I just come back and keep doing the same. Yeah, there are a lot of meetups and webinars, and what I like about the community is that we are actually trying to see databases
00:44:06
Speaker
as a special kind of application, so they can coexist in the same cluster. If you are attending KubeCon, there's Data on Kubernetes Day, which is scheduled, I think, for May 16th as part of the community events of KubeCon Europe in Valencia. I'll be there, so I'd love to meet everyone.
00:44:36
Speaker
So if you're there, make sure you stop by the EDB booth. I'll be excited to chat with you about all these things. We definitely will. Shameless plug: we're actually having a Kubernetes data workshop on the 17th, so if you
00:44:55
Speaker
are interested in that, join us. We'll be doing it in the same vein as Data on Kubernetes, and we'll be participating in that day as well, of course. So a lot of good, exciting day-zero events. Well, I guess it's day zero and day 0.5 or something like that, because there are two days before KubeCon starts, when typically there's only one. But yeah, absolutely. Yeah.
00:45:21
Speaker
Well, Gabriele, it was a pleasure to have you on. I learned a lot personally, and I think our listeners will as well. Thank you, Ryan. Thank you, Bhavin. It was a pleasure and an honor for me to be here. Awesome. And we want you back whenever you release that multi-cluster functionality; I would love to dive deep into it. Yeah, maybe if we end up open sourcing it one day, hopefully.
00:45:51
Speaker
All right, take care. Okay, thank you. Bye-bye. Ciao. Okay, that was a great conversation with Gabriele. I even had to take some notes on the side because there were so many good takeaways. So Ryan, why don't you kick us off, and then we can add our own thoughts. Yeah, I agree. I think some of the
00:46:14
Speaker
topics we talked about today really hit home, specifically around the stateful applications we talk about a lot on the show; today, obviously, Postgres. One of the things Gabriele talked about was DevOps culture, what it means to an organization, and the need for buy-in at both the Kubernetes and the Postgres level to really benefit, right?
00:46:42
Speaker
We've talked about how, organizationally, taking on microservices and breaking apart a monolith changes your organization, or at least it should. You can't just throw a monolith into a Kubernetes container, which, unfortunately, we have seen attempts to do.
00:47:00
Speaker
But the point is that taking on a DevOps culture, really buying in and understanding Kubernetes, and then adopting Postgres on Kubernetes is how you get the full potential of running something like Postgres on Kubernetes. That one was definitely a good takeaway for me.
00:47:18
Speaker
Yeah, and he tied it in with the challenges that people face when they try to run Postgres on Kubernetes without understanding how Kubernetes works or without actually buying into the DevOps mindset. If you're using Postgres, you might think from reading some blogs out there, oh, I can just use an operator, deploy it, and get a head start.
00:47:40
Speaker
But that might backfire if you are running it in production. It's a great way to get started in test/dev, but for production you should understand how the underlying components work. So that was a great point. Agreed.
00:47:53
Speaker
And then from a benefits perspective, tying it back to the challenges we discussed, one thing I liked was how he highlighted that when you're running Postgres on Kubernetes, everything comes built in. You don't have to worry about IP addresses, you don't have to worry about DNS, and you don't have to worry about HA. All of those things are included when you deploy Postgres on Kubernetes. So if you go down that path, you will start seeing these benefits as you progress through the journey.
00:48:23
Speaker
Yeah. And I liked the emphasis on Kubernetes being the source of truth for state, and how, as an operator of Postgres, you can worry less about managing these things because Kubernetes does a lot of that for you. So in the case of a failure, you're focusing on making sure the data is safe rather than also worrying about discoverability, IP address changes, and all those things.
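The point about not worrying about IP addresses comes from Kubernetes Services: applications connect to a stable DNS name (of the form `<service>.<namespace>.svc.cluster.local` with the default cluster domain), and on failover the operator changes which pod backs that Service; the exact mechanism varies by operator. The Python sketch below models that indirection with hypothetical `Service` and `Operator` classes and made-up pod names and IPs. It is an illustration of the idea, not any operator's code.

```python
from dataclasses import dataclass, field


def service_dns_name(service: str, namespace: str,
                     cluster_domain: str = "cluster.local") -> str:
    """DNS name Kubernetes gives a Service (default cluster domain assumed)."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


@dataclass
class Service:
    """Hypothetical model: a stable name in front of whichever pod is primary."""
    name: str
    namespace: str
    endpoint_ip: str

    @property
    def dns_name(self) -> str:
        return service_dns_name(self.name, self.namespace)


@dataclass
class Operator:
    """Hypothetical failover logic: promote a replica, repoint the Service."""
    service: Service
    pods: dict[str, str]  # pod name -> pod IP
    primary: str
    failed: set[str] = field(default_factory=set)

    def fail_over(self) -> str:
        self.failed.add(self.primary)
        candidates = sorted(p for p in self.pods if p not in self.failed)
        if not candidates:
            raise RuntimeError("no healthy replica to promote")
        self.primary = candidates[0]
        self.service.endpoint_ip = self.pods[self.primary]
        return self.primary


svc = Service("pg-rw", "prod", "10.0.0.11")
op = Operator(svc, {"pg-1": "10.0.0.11", "pg-2": "10.0.0.12",
                    "pg-3": "10.0.0.13"}, primary="pg-1")

conn = f"host={svc.dns_name} dbname=app"  # what the application is given
print(conn)              # host=pg-rw.prod.svc.cluster.local dbname=app
op.fail_over()           # pg-1 fails; pg-2 is promoted
print(svc.endpoint_ip)   # 10.0.0.12
print(conn == f"host={svc.dns_name} dbname=app")  # True: config unchanged
```

The application's connection string never changes; only the mapping behind the name does, which is the discoverability work Kubernetes takes off the Postgres operator's plate.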
00:48:54
Speaker
Yeah, definitely insightful there. The other bit I want to take away is around data protection, right? There are a lot of data protection solutions out there, and we talk a lot about backup and disaster recovery on this podcast. But one tidbit was
00:49:09
Speaker
having a solution that understands the application at hand, meaning it knows how to do, say, a backup at the database-specific level, right? Rather than only taking a snapshot of the volume, it should be able to reach into something like Postgres and understand how to use some of the tooling that's
00:49:32
Speaker
within that pod or container to meet the specific needs Postgres has around RPO, RTO, and those kinds of things. So lots of good information here.
00:49:42
Speaker
One last thing I wanted to highlight: when we asked how people are running Postgres on Kubernetes, or how they should be running it, he laid it out so clearly. In a Postgres instance, all the different databases running inside it share a single transaction log. So if you're worrying about replicas and failover, or about data protection,
00:50:06
Speaker
that might not be the best scenario. If you're running Postgres on Kubernetes, ideally deploy individual instances, one application database per Postgres instance, and use the multi-tenancy features available in Kubernetes, rather than having one really big Postgres instance with 100 databases inside it. That's something to keep in mind for people who are thinking about Postgres on Kubernetes. Absolutely.
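The reasoning behind one database per instance can be made concrete. Because every database in an instance writes to the same WAL stream, physical recovery and replication operate on the whole instance: rewinding it to undo a mistake in one database also rewinds every other database. The Python sketch below uses a toy log and hypothetical database names (`orders`, `billing`) to show that side effect; it illustrates the point under those assumptions and is not PostgreSQL's recovery code.

```python
from datetime import datetime

# A toy WAL record: (commit time, database name, key, value), in log order.
Record = tuple[datetime, str, str, str]


def replay(log: list[Record], target: datetime) -> dict[str, dict[str, str]]:
    """Point-in-time recovery over one log: replay every record up to target."""
    databases: dict[str, dict[str, str]] = {}
    for committed_at, db, key, value in log:
        if committed_at > target:
            break
        databases.setdefault(db, {})[key] = value
    return databases


def t(hour: int, minute: int) -> datetime:
    return datetime(2022, 3, 16, hour, minute)


# Two databases sharing ONE instance, and therefore one WAL stream.
shared_wal: list[Record] = [
    (t(9, 0), "orders", "o1", "paid"),
    (t(9, 10), "orders", "o1", "DELETED"),    # bad write in "orders"
    (t(9, 20), "billing", "inv7", "issued"),  # legitimate later write
]

# Undoing the mistake means rewinding the whole instance to 09:05 ...
print(replay(shared_wal, t(9, 5)))
# {'orders': {'o1': 'paid'}}  (billing's 09:20 write is lost too)

# ... whereas with one database per instance, each log has its own target.
orders_wal = [r for r in shared_wal if r[1] == "orders"]
billing_wal = [r for r in shared_wal if r[1] == "billing"]
print(replay(orders_wal, t(9, 5)))        # {'orders': {'o1': 'paid'}}
print(replay(billing_wal, datetime.max))  # {'billing': {'inv7': 'issued'}}
```

The same coupling applies to failover: promoting a replica of a shared instance moves every database at once, which is why separate, smaller instances give each application its own RPO, RTO, and recovery target.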
00:50:33
Speaker
Well, with that, I want to reiterate for all those listening: go ahead and leave us a review wherever you can, or check out our Anchor webpage. We also have a new webpage, which we will link here, where you can find out all about the podcast. We encourage you to send us a message or give us feedback so we know we're tackling the right things on the show.
00:50:58
Speaker
In a couple of weeks, we have a show on eBPF. Yeah, that's the hot thing right now in the ecosystem, right? So I wanted to get somebody from the eBPF community and not just talk about storage and eBPF, but also focus on: okay, what is eBPF? What do we need to know? And how does it apply inside Kubernetes? Absolutely. I think there are some day-zero events around eBPF as well. Great, that's exciting. And with that,
00:51:26
Speaker
thanks for joining today's episode. I'm Ryan, and thanks for listening to Kubernetes Bites. Thank you for listening to the Kubernetes Bites podcast.