
Extending Postgres for High-Performance Analytics (with Philippe Noël)

Developer Voices
2.3k Plays · 6 months ago

PostgreSQL is an incredible general-purpose database, but it can’t do everything. Every design decision is a tradeoff, and inevitably some of those tradeoffs get fundamentally baked into the way it’s built. Take storage for instance - Postgres tables are row-oriented; great for row-by-row access, but when it comes to analytics, it can’t compete with a dedicated OLAP database that uses column-oriented storage. Or can it?

Joining me this week is Philippe Noël of ParadeDB, who’s going to take us on a tour of Postgres’ extension mechanism, from creating custom functions and indexes to Rust code that changes the way Postgres stores data on disk. In his journey to bring Elasticsearch’s strengths to Postgres, he’s gone all the way down to raw datafiles and back through the optimiser to teach a venerable old dog some new data-access tricks. 

ParadeDB: https://paradedb.com

ParadeDB on Twitter: https://twitter.com/paradedb

ParadeDB on Github: https://github.com/paradedb/paradedb

pgrx (Postgres with Rust): https://github.com/pgcentralfoundation/pgrx

Tantivy (Rust FTS library): https://github.com/quickwit-oss/tantivy

PgMQ (Queues in Postgres): https://tembo.io/blog/introducing-pgmq

Apache Datafusion: https://datafusion.apache.org/

Lucene: https://lucene.apache.org/


Kris on Mastodon: http://mastodon.social/@krisajenkins

Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

Kris on Twitter: https://twitter.com/krisajenkins

Transcript

Introduction to Database Sweet Spots

00:00:00
Speaker
Every database has its sweet spots and its drawbacks. By design, they've been optimized for specific use cases, and that's why there are so many kinds of database. You get some that have been optimized for analytics, some have been optimized for embedding in web pages or hardware, some for distributed transactions or real-time processing.
00:00:22
Speaker
Some of them are truly web-scale. You pick which strengths you need, and there's probably a database out there that excels at exactly those strengths. But what do you do when your system outgrows that original decision? Well, one answer comes from this week's guest, Philippe Noel. He had hit the limits of what Postgres can do for full-text search and analytics. And instead of doing what a lot of us would do, building a hybrid system with Elasticsearch,
00:00:52
Speaker
He dove into the heart of how Postgres works, because, and I didn't know this before we met, Postgres has an extension mechanism that goes so deep you can actually change the way indexes are constructed and the way tables get written to disk. So there's a lot of flexibility in there to change what Postgres's strengths are.
00:01:14
Speaker
And they ended up building a custom search mechanism and a columnar storage engine to make Postgres do everything it already did well and a lot more besides. Well, I had to get him in to explain how. And while we're on the topic, what does that mean for Postgres going forward? I mean, how far can you extend it? Is the Postgres of the future going to be a database or a platform on which new databases can be built?

Meet Philippe Noel and His Journey with Postgres

00:01:42
Speaker
Let's dive in and find out. I'm your host, Kris Jenkins. This is Developer Voices, and today's voice is Philippe Noël. Joining me today is Philippe Noël. Philippe, how are you?
00:02:07
Speaker
I'm good. How are you? Thanks for having me. I'm good. Pleasure. Yeah. I'm, I'm excited cause you are, you're, um, by my standards, you're a Postgres dark wizard. Yeah. I don't know about that, but, uh, you know, I've certainly learned a thing or two. You're further down the rabbit hole than I've ever been. So that's exciting to me. That yeah, that's, that's perhaps true. So we're going to get into building very bespoke features into Postgres.
00:02:33
Speaker
Before we go right into the depths, how come you learned about this? Where did you get to in life that you needed to extend Postgres? That's a good question. Well, we were Postgres users for a long time as developers ourselves. Me and my co-founder and my colleagues, we built a few projects and we liked it from an end-user perspective. Last summer, we were working in the
00:03:03
Speaker
We were working on a new startup, and as we talked to more and more customers, it seemed like there was a need to do better search in Postgres. And that got us kind of interested in understanding how it works and the internals and why no one had solved this problem before. And as we did, we kind of realized maybe there was a thing for us to learn there and move from being users to developers of Postgres. And that's kind of how we got started with it. Right. But this...
00:03:30
Speaker
There's plenty of people that would hit that problem. Some people would hit it and say, OK, Postgres has some kind of full-text search. Is that not good enough? Other people would say, well, let's bundle in Elasticsearch. You seem like you've taken the hard path.
00:03:45
Speaker
Yeah, well, we wanted to do it not just for ourselves. I think that was another reason there. Well, there's a couple of points to the answer to your question, sorry. On the one hand, it is true that it has full text search and that's already pretty good. You can go very far with it. I'm happy to talk to you about it. We often recommend people with especially small to medium workloads to stick with the native functionalities, but eventually you need more.
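For reference, the native functionality Philippe mentions looks roughly like this; the table and column names here are invented purely for illustration:

-- Native Postgres full-text search, no extensions required.
-- Hypothetical table and column names, for illustration only.
CREATE TABLE articles (
    id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title text,
    body  text
);

-- A GIN index over a tsvector expression keeps searches fast as the table grows.
CREATE INDEX articles_fts_idx
    ON articles
    USING gin (to_tsvector('english', coalesce(title, '') || ' ' || body));

-- Query with the same expression so the planner can use the index.
SELECT id, title
FROM articles
WHERE to_tsvector('english', coalesce(title, '') || ' ' || body)
      @@ to_tsquery('english', 'postgres & extension');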
00:04:11
Speaker
For our use case, we needed more, and we saw a lot of other people in the space that also needed more complex feature sets. And at that point, we already were drawn into the idea of hacking on Postgres ourselves. I don't know if you've seen, but there's actually an article that was circulating around last week that said hacking on Postgres is very hard. And it's by one of the main Postgres committers. It's very hard, but it's also very cool. And that got us interested in solving the problem for the
00:04:39
Speaker
Let's say like more complex workloads that people had. How hard would you say it is?
00:04:47
Speaker
I don't know. I mean, okay. This is also a bit of a, we don't do the hardest hacking on Postgres. So let's say we do like, we do maybe like a notch before some of the people that work on actual like upstream source code. Um, they, they do the real hard work and I can pretend to be someone that's doing that work because I'm not. Um, it's quite hard. I mean, the code base is like 30 something years old, right? It's.

Developing Postgres Extensions with PGRX

00:05:11
Speaker
It's enormous really by the end. And there's so many things that can change and go wrong. And there's like seven or eight people in the world that have commit access to actual Postgres. So it's very, very, it's a very select group. Right. Okay. So you're not doing the, you're not trying to patch Postgres so much as extend it, right? Cause it has an extension mechanism. That's right. That's right. The only one of those extensions I've ever really used in anger is like geospatial.
00:05:40
Speaker
I've certainly never looked at building my own. Yeah, it's a very popular one. Geospatial is a big one. In fact, allegedly, this is non-verified information on my part, but allegedly, I've heard that in geospatial conferences, the majority of people who are there are Postgres users that use geospatial through Postgres. It's sort of the primary way people store geospatial data nowadays. Oh, really?
00:06:07
Speaker
That suggests that for a cunning person extending Postgres, there are latent markets out there. That's certainly the hope. But it's also true in practice. Postgres is growing to be more than just the relational database. It already is in a lot of ways. And for us, we do search in Postgres. But there's other people that focus on it in other ways. Postgres is an example where you can
00:06:36
Speaker
modify to store geospatial data. Another particularly relevant extension of recent times is pgvector, which allows you to store vector data and do vector search with Postgres, which obviously has come to be, you know, a big deal in the last year or two. Super hot right now.
00:06:51
Speaker
Yeah, so you can kind of transform it into anything. And there's even this movement happening in the ecosystem that people call Postgres for everything, where they're trying to run your entire data infrastructure exclusively on Postgres and variants of Postgres modified for these extensions, for example.
00:07:12
Speaker
OK, so we're not quite there yet. Well, this is perhaps another stepping stone on that path, because you're going to teach me and therefore everyone listening. If I wanted to, if I imagine a new feature for Postgres and I want to extend it, where do I start? Where did you start?
00:07:31
Speaker
Yeah, so great question. So Postgres has an extension mechanism where you can create small extensions that can be loaded into the database without needing to modify the core source code. That's very, very convenient for getting started. There's two main ways you can go and build extensions nowadays. Postgres itself is written in C and so comes with an extension framework to write extensions in C. You can look at the Postgres documentation and get started. However,
00:07:58
Speaker
The way we started is there's a meta project, a framework called PGRX, which allows building extensions in Rust. It's quite a popular project. A lot of companies and new extensions are built off of it, the majority of them nowadays. But someone by the name of Eric
00:08:17
Speaker
who is much more of a Postgres wizard than I'll ever be. And a couple of his colleagues, they created this framework where you can create extensions in Rust. So you write Rust code, and it generates the extensions in C, basically, at the end of it. And that's quite a phenomenal way to get started. They made it very, very ergonomic. And they have examples and so on, so you can get started in that way. And that's how I would recommend people start.
00:08:40
Speaker
Okay, I'm going to dive into that. Firstly, what's the development experience like in that? Do I have to recompile Postgres every time I want to make a change or how painless is it?
00:08:56
Speaker
It's wonderful, honestly. It's wonderful. I'll speak for using PGRX because that's what we use. But if you do it with the extensions in C as well, it's also quite fast. You don't need to recompile Postgres. Typically, you install Postgres using your package manager. PGRX, they can also install Postgres for you, if you don't have it with a package manager, or you want them to manage everything.
00:09:19
Speaker
And then afterwards, it's the pretty standard Rust development experience, like cargo run. And the way it works is it packages your extension, installs it into a temporary Postgres instance that's blank, with just your extension installed, and then you can do some testing with it. And they also have packaging and testing commands and so on. So it's quite standard. It's very, very painless, I would say. They make it quite easy.
00:09:47
Speaker
If you do it with the framework in C, it's not quite as ergonomic, but it's pretty nice too. It's the standard C development experience, like make, make install, and you'll have your package-manager Postgres, and then it would install there. It's less idempotent, I would say, in that experience.
00:10:09
Speaker
Sometimes you might find that your extension is still installed in some part of the system, which can cause some unexpected errors or some unexpected behaviors. But if you work with the Rust version, it's quite good. It's made for the kids. It's modern. That's impressive, because they could easily have stopped at just making Rust bindings to the C calls.
00:10:34
Speaker
Yeah, they have taken it so far. And I've got the chance to talk to Eric. He's told me that he's done it because he needed it himself. So for context, Eric created this extension called ZomboDB. They were one of the early extensions in Postgres.
00:10:50
Speaker
It's seen a lot of mileage by now. And ironically, it's quite related to what I work on. They were building a connector between Elasticsearch and Postgres. So the work that I've done and that we've done with my team is we've modified the way full-text search works in Postgres to make it more capable and more comparable to Elastic. What Eric did several years ago was he had Elasticsearch running separately.
00:11:16
Speaker
And then he had this Postgres, and he wanted to be able to run Postgres commands and have them be used to manage the Elasticsearch cluster, sort of a bridge between the two. And he first built that. It's called ZomboDB. He first built that in C and eventually had a variety of issues, learned about Rust, and thought he could do things better if he could rewrite it in Rust. And so he built this PGRX package, like extension framework, initially for himself. And when I talked to him,
00:11:46
Speaker
He's told me that he works on it because it's useful for himself and so on, but I think he's way past altruism at that point and he could have stopped a long, long time ago. He says he couldn't. I think he could have stopped a long, long time ago and now he does it because he's a good guy and this is improving the ecosystem and so on. So they've really taken it very far, to great quality.
00:12:07
Speaker
Nice, nice. Okay, so I'm feeling slightly less scared about the prospect of doing this one day. But what's it actually like? So you need, where do you start? You need to do full text search better in Postgres. You create an index extension?

Enhancing Full-Text Search with Custom Indexes

00:12:27
Speaker
Yep. So what we did is we created an index extension. There's two main ways to poke with storage in Postgres. One way is with what they call an index access method. So a database index, right? You can index some data from a table, for example. That's what we do for full-text search. And the existing
00:12:45
Speaker
capabilities for basic full-text search in Postgres also work in a similar way. That's quite common. So we created an index access method via an extension. And whenever you run our Postgres extension, it gives you this extra command. So in Postgres, you can say create index, some name, on a specific table. Now, if our extension is loaded, you have an extra option in the possibilities for the index. And it runs our code instead of running the native code.
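In practice that looks like ordinary SQL; the extension and access method names below are placeholders rather than ParadeDB's actual ones, which may differ:

-- Load the extension into the current database (name is a placeholder).
CREATE EXTENSION IF NOT EXISTS my_search_extension;

-- Standard Postgres CREATE INDEX syntax; USING selects the access method.
-- 'my_search_am' stands in for whatever index access method the extension registers.
CREATE INDEX products_search_idx
    ON products
    USING my_search_am (description);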
00:13:15
Speaker
I think we'll get into that later, but another way in which you can modify the way data is stored in Postgres is to do what's called a table access method, where instead of modifying the way the index is created on top of a table, you modify the way the table data itself is stored. That's a somewhat recent functionality that came in late 2019, early 2020, really. Okay. A 30-year project, I suppose, that is recent.
00:13:42
Speaker
Yeah, exactly. It's not like recent, recent, but it's relatively recent. When we did that via PGRX earlier this year, we were allegedly, according to Eric, the author of the framework, the first people to do it in Rust to modify the table engine with his framework. So when I asked him about it, he said it was supposed to work, but he had no idea if it was going to work because no one had tried it before.
00:14:07
Speaker
Well, someone has to go first. Yeah. Yeah. Okay, one step back. Well, let's start with indexes because I feel like they're going to be easier to understand. My guess is somewhere in there, there's a hook where Postgres tells me there's a new row coming in. I've got to write an update hook and I've got to write a find hook. I mean, what's it actually like?
00:14:30
Speaker
So a lot of this is already abstracted away from us, so it's quite nice. So if you go and you look at the Postgres documentation, you can see there's a couple of functions for defining index access method, and there's a couple of functions for defining table access methods. The index access methods are quite small. Not very many functions, really.
00:14:53
Speaker
I could pull up the list of all the functions that you need. But you implement these handful of functions, and then afterwards you're good. So exactly, it basically allows you to fetch the tuples. Whenever a new row comes in, which is a tuple in Postgres, it fetches that and then it re-indexes it into the
00:15:14
Speaker
like whatever indexing scheme that you use. In our case, we work with a library called Tantivy, which is a Rust implementation of Lucene. Lucene is this... Yeah, maybe Lucene rings a bell, but it's this search engine framework on which Elastic is built, and it's pretty state of the art. The underlying technology is called an inverted index. That's how...
00:15:38
Speaker
That's how you can tokenize the words in such a way that they're easy to retrieve using full-text search. And we basically bundle that in. So you sort of bridge the two. You implement the functions to fetch the tuples from the rows that come in in the new table. For example, in a table, there's updates that happen to it. And then these get indexed, get passed over to Tantivy, to be indexed in those inverted indices. And those get stored on disk
00:16:01
Speaker
inside the database. And then whenever you fetch or you make queries against the index, it retrieves the information against the index data instead of against the table. Okay. What's your index file? Is that something that's managed for you by Postgres? Or do you have to... I mean, do you get some kind of scratch file, storage file in which you can work for your index? Or do you have to do everything manually?
00:16:32
Speaker
We do it manually, but the index access method framework has a lot of things taken care of for you in Postgres. But you have to mostly do it manually. So what I mean by that is we... So Tantivy, that Lucene-style library that we use, they have their own file format that they use to store data on disk. This is what we use. But then afterwards, we need to do the wiring up to Postgres.
00:16:58
Speaker
And really, speaking of hacking on Postgres and getting started and so on, making extensions can be as easy or as hard as you want it to be. It can be as easy in the sense that this framework that I mentioned is very, very ergonomic. And if you do things that are relatively simple, it's quite easy. You can put something together in an hour and it's going to work well.
00:17:21
Speaker
For things like us, where you modify the file system, for example, you interact heavily with the tables, you need to connect all of those pieces with Postgres itself. You need the index to be cleaned up if you do vacuuming, for example. You need it to be connected to the write-ahead logging system so that you can replicate the index to other Postgres shards if you want, or have high availability.
00:17:49
Speaker
whatever it is that you might need. And so all of these are actually quite a decent amount of work. And even we haven't implemented all of these ourselves by now. We've implemented the majority, but there's still a few things that are works in progress. Right. Yeah. I can see, I can see it being a rabbit hole as large as you like.
00:18:09
Speaker
But so the essence of your project is you've said it would be better if this data were in Elasticsearch, but we want to keep it in Postgres. So you rip out the library from underneath Elasticsearch, or a close equivalent, and all your job is the glue, in Rust. Yeah, that's a, that's a pretty good way to, to summarize it exactly. Okay.
00:18:32
Speaker
I can see that bringing hope to people that want to spin out another database, but don't want to spin out another database. Exactly. The idea really is, well, Elastic is powerful, right? But the truth is for a lot of people, you don't need that power.
00:18:52
Speaker
But you still need, like you need more power than Postgres offers, but you don't need the cumbersomeness and the scale and all the problems that come with doing something that's really powerful, right? Like perhaps, you know, like you need more than a kitchen knife, but you don't need a chainsaw, right? What if you just needed a butcher knife, right? We could actually just give you something like that. Okay. Okay. Take me through a bit more of this API, like.
00:19:16
Speaker
Let me see if I can think of an interesting corner case. You must have to do something for the sake of optimizer statistics, right? Your custom index has different access patterns. It must affect the optimizer. Yes. There's a couple of things. One is the
00:19:40
Speaker
The way we make it work, so it's actually quite interesting. The way we make it work is we use various procedural calls to create our indexes in Postgres. So it's kind of a unique syntax. It's a bit of an uncommon syntax. And when you make queries against it,
00:19:58
Speaker
we make sure that you only route the queries directly to our index. So the Postgres planner, actually, I may be speaking a bit out of my depth here, but when it does a scan, it essentially only scans our index. And all of our queries, at least as of today, they get executed exclusively against our own index. So one example of the optimizer not taking care of it is the way we do limits in Postgres.
00:20:28
Speaker
So let's say you're making a query against a search index, and you want to say, retrieve me everything that contains the word red from this list of clothing, and you want to limit to only five items, for example.
00:20:44
Speaker
One thing that we do to be able to optimize the way the Postgres optimizer and Postgres planner runs the queries is there are specific keywords that need to be handled, whether by Tantivy or whether by Postgres, to have maximum pushdown of the compute. So for instance,
00:21:01
Speaker
Tantivy, which is the search library, it comes with its own limit capabilities. And whether you run those limit capabilities with Postgres or with Tantivy makes a huge difference on performance. Because there's only so much that the index access method API gives us in our ability to work with the optimizer and to optimize certain queries. So we recommend for users to, we have like somewhat
00:21:30
Speaker
somewhat peculiar syntax, or maybe not peculiar, but a pedantic syntax, perhaps. It needs to be exactly the way that we recommend it to be optimized. Otherwise, you can still run standard Postgres queries on top of your search index, but it's not going to be as fast. Right. So the integration isn't completely seamless for the end user. They'll be aware that they're using something custom here.
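One generic way to see whether a clause like LIMIT is being handled inside the index scan or by Postgres afterwards is to inspect the plan; the query below is only illustrative and does not use ParadeDB's actual search syntax:

-- Illustrative only: compare the plan for the recommended query shape
-- against a plain-SQL shape to see where the limit is applied.
EXPLAIN ANALYZE
SELECT id, description
FROM clothing
WHERE description LIKE '%red%'   -- stand-in for a full-text search predicate
LIMIT 5;
-- If the Limit node sits above a full scan of the search index, the index does
-- all the work and Postgres trims afterwards; if the limit is pushed down into
-- the index scan itself, far fewer documents are fetched in the first place.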
00:21:55
Speaker
Yeah, they are. It's pretty good now. It's mostly seamless, but it's not perfect. And that's because of the way the planner optimizes. The planner interacts with the optimizer, for example. Something else to mention here is when you do work with index access method, you need to implement a function called the AM cost estimate. It stands for access method cost estimate. And that's another example where based on how that function is implemented, it will
00:22:25
Speaker
Um, it will influence the way the planner executes your queries. And that might require a little bit of, um, you know, a little bit of poking around in various ways to make sure like it does what you want it to. There's a bit of a, it's a bit of an art more than a science. The same goes for the table access method. Yeah. Yeah. Do you get, um, because I seem to remember in Postgres, you can get it to deliberately reanalyze indexes for the sake of the query planner.
00:22:56
Speaker
Do you get a hook for that where you say, okay, it's time to update your estimates about how fast you are for certain kinds of query? Yes, you can. So for the index side of things, you can actually do it pretty seamless. So because index access methods don't modify the actual storage of the data and the tables, they're sort of like,
00:23:22
Speaker
They're not truly in the guts of Postgres. They're a little bit on the peripheral side. It's kind of connected pretty seamlessly to the explain analyze commands and so on. So you can look over the plan and things like that pretty well. And Postgres will do a lot of the optimization for you. But when you start to work in the core of the storage engine itself, you need to do a lot of that yourself.
00:23:44
Speaker
And so at that point, you do have a specific hook that you can use to query the planner, to query the executor at various points in time, and make sure that it does what you want. And you can even come in and do plan optimization yourself. And once we get to the table storage side, I can tell you more about how we do plan optimization on that, because it's a lot more
00:24:08
Speaker
It's a lot more intentional and manual than at the index side. On the index level, we basically try to make Postgres do most of it itself, right? We want to mess with the ecosystem as little as possible. Right. Yeah, yeah, yeah. Okay. Well, in that case, I think we probably should get into the hairy world of replacing the storage engine. Because I didn't know, until we spoke recently, I didn't even know this was possible, that you could rip out the way Postgres thinks about tables.

Transforming Postgres Storage for Analytics

00:24:35
Speaker
Yeah, I think not that many people know it's possible. And it's not heavily advertised. But there's a few people that have worked on it. We're not the only one. There's maybe three or four teams that have worked on it from various angles. I'm happy to tell you about it. It's quite cool, actually. You can write an extension that changes something so critical to Postgres. So there's enough rope to hang yourself here.
00:25:01
Speaker
Yes, very much so, yes. You've got to be careful what you do. Out of curiosity, is this part of the reason for choosing Rust? Is it like you don't want to accidentally crash your users' Postgres instances?
00:25:15
Speaker
I mean, we chose Rust because it was Rust or C. And I mean, why would you choose C, I think, at that point? But from Eric, from the team that made pgrx in the first place, part of the reason for choosing Rust was to make sure you can do memory management in an easier way, for sure.
00:25:41
Speaker
But still, even though we've had to deal with some pretty hairy memory issues ourselves despite working in Rust, sometimes you have to go and work with the Postgres functions directly. And so it's not fully, fully hands-off. You can still crash users' Postgres instances.
00:25:59
Speaker
Oh, good. Oh, good. Got to have something to keep you awake at night, right? Yeah, exactly. Exactly. But Rust, it, you know, it, it pats the pillow, you get to sleep a little bit better, but it's not perfect yet. Okay. So maybe you should tell me why you need a custom storage engine. As I understand full text search, I can understand that needs a custom index. Why do you want a custom storage?
00:26:25
Speaker
Yeah, so the work we do is split into two categories. We are trying, as you and I talked about, to make the workload that someone might use Elasticsearch for be possible via Postgres. So you don't need to move data and you can stay within Postgres. Obviously, the name is Elasticsearch. And when people hear that name, they think of search, right? And so that's the first thing that we did. But actually, another key functionality that Elastic offers is analytics.
00:26:51
Speaker
And it's actually quite common for people to do analytics and full-text search within the same query as well. And so that kind of got us to thinking, OK, could we actually make Postgres do analytics? And if you're familiar at all with databases, you'll know that there's two main ways databases store data, depending on the use case. Relational databases like Postgres store data in row format. So on disk, the sequences of bytes are stored in row format, so the values of a specific row are stored next to one another on the disk.
00:27:19
Speaker
And that makes it easy to do point reads. So let's say we're trying to retrieve, like you're trying to log into your online banking system. It's trying to authenticate you against your user account, right? Just trying to retrieve your information. The other way that databases store data is in column format. So instead of storing rows next to each other on your SSD, it's going to store columns next to each other. So the table is sort of flipped on disk, and each column is stored together, so like by data type.
00:27:48
Speaker
So we store all the usernames in one big part of the disk and all the ages in another part of the disk. Exactly. And this is optimal because when you're reading data on disk, sequential reads are significantly faster than if you need to jump the reader to various parts of that hard drive. And so if you have query patterns where you need to read specific columns, for example, analytics, where you say, okay, what's the
00:28:14
Speaker
I don't know, the total sales across all locations this month, right, something like that. Then you look over all the transactions, the sum of all the transactions, for instance, and it's significantly faster to do this when the data is stored in column format.
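To make the contrast concrete, here are the two query shapes being described, with an invented schema, purely for illustration:

-- Point lookup: fetch one whole row. Row-oriented storage shines here,
-- because all of this user's fields sit next to each other on disk.
SELECT * FROM users WHERE user_id = 42;

-- Analytical aggregate: scan one or two columns across every row.
-- Column-oriented storage shines here, because only the 'amount' and
-- 'created_at' columns need to be read, as long sequential runs.
SELECT sum(amount)
FROM transactions
WHERE created_at >= date_trunc('month', now());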
00:28:31
Speaker
Because we wanted to do analytics alongside our search, we were like, OK, we need a way to do that in Postgres. Postgres stores data in a row format. That's very inefficient for this type of workload. We need to store data in column format. And that brought us to saying, oh, what if we can modify the way data is represented under the Postgres tables? From the perspective of the user, we just give you Postgres tables, right? And you don't need to worry about anything. But us ourselves, we need to gut the inside a little bit and say, hey, we'll store data in column format.
00:29:00
Speaker
That is the point at which I would have given up with Postgres and said, well, it's fundamentally not the way it works. Yeah. And it is fundamentally not the way it works. You're correct. But again, there's this.
00:29:13
Speaker
Postgres is so extensible and people wanted to use them for so many things. It kind of reached a point where we said, hey, what if it could be the way it works? There are so many things that are wonderful about Postgres. If you can abstract away this problem or this challenge, we do the hard work. But as someone who uses Postgres, you don't have to worry about it. And it's quite seamless, really.
00:29:36
Speaker
When you create a table, you create table, my table, and then you can specify the schema for your table. If you use our extensions, you can say create table, my table, using, and then you specify the storage format. We use parquet, which is this very, very industry standard column format. So you say create table, my analytics table, using parquet. And now it's going to create it in column format, and that's it.
00:30:05
Speaker
Now everything else you do is the same. You can run the same queries with the same syntax. You don't need to change anything. Just the table creation statement. And then the analytics are 95 times faster than regular Postgres. And that's because of... well, one of the reasons is because the data is stored in column format on disk. And we've had to go and implement, there's 42 or 43 functions you need to implement to create a custom table access method. It's a lot more than the index.
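Assuming the access method is registered under the name Philippe mentions (the real name may differ), the usage is plain SQL:

-- Regular heap table, Postgres's default row-oriented storage.
CREATE TABLE orders_oltp (
    order_id   bigint,
    store_id   bigint,
    amount     numeric,
    ordered_at timestamptz
);

-- Same schema, but stored through the columnar table access method.
-- 'USING parquet' is the access-method name mentioned in the conversation.
CREATE TABLE orders_analytics (
    order_id   bigint,
    store_id   bigint,
    amount     numeric,
    ordered_at timestamptz
) USING parquet;

-- Queries are unchanged; only the storage underneath differs.
SELECT store_id, sum(amount)
FROM orders_analytics
GROUP BY store_id;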
00:30:35
Speaker
Yeah, once you get that done, you, from the end user, like the database user perspective, it's pretty, it's pretty transparent. Like how transparent, like are updates the same? Are joins the same?
00:30:54
Speaker
Yes and no, in the sense that it depends how you build it afterwards. This is actually a pretty, you're asking a simple question that has a very long-winded answer. But the answer is yes and no. So the way we do it, it depends how you process the queries by the end of it.
00:31:15
Speaker
The way we built our own custom storage method is we modify the way the data is stored. And we also modify the way Postgres executes queries on top of it. So we have more speed-ups. And we integrate other industry projects for this, which I'm happy to talk to you about. If you don't modify the way Postgres queries get executed, the answer is yes. Everything's going to work the same.
00:31:36
Speaker
In our case, because we modified a storage engine quite heavily, there's a couple of things we need to do to wire up the pieces together. So if we only modified the storage engine, the answer would have been no. There would be other things that we would have needed to do. We kind of finished those. But if you stay within the row structure paradigm, the answer is yes. So the initial
00:32:00
Speaker
motivation for implementing this API in Postgres in 2019-2020 in the first place was by Fujitsu, I believe, and VMware. And what they wanted to do is they wanted to improve the way vacuuming is done in Postgres. But it's very difficult to get
00:32:16
Speaker
the seven or eight people that have Postgres commit access to approve your changes because they care a lot about keeping upstream extremely small, extremely reliable, right? They want things to be very well tested. So what those people thought is, hey, we want to rework the way data is stored on this so that we can have a more efficient vacuuming process. If we can create an extension API that allows modifying the table, we can offer like an alternative storage engine and people can start testing it without it being committed to the core.
00:32:46
Speaker
And that started this project in the first place. And so there's work that's done there to just say like, oh, it's still Postgres. It's still a relational data model. We're not changing the paradigm. We're just trying to make it a little bit better. We kind of took it in a different approach and said, no, no, no. We'll now use this to convert Postgres for analytics. And in our use case, there's a little bit more that you need to do to make joins work. Now that we've made things work across the board, the user wouldn't necessarily know.
00:33:16
Speaker
If someone's listening to us and says, I want to go and implement my custom table method, just so you know, you may need to do more than just make the data be stored in columnar format to get all the pieces wired up together properly. Right. Yeah, it depends on how fundamentally different to the original assumptions you'll go. Exactly. Give me some clues.
00:33:39
Speaker
Don't give me real trade secrets, but logically, how does the game of joining change when you're in a hybrid row and column oriented data format?
00:33:54
Speaker
Yeah, so once we released the first version of our custom storage method, that was the number one requested feature. And so we spent a lot of time looking at it. And so the way we do it, well, actually, before that, the way we do our analytics side, like our custom storage, we also use a separate query engine. And that's important because that's what we use to be able to do the hybrid joins. We call them hybrid joins, or HTAP joins.
00:34:25
Speaker
Whenever, when you want to do fast analytics, there's two important components. One is to do columnar storage of the data so you can read the data faster. But the other is you want to be able to do vectorized processing. So when your database goes and fetches data from the storage,
00:34:43
Speaker
Typically, the way Postgres works is it's going to fetch one row at a time or one tuple at a time. So if I'm retrieving information, Postgres is going to go one by one. And again, that's because Postgres is optimized for point lookups, right? Look up your user account, look up your credit card number to make sure your transaction is correct or things like that.
00:35:05
Speaker
Big databases that do analytics might be grabbing, let's say, doing a count over the entire sales or the entire purchase history. And so you're not interested in a specific value. So in that case, it's just more convenient and faster if you can grab batches of tuples at a time to process them. That's called vectorized processing. Right, yeah.
00:35:28
Speaker
And so you're reading like a large column, and instead of reading each value one at a time, you say, I'll read chunks of the column all at the same time to make things faster. So we also needed to integrate that into Postgres to make it really fast, and to make it competitive with industry standards. Nowadays, our extension is pretty competitive with some of the fastest databases in the market as well that are not Postgres.
00:35:50
Speaker
In order to do that, we use a project called Apache Data Fusion, which is a composable query engine for analytics. It's also written in Rust, and we also integrate it in. And so that's why when I was mentioning, there's kind of two things that we do from our table, from our custom storage method. We modify the way the data is stored, but we also modify the way the queries are executed. So when you run queries,
00:36:12
Speaker
on that custom table storage method. Instead of running them via Postgres, we actually transfer them from Postgres to this embedded query engine that we have in our extension and those get executed against it. And that's much faster.
00:36:32
Speaker
you're not hooking into, as we were talking about earlier, you're not hooking into Postgres's optimizer. You've got an entirely different optimizer to play with. That's right. That's right. That's right. And you were asking about hooks. There's a lot of hooks that Postgres makes available, so you can intercept query plans at the beginning, at the end of the plan optimization. You can optimize it at the beginning of the execution stage, where the plan gets passed on to actually be computed on top of data, at the end of the execution stage, so on and so forth.
00:37:00
Speaker
And what we do is we intercept the plan pretty quickly and we reroute it to that analytics engine and that analytics engine does its own optimization on it. That's optimized for specifically processing parquet columnar data.
00:37:17
Speaker
makes sense. So as a user, I'm just writing a regular query, and you're hijacking it sometime after it's parsed. That's right. Exactly. That's right. The reason I mentioned all of this is because going back to your question, you asked, how does it work to do joins between hybrid rows and columns, right? And so
00:37:38
Speaker
What we do, essentially, is we federate the query. There's multiple tools that exist out there. Big data tools are common for doing this, like Spark and so on, that allow you to query multiple databases in parallel. It's called query federation.
00:37:57
Speaker
What we've done is we've taken two databases and fused them into one, but we've kept the boundary between the two. You have Postgres, of course, and you have regular Postgres and you have our custom storage and our custom query processing on top of it.
00:38:11
Speaker
And we make this transparent to the user. But to do this, we want to say, hey, let's say you're trying to do a join between a regular Postgres table and our analytics Postgres table. We want to be able to say, OK, from your query, we want to build the largest possible subplan that can be executed entirely on Postgres and the largest possible subplan that can be executed entirely on our custom storage, execute those in their optimized ways, and then after that, join the final result.
00:38:41
Speaker
That's how we do it. So if I try and make this concrete, I come along and I query all the sales in my database and I get that total fine.

Optimizing Performance with Federated Queries

00:38:54
Speaker
Now I say, let's slice that up a bit and say all the sales for France, which is, I don't know, let's say it's a fifth of my data. Sure. How's that going to play out?
00:39:07
Speaker
If you want to join that with something else from your tables. So the information that it's from France is in a different table. Yeah.
00:39:18
Speaker
Well, I mean, what would happen basically, right? Let's say you have one table that contains, I don't know, like your store locations, right? Like you have your store locations table and then you have like your full, like all the sales, right? Sales ID and then item and price, something like that, right?
00:39:42
Speaker
The way it would work is your join would essentially be federated. Let's say your store locations table is a standard Postgres heap table, stored in row format, and then all of your sales are stored in those analytics tables so that you can do fast aggregations on top of them.
00:39:58
Speaker
The way it would work is that plan would be split between finding all of the store locations that have France as their country of origin, right? All of that would get essentially processed exclusively on the Postgres heap table. You would have all of the stores that have all your purchase orders and have the ID mapped to those specific stores, for example.
00:40:22
Speaker
And then afterwards, your query will get federated back and executed on the analytics table. So the first part that would happen would be fetching
00:40:30
Speaker
all of the stores that are in France and the associated sales IDs, for example. And then those would get joined with the data that contains all of your purchase orders that are in the analytics table. And those would get executed on the analytics engine at the end of the day. Every single join finishes on the analytics engine, because that's where the bigger aggregations always happen, right? So by design, you're reading it correctly.
00:40:58
Speaker
But the way it works is your query gets passed in, the plan gets constructed. Of that plan, we recursively walk back from the bottom of the plan to find the largest possible point where it's going to break and require the analytics engine. Before we reach that point, we execute everything we can on the Postgres side. And afterwards, the final result gets joined with what's done on the analytics table. Yeah, that makes sense.
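Concretely, the France example might be written as an ordinary join; the split between the two engines happens in the planner, not in the SQL you write (table names invented for illustration):

-- store_locations: a regular row-oriented Postgres heap table.
-- sales: the columnar analytics table.
-- The extension splits this plan: filtering stores by country runs on the
-- heap table, the aggregation over sales runs on the analytics engine, and
-- the final join and aggregate finish on the analytics side.
SELECT s.store_id, sum(s.amount) AS total_sales
FROM sales s
JOIN store_locations l ON l.store_id = s.store_id
WHERE l.country = 'France'
GROUP BY s.store_id;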
00:41:26
Speaker
Do you do, I'm thinking of things like Apache Pinot, right? But they do a lot of pre-computing of certain indexes to try and anticipate which analytics queries you'll run. Are you stretching into doing anything like that?
00:41:40
Speaker
We don't do that right now. Apache Pinot is an example of a tool that's pretty similar in use case to what it is that we're doing, but it's a much more mature tool. There's a lot of optimizations that are done for us by the Data Fusion query engine I mentioned that we don't need to worry about, but we don't do any pre-computing of indexes or things like that today.
00:42:03
Speaker
There's still a few things that need to be done before we start to really, really over-optimize on performance. What's interesting about the work we do is
00:42:12
Speaker
People are comparing it really against Postgres and against other industry tools. Our analytics engine is about 95 times faster than Postgres. And it's slightly faster than Pinot today, but it's not significantly faster. I'm sure there's queries that Pinot can optimize way better than we do because we don't do anything specific there. But it's typically good enough as a start for people.
00:42:37
Speaker
You install an extension, run two, three lines of code, and suddenly your queries are 95 times faster without needing to move the data. That gets people excited enough that it buys us time to build the index pre-computing features in the future. So it might be coming along in the pipeline.
00:42:55
Speaker
Yeah, there's a lot of things that will be coming along the pipeline for optimization. There's a couple of other things we do where just yesterday we released the ability to query over cloud object storage as well, which I'm happy to talk about. Oh, yeah. So that's another interesting one where we kind of mess with the storage in a slightly different way. Instead of
00:43:14
Speaker
messing with it on disk, we actually connect to remote object storage in this sort of federated way. That's another big one. Oh, is this like, so I could store my database tables as parquet on S3? Is that what we're doing? Yeah, exactly. Exactly. Yeah. Imagine if we gave you like a Postgres table UI or like interface in a way over parquet files in S3. Yeah.
00:43:38
Speaker
And I get to do that all through, presumably, my very familiar Postgres tools. Yeah. You know, psql or whichever one you like. Yeah. Oh, that's neat.
00:43:49
Speaker
Yeah, so all of this is, I'm taking this in a bit of a different direction than your core question. But yes, like index optimization and things like that are things in the pipeline. But I would say a bit further down the line, there's still some cool stuff you can do just at the feature set rather than at the performance metric. And there's a big discussion happening in the database world nowadays. There was another article by the founders of Mother Duck, the DuckDB
00:44:17
Speaker
commercial side of things, where they were saying, like, performance is a commodity. And it's quite true. Like every database is fast, basically, nowadays. They're all fast enough. There's other ways to differentiate and really help build a better product for users. Yeah, yeah, I can have some sympathy with that. By the time you're talking about several orders of magnitude, speed becomes a proper debate again. But for a lot of people, convenience is still the far larger factor, right?
00:44:47
Speaker
Yeah, exactly. I'm sure you can drastically speed up some queries with those optimizations, but the queries are already pretty fast. And so in a lot of cases, it's not bad. It's pretty good. But there's a lot of tuning you can do even before we do any pre-computing or things like that. You can do materialized views on top of those analytics, like our custom table storage engine, like I was mentioning.
00:45:15
Speaker
That's typically what people will do to drastically speed up queries.
00:45:19
Speaker
Yeah, yeah, yeah. Okay, so by hacking into, not even hacking into Postgres, by extending Postgres, you can do things like changing the storage engine, changing the query planner, changing the indices or indexes, never know which one people prefer. I'm trying to think what the limits would be. At what point would you say, actually, I do need a different database?
00:45:46
Speaker
My guess is the fundamental limitation of Postgres is that you've still got a single-machine master database, right? Yes.
00:46:01
Speaker
Yes, I think it's an interesting question. It's an interesting question. There's a lot of people that believe you should be able to use Postgres to do anything. And certainly, there's great work that has been done to make distributed Postgres work very well, right? Citus, although Citus, I believe, is still a single writer node on their whole distributed Postgres, I got to confess, I'm not fully familiar with the latest and greatest when it comes to sharded Postgres. So I may be speaking a bit out of my depth, but there's a lot of people working on
00:46:30
Speaker
on it, like Neon is another one. Obviously, Amazon has their own versions. Today, our own custom storage is not compatible with this distributed Postgres, so if you use our work, it has to be single node. You can do high availability, where you have read replicas and things like that, but it's fundamentally single node. This won't be the case forever.
00:46:54
Speaker
The truth is Postgres scales vertically very well. Nowadays, machines can get huge, right? And it's quite crazy. I've talked to companies. I won't say any names because they're companies that you know of, and I don't know if I should. But there's quite large companies that run enormous internet workloads that I've talked to. And when I asked them how they run their Postgres, they're like, yeah, we have this one big machine in this one AWS region. And it's good.
00:47:19
Speaker
And I thought they would tell me they have this really crazy setup and so on, but you can get quite far with it. So the answer to your question is maybe. Obviously, any Postgres limitation is a limitation you get as well by using us. But there's a big push in the industry to truly get rid of pretty much every Postgres limitation.
00:47:42
Speaker
make Postgres almost like, it's kind of like the Linux kernel in some ways today. It can be taken in so many directions to make so many versions of what a database should be and I see no reason why Postgres couldn't solve those problems and truly be like a database for everything in five, 10 years.
00:48:00
Speaker
So you get to the point where you download Postgres and then you choose what kind of thing it is by which extensions you install. Yeah, right. Like you'll pick your Linux distribution based on your preferences or your use case, right? And there are so many. Postgres is the same.
00:48:16
Speaker
You want a database, and maybe eventually databases get more and more synonymous with the word Postgres, right? And you say, okay, today, you know, I need this one to be my geospatial Postgres database, right? This one is my analytics Postgres version. And you have all these flavors.
00:48:33
Speaker
You need to start hacking on Postgres to realize how extensible it is. I had not appreciated that until we started. And to anyone listening, I would highly recommend you give it a try. It's very eye-opening. There's a lot of things people think you need something custom for, like a custom data store or things like that, but really well-designed abstraction layers and boundaries around what you can and can't do
00:48:58
Speaker
allow you to do a lot. It's quite beautiful. It's very elegant in some ways, compared to other ways to design a system. I'm trying to think of what databases would challenge it. I'm thinking about Kafka, but that's actually a really simple storage engine. Fundamentally, it's just an append-only log, so that doesn't seem like much. What if I said, okay, I'd like Postgres to be like Redis, a really, really fast in-memory-only key-value store?
00:49:27
Speaker
Does that seem feasible? I believe so. I mean, I believe so. I'm trying to think if there's anyone that has done like a Redis-like system. There's a project for doing queuing on Postgres, like RabbitMQ type workflows, but that's a bit different.
00:49:48
Speaker
Top of mind, I mean, maybe I'll out myself there and this is not a good idea. I've thought about it for like five seconds. But if I were to do this, I would do something like, you can make a custom storage engine with RocksDB or something like that, right? Which is one of the more battle-tested, if not the most battle-tested, key-value stores. And probably you can implement something
00:50:09
Speaker
that can approximate this workload pretty quickly. And Postgres has in-memory tables called virtual tables. So you can do all of these things purely in memory as well. Your index access method as well. I mentioned, well, everything that I've been talking about has been with the assumption of writing to disk. Our full-text search indexes are written to disk. Our custom storage engine writes to disk. But those can also be virtual tables. They can be in-memory indices. So it doesn't have to be written to disk. It's just the way the data is modeled.
00:50:39
Speaker
so you can do things in memory. I'm not familiar with a Redis-competitive project on Postgres, but I don't see why it would not be possible. Maybe my suggestion is probably a little bit limited in scope, if I thought about it more, but I don't see why it wouldn't be possible. Maybe someone listening to this will go out and build a Redis competitor. Maybe you can go and do it. Yeah, maybe. Okay, so I think there's one other really big issue
00:51:09
Speaker
that hits relational databases and is often something you give up for the sake of going to a different database, which is the transaction model, concurrent readers and writers, ACID, these classic relational database things that NoSQL scalable databases often give up first.
00:51:32
Speaker
Yeah. To what degree do you have to give up those kinds of guarantees in order to get performance?
00:51:40
Speaker
That's a very good question. This is something we spend a lot of time thinking about. So let me give a very simple example. In the case of our full-text search index, one question that we had was, do we offer weak or strong consistency when indexing? So you want your transactions to be as fast as possible. As soon as data is committed, it should be available to execute SQL queries against. But then you have a dilemma, which is, OK, I have this search index
00:52:09
Speaker
on top of my table that does indexing for full-text search. You go and you write a new row into your database. Do you want to wait until that row is also indexed in the full-text search index before the transaction commits? In this case, you get purely strong ACID consistency, but you get the trade-off that
00:52:30
Speaker
your transactions are slowed down because you need that indexing to happen. Or do I say, hey, we'll give you weak consistency. We're going to commit the transaction right away, but it might not be fully done indexing on the full-text search index. So it might take a second or however long to fully commit.
00:52:50
Speaker
It depends what you want. In our case, the decision we've made in this specific example is to hold strong consistency, because people come to Postgres, they want, as you said, they want ACID properties and things like that. If you didn't care about those things, well, there are a lot of NoSQL databases out there that give those up.
00:53:07
Speaker
People seem to care. But it does mean transactions are a little bit slower. So if you're running a pure transactional workload with our extension installed versus without, your transactions are ever so slightly slower. It depends on the set of tradeoffs that you're willing to make. But you do have to give up some of those guarantees if you want purely optimal performance.
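That tradeoff shows up in ordinary transactional code; a rough sketch, with a stand-in predicate rather than the extension's real search syntax:

BEGIN;
INSERT INTO products (id, description) VALUES (1001, 'red summer dress');
COMMIT;  -- with strong consistency, the commit also waits for the search index

-- Immediately afterwards, a query against the custom full-text index
-- (stand-in predicate; the real query would use the extension's search syntax)
-- is guaranteed to return the new row. With weak consistency, it might not yet.
SELECT id FROM products WHERE description LIKE '%red%';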
00:53:36
Speaker
Yeah, I think I can see arguments for both ways, but I think you made the right decision. If I'm in Postgres, I expect things to behave in a Postgres-ish way. Yeah, that's always been our thinking in the first place. The number one priority for us is to remain purely Postgres idiomatic as much as possible. People come to our work because they love Postgres and they want to do more Postgres and we should do
00:54:06
Speaker
everything we can to make everything feel as Postgresy as possible to them. If they didn't care enough about Postgres to want to stay in it, then they might as well use one of the other tools that exists out there and they'll probably offer a good solution for them.
00:54:21
Speaker
We still had a heated debate. We had debates with early users of our database as well on this. At some point, we considered making this a setting that you could choose. Do you want weak consistency or strong consistency? For the time being, we decided not to do that.
00:54:39
Speaker
keep things a little bit simple, but who knows? Maybe down the line, it makes sense. But this is something also, like, Postgres is so extensible, you can basically make the user able to toggle fundamental database properties like ACID guarantees, which is crazy, if you think about it. In the way you can design your index access methods, your table access methods, we could give the users the ability to select some settings and say, no, I want strong consistency or weak consistency.
00:55:07
Speaker
That's pretty cool if you ask me. The Postgres of the future will have three switches for the CAP theorem. And if you try and put all three down, one of them flicks back up. Yeah, exactly. That's the thing. You don't want to give people also, as you said, rope to hang themselves with. You want to be careful with options. You give them because sometimes that's not necessarily a good thing.
00:55:34
Speaker
There's a couple of things you can do, which is kind of cool. Yeah, yeah. Okay. You talked about early users, so I think we should talk about taking this into production because it's one thing to play around in the internals of Postgres and make it work in theory. But what happened in practice? How well does it work? How often did it crash?
00:55:55
Speaker
That's a great question. So our work is sort of split into two main categories with the search and the analytics. The search side is quite good today. It's used in production by one Fortune 500 company today. Maybe one doesn't sound like a big number, but our company is only about eight months old. And so it was quite a big deal for us when it happened. It's a huge number.
00:56:22
Speaker
Yeah, we were pretty scared, I'll be honest, the first time that we started testing that. But we have done testing on terabytes of data and billions of rows and it scales very well. Performance is very good. Things are quite reliable.
00:56:39
Speaker
So on the search side, it's very good. It's not perfect, but in terms of resiliency and reliability, it's very good. We haven't had any major crashes. The data doesn't get corrupted or anything like that. Some of it is thanks to the fact that index access methods, as I said, restrict a little bit more what you can do, which is both a blessing and a curse. When it comes to the analytics engine, our analytics engine is not production ready yet.
00:57:06
Speaker
There's still things that need to be done to make it fully Postgres idiomatic. It's used by companies, by mostly small and medium companies in dev environments, or let's say what some companies call production when they don't have really, really strong uptime guarantees yet and they're just starting. So it's probably going to be a few months until we fully hammer everything out and I feel comfortable telling people it can go into production.
00:57:36
Speaker
The main way people typically deploy us is to install the extension on already-running Postgres databases. And that's kind of the pitch, right? You shouldn't need another database. We do ship a Docker image with Postgres and our extensions installed, but overwhelmingly what we recommend to people is to say, hey, Chris, you run
00:57:58
Speaker
the Postgres cluster at company ABC. You have a dev cluster, you have a production cluster. You can download the extension, create it, make sure it works well, and then your data doesn't need to go anywhere. That's kind of the preferred deployment mechanism. Or people will integrate it into whatever deploy scripts and tooling they have for managing their cluster.
00:58:20
Speaker
Yeah, it's like when I've installed the geospatial stuff: you just grab the directory and tell Postgres where to find it. Exactly. So that's kind of the deployment process. Can I ask, what are those last steps over the next few months? Which parts are you having to really hammer out?
00:58:42
Speaker
Good question. So there's a few things. On the analytics side, the big one is the buffer cache in Postgres.
00:58:54
Speaker
When we started doing our work, we tried to keep things as straightforward as possible. And ironically, that made them perhaps less straightforward, in that when you're making a table storage engine, there are a lot of pieces to it. And Postgres has this buffer cache, where basically, when data gets written in, it gets buffered before it gets written to disk, things like that. And that's used in a lot of ways,
00:59:18
Speaker
in a lot of places in Postgres. It's used for write-ahead logging, for example, so you can roll back transactions easily. It's used for being able to delete data. It's used to make sure that the ingestion speed is very good, because you don't want to flush to disk every single time you write a row or a column. That would be very, very slow. But initially, we were just getting started, and so we did take these
00:59:45
Speaker
shortcuts, where it's like, hey, we're just going to flush everything to disk. And it works, but it means ingesting data is very slow. So today, if you use our analytics engine, queries are super fast. They're as fast as ClickHouse or some of those existing state-of-the-art databases. But if you compare the ingestion speed between us and ClickHouse, ours is quite a bit slower,
01:00:05
Speaker
because we do that flushing all the time. And obviously, that's not going to cut it for production readiness, right? We have to wire it up with write-ahead logging. Right now, write-ahead logging is not yet implemented on our analytics engine.
01:00:21
Speaker
If you're just testing, it's fine. But in production, you need this to work with your high availability and failover tooling and things like that, and that requires write-ahead logging. So those are examples of functionality that isn't, I would say, core to showing that a faster analytics engine in Postgres is possible, but that is part of saying, OK, now that this is possible, let's make sure it works in a production environment.
01:00:49
Speaker
And there's a couple of things that we do to make that a bit faster, to improve the speed at which we develop those features for our customers that are waiting for them.
01:01:04
Speaker
When it comes to working with remote object storage, for example, we don't suffer from those limitations because we overlay the Postgres table interface basically on data stored in S3. We don't store it inside Postgres itself. So we don't have to rewire all of these internals. So if you want to use our analytics engine with data stored in S3, for example, today, it's already pretty production ready.
01:01:31
Speaker
I wouldn't go as far as to say that it is, but it's like significantly closer than for data stored on disk because we separate storage and compute in that instance. Those are examples of things that we have left to do. Okay, very interesting.

Future of Postgres Extensions: A Look Ahead

01:01:46
Speaker
Is there anything on the horizon after that? I mean, are there burning features or perhaps even a burning extension in your mind that you'd like to see in Postgres?
01:01:58
Speaker
Oh, there are. Yeah, there are. You mean ones that we will do, or that I would like others to do? Both. Both. Sure. So for things that we will do, I mean, there's the one that we just started. Our analytics engine is now gaining the capability to do data lakehousing, like data lake-type querying, right? So yesterday, we released the ability to query S3. There are a lot of other places people store data that we want to query. We ship this as a separate extension.
01:02:27
Speaker
And so it's sort of a third one, which sits halfway between a completely new extension and our analytics extension. But the reason we ship it as a separate extension is so you have the ability to use it without any file system access, if you want, for security reasons. We have an analytics engine that's purely in Postgres, and we have this one that can run in Postgres with the data stored outside, and you can make it be fully outside.
01:02:56
Speaker
Another extension we want to do down the line will relate to time series. I'm not going to say too much for now, but it's going to relate to time series, which is going to be exciting, because there's great work that exists in the space, but also other things that, you know, can perhaps be done a little bit better. Or so we think. That's one.
01:03:15
Speaker
As for others, there's cool stuff that other people are doing. I think the Redis example you mentioned is a cool one. The people working on the queuing system that already exists, a company called Tembo: that's a pretty cool one. I think there are a lot of things that need to happen as well in the Postgres extension world to facilitate migrations, and I know there are some people working on extensions to do migration.
01:03:41
Speaker
Yeah, extensions that allow you to easily migrate data from other data stores, like MySQL, for example. There's a company working on one from Mongo, but for MySQL, there isn't something super, super good. That's another big one that I'm hoping to see more energy invested in. Yeah, that's something. Let me see. Those are the main ones I have top of mind. I have another one.
01:04:08
Speaker
Doing TTLs and simple database maintenance would be a big one as an extension as well. We've had multiple users tell us they want to be able to set TTLs, to have rows literally deleted after a certain period for privacy and security. And I've talked to a bunch of people that have reimplemented that themselves within their own organizations because
01:04:34
Speaker
There's no extension that does it. I think having TTLs would be a cool extension. If someone wants to build an extension with pgrx, that is probably somewhat straightforward. I think that would be a cool one that could be done, and it would be valuable to people. Yeah, I could see Postgres with a custom table type for GDPR one day. Yeah, exactly.
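As a rough illustration of how small such an extension could start out, here is a hedged sketch of a TTL purge function written with pgrx. The function name and signature are made up for the example; a real extension would add proper identifier quoting, batching, and a scheduler hook (for instance pg_cron) rather than relying on manual calls.

```rust
use pgrx::prelude::*;

::pgrx::pg_module_magic!();

// Hypothetical TTL helper: delete rows from `table_name` whose `ts_column`
// is older than `keep_days` days, and report how many rows were removed.
// Sketch only: identifiers are interpolated directly, so the caller must
// pass trusted names; a production version would quote them properly.
#[pg_extern]
fn ttl_purge(table_name: &str, ts_column: &str, keep_days: i32) -> i64 {
    let sql = format!(
        "WITH gone AS (
             DELETE FROM {table_name}
             WHERE {ts_column} < now() - make_interval(days => {keep_days})
             RETURNING 1
         )
         SELECT count(*) FROM gone"
    );
    // Run the DELETE through SPI and return the number of purged rows.
    Spi::get_one::<i64>(&sql)
        .expect("TTL purge failed")
        .unwrap_or(0)
}
```

Something like `SELECT ttl_purge('events', 'created_at', 30);` could then be run on a schedule to get the "rows expire after a certain period" behaviour described above.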
01:04:58
Speaker
That would be cool. Gee, so the future of Postgres is not only bright, but it's expanding. I think it is. It's kind of wild when you're thinking about it, because a lot of people think Postgres is like...
01:05:14
Speaker
new in some ways. You're like, oh, it's this database everybody's talking about. It's got all these projects, like our projects that are trying to make it do more and so on. What is this cool new database? But it's like 30 something years old, right? It's just a slow ramp up in some ways. It's sort of the hype train in the MySQL years.
01:05:35
Speaker
Yeah, exactly. And so even though it's an old project, I think it's kind of at the beginning of its true growth era. It's becoming a teenager right now. And there are a lot of things that are going to happen. The future is very much Postgres-centric, in my opinion.
01:05:53
Speaker
Well, that's both a bright note for technology and a generally happy note that when you're over three decades old, you might just be entering your teenage years. Exactly. You're just starting. You know, it's just the beginning. I'll go with that. Yeah. Philippe, I will leave you to go back and finish off that extension. Thank you very much for joining me.
01:06:16
Speaker
No, thank you for having me. It's been a great time. I don't know if I'll be able to finish it off today, but we'll put some more hours into it. Do your best. I wish you the best of luck with getting to production and getting to the second Fortune 500 company.
01:06:33
Speaker
Thank you. We're working on it. We're working on it. Good luck. Thank you, Philippe. So if you're in a position where Postgres does nearly everything you want, but not quite, maybe it's time to get extending. I've put links to all the projects that he mentioned into the show notes, and those would be good launching points to start from.
01:06:51
Speaker
While you are scrolling down towards the show notes, if you've enjoyed this episode, please take a moment to like it, rate it, share it with a friend, and make sure you're subscribed because we'll be back next week with another voice from the developer community. But until then, I've been your host, Chris Jenkins. This has been Developer Voices with Philippe Noel. Thanks for listening.