The Evolution of Databases & the Future of Database Technology (with Ben Stopford)

Developer Voices
1.5k plays, 1 year ago

Have you ever been overwhelmed by the number of databases on offer? This week we welcome database expert Ben Stopford as a guide to help us map the database landscape and make sense of it all!

Join us as we embark on a journey through the history of databases, tracing the path from Edgar Codd to the multitude of cloud-era options available today. Discover the strengths of various database styles and explore the tradeoffs between general-purpose databases like #PostgreSQL and highly customised ones like #Cassandra or #Snowflake.

We delve into the realm of the cloud and the opportunities it brings, both for users and the database vendors themselves. And then we examine the challenges that arise when you're forced to connect multiple databases across an organisation. Should you look at Event Sourcing? Or Event Streaming, and how exactly do they differ?

Finally, we look towards the future, discussing Ben's vision of an ideal database and which programming language he would choose to build it in.

Kris on Twitter: https://twitter.com/krisajenkins
Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/
Kris on Mastodon: https://mastodon.social/@krisajenkins

Transcript

Choosing the Right Database

00:00:00
Speaker
What's the best database to use on your next project? Well, as many senior devs will know, the correct and completely unbiased answer is it depends. Depends on what you're using it for. It depends on what kind of performance you need and for which balance of tasks. It depends on your team's experience. What are you expecting them to already know and what's worth learning as you go? It depends.
00:00:28
Speaker
And while that's true, it depends, it's always true, it's not terribly helpful, is it? It doesn't give you a map to navigate the decision by. And it's especially hard when that map would be full of so many different kinds of database, all vying for your attention these days.
00:00:46
Speaker
So I thought we'd spend an episode of Developer Voices exploring the landscape, going on a recce of the current state of the database world, trying to figure out where everything lies.

Evolution of Database Technology

00:00:57
Speaker
And I brought a friend of mine, Ben Stopford. He has worked on designing databases. He's helped build database companies around new technology and new ideas. He's even written a book on how you join multiple databases together across an organization.
00:01:13
Speaker
I really can't think of anyone better to be our guide and our cartographer as we make this map. So, if you're using or choosing a database, or you're wondering if in hindsight you made the right decision, join us for a walk. I'm your host, Kris Jenkins. This is Developer Voices, and today's voice is Ben Stopford.
00:01:47
Speaker
We're joined today by Ben Stopford. Ben, how you doing, man? Hey, great to see you, Kris. Good to see you in your new house and new garden, judging from the background. That's right, yes. We moved out to the country. Moved out to the country to contemplate databases and technology in general. Absolutely. Perfect.
00:02:08
Speaker
So I have told several people in the past that you're one of those people that has thought really deeply about a specific field and really has a better grasp than almost anyone I know of that field in practical terms. And your field is databases. So
00:02:32
Speaker
The question I have, right, I've got lots of questions about where we find ourselves in the database world. And I'm thinking back through history and it's like databases, they were really ad hoc for a while. And then Edgar Codd came along in the 70s with a theory of relational databases and just defined the landscape for about 30 years.
00:02:55
Speaker
stop me if I'm saying anything untrue here. And then we hit like the internet age and everything just explodes and goes nuts. And I want you to try and help us understand why and what the constraints are and how we navigate so many databases. Yeah, there's definitely a lot of them.
00:03:20
Speaker
Definitely a lot of them. That's a huge topic. The starting point is that the relational era took a long time to mature and has been pretty dominant. It's still around. It's just
00:03:43
Speaker
It has evolved also, though. I mean, the relational databases of the 80s, the 1990s were relatively, you know, relatively simplistic in comparison with most of the ones we see today. So I think there's been a kind of evolution in NoSQL, in analytics, and in the relational world.

Open Source and Database Diversity

00:04:07
Speaker
And actually, importantly, in practice, the way that you actually go about using a database.
00:04:13
Speaker
But yeah, the landscape's definitely got a lot broader, and open source has definitely helped with that. So if you think about it, there was always this kind of, for most of the sort of 1980s, 1990s and sort of the early 2000s,
00:04:35
Speaker
The database market was an oligopoly. There were a small number of big vendors that dominated the space. The barrier to entry was really high. Nobody was ever going to come in. The amount of time it takes to build a new relational database, the investment is gigantic. There was basically Microsoft
00:05:03
Speaker
basically bought Sybase, there's Oracle, and, you know, yeah, they kind of pretty much dominated the market,
00:05:17
Speaker
and still do to a certain extent, but then the open source world kind of helped along with this like internet trends or the birth of the internet because what that really did was it was a bit of an innovator's dilemma.

Scalability and the NoSQL Movement

00:05:36
Speaker
problem. There was an emerging category of users that needed to build internet-scale software, and they had a problem they had to solve that you couldn't really solve with a relational database.
00:05:54
Speaker
Not because it was relational, but really just because it wasn't built to scale in certain ways. You could build a simpler database that would solve that problem. A lot of those types of things came out. That was really what the NoSQL movement was about. It was about scale,
00:06:13
Speaker
introducing sharding and much simpler query models, not because you necessarily want a simpler query model, but just because, you know, if you've got an internet-scale problem, you're going to pick the solution that can actually solve the bulk of it, even if it means you have to work a little bit harder.
00:06:30
Speaker
What you've seen since then is two different sets of database technologies. You've got these mainstream technologies that have got better at dealing with everything.
00:06:51
Speaker
So your one-stop shop database is still what most people are doing and most people are using. And the big transition there has really been the utility of each database has got a bit broader, but also there's obviously been this transition to the cloud, which has changed things massively. And then on the other side, you've got these niche databases, which
00:07:17
Speaker
are for very specific use cases, which normally relate to some form of performance. So there's something about the performance characteristics of your problem that means that a specific database,
00:07:36
Speaker
something that's written specifically for your area, is much better. There are a couple of broad examples of that. Roughly speaking, OLTP databases and OLAP databases are still the two use cases. It's like, do I want to do transactional updates, where I've got multiple people competing around writing data, or do I want to do analysis on data that's actually immutable, i.e. it's been created by someone else, somewhere else, and
00:08:06
Speaker
I want to do some analytics on it. I want to create some dashboards. That's kind of the two broad categories. Are you optimizing for readers or writers because that's your dividing line? Yeah, exactly. Just embarrassing. I have to turn my phone off because I put it on silent mode. I thought I put it on silent mode.
00:08:25
Speaker
Oh, I hadn't put it on silent mode. I'll try and segue into notifications in real time. So yeah, apologies for that. But yes, I think, obviously, there's those two broad categories and then
00:08:45
Speaker
you know, the reality is this kind of like, there's the performance element, there's the utility, the functionality that kind of goes with it. And yeah, I think the
00:08:58
Speaker
Obviously, the analytic systems tend to be pretty separate from the transactional systems. Still today, there are some kind of people that are trying to do the one-size-fits-all, and then there's a lot of people using Postgres. That actually makes a lot of sense. Have you heard the old adage that the
00:09:19
Speaker
The best camera is the camera that you have with you, which is obviously why I like cameras on mobile phones. Often that's the best camera you've got, even if it's not quite as good as the fancy SLR lens. That, I think, is very true. If you're building, like, a microservice application, you're quite likely to use Postgres, and you maybe don't care too much about performance. That's certainly where you should start.
00:09:49
Speaker
But if you're moving into one of these problems that's just very hard for a general-purpose database to solve, then you need to look elsewhere, particularly at time series databases and specialized analytics databases. And they are all making specific, normally fairly low-level changes to the way that they structure data,
00:10:18
Speaker
really to try and improve either network time or disk time, maybe processing time, depending on what kind of database it is.

Architectural Considerations

00:10:27
Speaker
But usually, it's mostly to do with how you actually access data on disk. Do you think these dividing lines are like, are you mostly reading or writing? Are you mostly worrying about a single machine or spreading the data over multiple machines?
00:10:47
Speaker
Are those the two axes of our graph? Yeah, I mean, I think that's a good way of thinking about it. Life is pretty much always easier.
00:11:03
Speaker
on a single machine. With a single address space, life is just easy, if you can do it. There are certain types of query that are really challenging in a distributed environment.
00:11:18
Speaker
If you want to be able to do ad hoc SQL on a relational database, that involves joins. There are certain queries that are incredibly hard to do, certainly in a shared nothing architecture. Even with shared nothing architecture, shared nothing architecture is where you're just sharding the data across a bunch of nodes.
00:11:42
Speaker
So each node has autonomy over the data that it holds, like some subset of it. That being compared with the shared disk architecture, which is really where you have these processing nodes and they can all do everything and they share a big disk array. There's a big debate over which of these two architectures is better. The reality is that it changes as hardware gets more advanced.
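As a rough illustration of the shared-nothing idea described here, the sketch below routes each key to exactly one node by hashing it. The `ShardedStore` class and its method names are hypothetical, invented for this example, and each "node" is just an in-memory dict. It shows why single-key lookups stay cheap while a query that is not keyed on the shard key has to visit every node.

```python
import hashlib


class ShardedStore:
    """Toy shared-nothing store: each node owns a disjoint subset of the keys."""

    def __init__(self, node_count):
        # Each dict stands in for one node's private storage.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash, so the same key always routes to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        # A single-key lookup touches exactly one node.
        return self.nodes[self._node_for(key)].get(key)

    def scan(self, predicate):
        # A query that is not keyed on the shard key must visit every node.
        results = []
        for node in self.nodes:
            results.extend(v for v in node.values() if predicate(v))
        return results


store = ShardedStore(node_count=4)
for i in range(100):
    store.put(f"order-{i}", {"id": i, "region": "EU" if i % 2 else "US"})

eu_orders = store.scan(lambda order: order["region"] == "EU")
```

A join across two sharded tables is harder still, because matching rows may live on different nodes. That is the kind of query the conversation calls incredibly hard in this architecture.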
00:12:08
Speaker
So interestingly, shared nothing got very popular and for good reason. It's very scalable, a very scalable architecture. These days, it's actually kind of going back and having a set of worker nodes and a shared disk array is actually becoming certainly a more preferable choice.
00:12:33
Speaker
because it gives you more utility. You can actually get away with these more complex joins and so forth. The architecture definitely matters. It does definitely split between those two, the operational ones and then the more specific analytic ones based on the performance.
00:12:58
Speaker
that each of them requires. Right. So are you saying that, I mean, if you were working on a greenfield or even brownfield project today, right, you would aim to start with Postgres, evolve, and find out what your specific problems are? Yeah, I mean, I think
00:13:18
Speaker
If it's not broken, then there's no point fixing it. I think there's just a set of problems that warrant a certain different type of database. I think that's probably fairly well understood. If you are doing something that is time series based, or if you're doing something that requires
00:13:40
Speaker
aggregation, which is really what a lot of time series databases end up doing anyway to some extent, then you are better off with a database that at least knows how to organize data in a way that suits that. And what that really means, for the most part, is some sort of columnar mechanism.
00:14:03
Speaker
So, with a columnar database, the difference really is just the way that they lay the data on disk: they lay it by column. And what that really means is the biggest benefit actually tends to come from compression. So if you imagine you've got, like, a set of numbers or a
00:14:28
Speaker
column full of text. You can do very efficient compression on that, even just like basic compression, like run length encoding. And it can actually be really efficient in terms of reducing the amount of data that you've got to move around. And if you can change that by a factor of five or 10, then at relatively little CPU cost for the compression and decompression algorithms, then that can actually significantly improve your performance.
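To make the compression point concrete, here is a minimal sketch of run-length encoding applied to one column. The function names and the sample `region_column` are assumptions made up for illustration. Real columnar engines use more elaborate schemes, but the principle is the same: values that sit next to each other in a column tend to repeat.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    column = []
    for value, count in runs:
        column.extend([value] * count)
    return column


# A hypothetical "region" column, stored contiguously as a columnar layout would.
region_column = ["EU"] * 5 + ["US"] * 3 + ["EU"] * 2
encoded = rle_encode(region_column)
# encoded is [("EU", 5), ("US", 3), ("EU", 2)]: ten values stored as three runs.
```

The saving depends entirely on how repetitive the column is. A column sorted by region compresses far better than one in arrival order.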
00:14:58
Speaker
Likewise, when you're doing an aggregation, let's say you're grouping by some fields, you might be grouping orders by the region,
00:15:12
Speaker
the user's name, etc. These things work very efficiently when you've got a single column. That kind of column-oriented mode is very different to the row-oriented mode, but it's a trade-off. If you do select star from a column-oriented database, and it's got 100 fields
00:15:36
Speaker
in each row, then it just takes a long time to construct all that stuff together. Whereas a single query, a single-field aggregation in a columnar database, is incredibly fast. That's an example of where something very specific works. In that analytical space, if you're doing analytics, you're using, I guess, something like BigQuery or Redshift. These are examples of
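The trade-off described here can be sketched with two in-memory layouts of the same data. Everything below (the sample `rows`, the `columns` dict, the helper names) is hypothetical and only meant to show which access pattern each layout favours, not how any particular engine stores data.

```python
# Row-oriented layout: one record per entry, all fields together.
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 25.0},
    {"id": 3, "region": "EU", "amount": 5.0},
]

# Column-oriented layout: one list per field, values in the same order.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_amount_columnar(cols):
    # Aggregating one field reads a single contiguous list.
    return sum(cols["amount"])


def fetch_row_columnar(cols, index):
    # Reassembling a whole row means touching every column once.
    return {name: values[index] for name, values in cols.items()}


total = sum_amount_columnar(columns)
second_row = fetch_row_columnar(columns, 1)
```

With three fields the reassembly cost is trivial. With 100 fields per row, as in the example from the conversation, every `select *` pays for 100 separate reads per row.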
00:16:05
Speaker
databases that are designed to do that very specifically and just literally can't do transactional workloads. There are these hybrids in between, but for the most part,
00:16:16
Speaker
Yeah, if you're doing kind of analytics on web data, data that's not being mutated, then you probably know that and you probably are going to go for one of these columnar things. But the interesting thing probably these days, increasingly is that the utility that comes from the cloud providers is probably more important than
00:16:37
Speaker
the actual underlying database itself. So whilst we've had this kind of Cambrian explosion, particularly in the 2000s and maybe, like, sort of early in the last decade, I think it's kind of stabilizing a lot now. So, you know, what's really happening is a lot of the,
00:17:03
Speaker
There's been a lot of super new ideas, super different approaches to solving database problems and to hosting them on the cloud. We haven't really gone back to an oligopoly, but it's definitely
00:17:23
Speaker
a massive landscape with, I would say, a relatively small number of leading players. And that's probably the way that it's going to end. And maybe sadly, the reality is, when you store data in a database, even in an analytics database, you want it to be reliable. You want it to be in the region that you want it to be in.
00:17:52
Speaker
All of these things are really, really difficult to build, make it expensive to do, to do really well. That's why you kind of end up with going back. So we're probably going back to the oligopoly that we had weirdly in the 80s and the 90s.
00:18:12
Speaker
There's more players now. The field's bigger, but that's kind of where we're going back to. Do you mean dominant players, as in you think they're a handful of companies emerging or you think they're a handful of architectures emerging? I think the reality is there's a handful of companies and a handful of architectures. I mean, there's always kind of been a handful of architectures.
00:18:41
Speaker
All that really ever changes from a database perspective, all that really changes is hardware. That's the truth. Shared nothing's been around since, I mean, it felt really trendy in the 2000s, but it was around in the late 1970s, early 1980s, and the first Teradata was doing shared nothing. Early 1980s, I mean, very basic,
00:19:10
Speaker
These things have been around. The main thing that changes is really just this shift around.

Uptime, Resiliency, and Cloud Shift

00:19:18
Speaker
Networks get faster, CPU processing gets more efficient. The big changes are in terms of resiliency: how recovery works, algorithms for consensus, all of that has definitely changed. And what we demand for uptime has been the big one, I think.
00:19:40
Speaker
It used to be that every database had like overnight where it could be down. The weekend you could switch it off for maintenance, right? Yeah, I mean, it's really hard to do and it's really hard to, it's probably like, you know, one of the hardest problems. And I think it's really that.
00:19:58
Speaker
It's, you know, our expectations of what a database should do, the uptimes that it should be able to maintain, and the, you know, resiliency guarantees that it provides, which have definitely increased dramatically. And again, I think that's why you're kind of getting this, you know, smaller subset of more dominant players in the database space.
00:20:27
Speaker
Do you think inevitably then that all database creating companies will become database providers? Do you think the difficulties of maintaining an always on database will eventually become the domain of people that write databases? Yeah, all cloud companies, yeah.
00:20:48
Speaker
But do you think that companies that make databases will inevitably be pushed towards being cloud companies? Oh, yeah, absolutely. I think that's already happened, right? I mean, the number of popular databases that aren't available on the cloud is pretty small now, I would say. And certainly, if you're not on the cloud, then that's
00:21:16
Speaker
Because that's half the problem, right? Getting it to work, getting a database to work on the cloud. Yeah. That's arguably as big a task to do well as building a database in the first place. So really things have to go hand in hand. Because I would have thought initially that creating a database from scratch would to most people seem a lot easier than deploying it to the cloud.
00:21:46
Speaker
I mean, creating a simple database is easy, but creating something that works, I think, is pretty hard. Yeah, but that kind of works well. I mean, so I guess if you look back in history, let's pick. So if we look at some of the sort of, you know,
00:22:11
Speaker
NoSQL databases, like Mongo was famous for its ability to lose data, if you remember. In the early days, that was definitely true, sadly.
00:22:24
Speaker
Yeah, yeah, I mean, it didn't write anything transactionally. It was kind of a mess, and they basically, you know, the storage engine, they did really well. They bought this thing called WiredTiger, which was somebody else who basically went and rebuilt the Mongo storage engine. And, you know, MongoDB were able to acquire this company. It was called WiredTiger. And
00:22:52
Speaker
had a good storage engine. The reason that people liked MongoDB was that it had a great query model. It was very developer-oriented. They had very good marketing. It was actually more that than probably anything else. Then they were able to catch up in the background
00:23:12
Speaker
and they did that by replacing the storage engine. InfluxDB was another one. They originally used, I think they used RocksDB, I can't remember, but they ended up writing their own for time series. Time series, again, is a particularly tricky problem, time series databases.
00:23:40
Speaker
And then, yeah, you're kind of getting these more esoteric ones. I guess if you look at things like event sourcing, that's how I would describe it. Is event sourcing a type of database? Does it require a type of database? Or is it just a pattern that you use over the top? You could say the same thing for, you know, bi-temporal databases. Is that an implementation pattern? Or does it actually warrant a database,
00:24:10
Speaker
you know, a specific database that's designed to solve that problem? And if you go for the specialist ones, then they will do a better job. They will be more performant, which is really just what that means, and you might get some features that work better. But you can probably build all of this stuff in Postgres. It just (a) might run a bit slower, and (b) you might have to work a little bit harder to get the kind of queries to work. So even under those circumstances, would you say start with Postgres until it becomes painful?
00:24:42
Speaker
I would always pick Postgres until I knew I thought I was going to have a problem.
00:24:51
Speaker
where it isn't going to fit. But, you know, that's if I'm working on premise. If I'm working on the cloud, I'd probably pick the one that made the most sense to me. So probably one of the big changes that occurs with the cloud is that it's actually a little bit easier to switch. You know, changing database technology was always very, very difficult.
00:25:19
Speaker
You know, if you think back 15 or 20 years, you just didn't really change. People would talk about changing. They'd talk about, like, yeah, it's all ANSI SQL-92 compliant or whatever. Very, very rarely did anyone ever really change. Yeah, there was this dream that you'd be able to just swap out Postgres for MySQL for Oracle, and it never really worked beyond anything basic. No, well, the vendors also have like a
00:25:48
Speaker
They're kind of incentivized to try and get you to lock in. So they add in these little features, which are really useful, but make it hard for you to kind of move away. And then the reality is that the semantics, although the standard would be the same, the actual implementation is not necessarily quite the same, and the execution times may not be exactly the same.
00:26:09
Speaker
So it is quite hard. It was very hard, I think, to switch. You had to buy new hardware, you had to do all this stuff. On the cloud, it's like a little bit easier. Mainly, partially because you don't have to worry about, you know, you can just try a new service. But partially because the way that people are going to use databases these days is more like a repository than it used to be. Like, there was a, you know,
00:26:36
Speaker
If you're sitting behind an ORM and doing most of the stuff in the application space, then it is probably a little bit easier.
00:26:44
Speaker
Or it is a little bit easier to switch between different providers. But then the argument is that, well, if you're working through an ORM, then you're not really using a database as anything much more than a store for your application. So that's very different from something where there's a lot of business value in the database, as you would see, let's say, on the Oracle side, or maybe if you're doing something that's highly transactional, where you actually care about performance.
00:27:09
Speaker
This actually leads on to something, maybe you've got some ideas about this, that I've always felt was a huge tension in the database world, which is that you've got ORMs, and they never really work beyond the basics, not that well, because there is a fundamental tension between object orientation and, if you've got a relational database, relational set theory. Yeah, I just always thought ORMs were just, you know,
00:27:39
Speaker
There was just a way that people could be a bit lazy and not have to learn SQL. I mean, but yeah, the, uh, it's convenience. I mean, yeah, there is this, this, this relational, you know, obviously there's like a object relational mismatch. It's, it's a very real thing. If you're writing something very simple and an ORM definitely helps you. Hmm.
00:28:09
Speaker
Um, yeah, I mean, I'd argue like if you're a serious application developer, you're going to know enough SQL to be, to be fairly dangerous. Um, then yeah, you're better off just doing it yourself because that way you actually at least know what's going on. Like debugging, I mean, it's a while since I used, I guess the, the bigger ORMs, but
00:28:34
Speaker
debugging ORMs was always like relatively painful. Yeah. And inevitably, you tried to get down to the point where it was just SQL anyway, so you could actually understand what's going on. Yeah, I mean, if you care about performance, then and you're building an application that actually probably the best way to think about it is if you are investing significantly in your application,
00:28:58
Speaker
then I would say, and the database is more than just a store of, let's say, semi-mutable state, then you're probably better off just embracing the database as being part of your application. I think if you're a developer, then you should learn how to
00:29:18
Speaker
You should learn your database. It's as much part of your application as anything else. You should learn how to get the best out of it. Wrapping it in an ORM, yeah, for me it's, as you said, there's this mismatch. It's better to kind of manage that mismatch yourself. If you're putting together a little fast application, you're building a website, you just want to get it out the door, yeah, an ORM is probably fine.
00:29:49
Speaker
So as ever, it depends on your constraints. But there are some rules to go for. OK, so that takes us across to the other kind of integration question. Maybe I can reference back what you were saying about columnar databases. They're great if you want to aggregate a single field. They're kind of lousy if you want to get a single row by ID. If you need both, if you genuinely need both,
00:30:19
Speaker
and under high performance conditions. So let's say you're not allowed to say Postgres. You're spreading over multiple nodes at very high transaction rates or whatever. Is there any kind of universal integration pattern if you've got to use two different styles of database? Well, firstly, there are actually a bunch of approaches that do do both.
00:30:49
Speaker
or are trying to do better. So, like, there are definitely architectural styles that kind of let you have your cake and eat it, to a certain extent anyway. So generally the patterns that get used here are
00:31:09
Speaker
There are databases, and a lot of databases do this in some form or another, that effectively have two different types of database inside the database. For example, you have something transactional, which is accepting data, which allows you to do
00:31:29
Speaker
fast writes, and you can do things like checks for violations and so forth inside the part of the database that is responsible for really
00:31:44
Speaker
taking data, getting it down on disk transactionally, and then you've got another part of the database which is suited for queries. I think Druid is a good example of this. It's like effectively two different databases inside it. Now, the reason that's a little bit tricky
00:32:04
Speaker
certainly from a database programmer's perspective, is that you've got to manage these two different stores, right? So you've got data in one, and data in another, and when somebody sends a query, you kind of have to like...
00:32:15
Speaker
query both of the databases inside, but you own both of them so it's not really that hard to do. You can actually do exactly the same pattern using something like event streaming. This is what a lot of people obviously do at a macro level is they have like an operational database and they have an analytical database and they use something like Kafka to move the data from one to another and you kind of address the one that you want.
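The "two databases inside one database" pattern can be sketched roughly as follows. This is a toy under stated assumptions, not how Druid or any real engine is implemented: `HybridStore`, `write_buffer`, `segment`, and `compact` are all hypothetical names. Recent writes land in a row-oriented buffer, a background step moves them into a column-oriented segment, and a query has to consult both.

```python
class HybridStore:
    """Toy store with a write-optimized buffer and a read-optimized segment."""

    def __init__(self):
        # Row-oriented: cheap to append to, holds the most recent writes.
        self.write_buffer = []
        # Column-oriented: cheap to aggregate over, holds compacted data.
        self.segment = {"id": [], "amount": []}

    def insert(self, row_id, amount):
        self.write_buffer.append({"id": row_id, "amount": amount})

    def compact(self):
        # Move buffered rows into the columnar segment, then clear the buffer.
        for row in self.write_buffer:
            self.segment["id"].append(row["id"])
            self.segment["amount"].append(row["amount"])
        self.write_buffer.clear()

    def total_amount(self):
        # A correct answer needs both stores: compacted data plus fresh writes.
        compacted = sum(self.segment["amount"])
        fresh = sum(row["amount"] for row in self.write_buffer)
        return compacted + fresh


store = HybridStore()
store.insert(1, 10.0)
store.insert(2, 25.0)
store.compact()
store.insert(3, 5.0)  # not yet compacted, but still visible to queries
grand_total = store.total_amount()
```

Because one process owns both stores, merging the two results is straightforward. Doing the same thing across two separate databases is where something like an event stream comes in.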
00:32:39
Speaker
But yeah, the ability to kind of do these things internally is kind of more powerful. So you're seeing, like, you know, Snowflake's trying to do stuff like this at the moment. You know, they're trying to increase their ability to do operational workloads. You know, companies like Oracle, which is,
00:32:58
Speaker
you know, actually, their database is a pretty impressive piece of technology. It kind of manages to do both, you know, mostly through brute force, through hardware optimizations, but there's also some really clever technology in there. So yeah, that
00:33:21
Speaker
My guess is that, whilst you can do everything with Postgres, the reality is that the workhorse database that sits in the middle, its abilities are only ever going to grow. I think the cloud really helps with that, because you do get these big players like Snowflake whose investment budgets must be massive.
00:33:51
Speaker
Um, and they've got like, uh, you know, the opportunity to kind of host, which gives them a lot of control over the way that, you know, the, the optimizations that they can make because they own the whole runtime. Um, and they'll, you know, sort of predict provisioning planes. So there's definitely an opportunity for that, you know, that more generalist database to kind of grow into a lot of different, different use cases.

Investment and Innovation in Databases

00:34:17
Speaker
Um,
00:34:20
Speaker
I'm not sure that answers your question. I know you have fans in the Kafka community and you go back a long way into event sourcing. I was trying to push you into discussing event sourcing.
00:34:37
Speaker
What do you think stepping back at the moment? Do you think event sourcing is like a universal bridge? Is it part of the puzzle? Do you think potentially we'll just start running more databases that connect to Postgres's event log and read directly from that?
00:35:05
Speaker
Do you think Kafka has a sweet spot, or event sourcing has a sweet spot, that just works for specific organizations? Yeah, I mean, I've written a little about event sourcing in Kafka. You know, there's a lot of people with very strong opinions about this. But, you know, event sourcing and event streaming are
00:35:30
Speaker
very closely linked, but they're actually quite different. I actually think that event streaming is a lot more powerful than event sourcing, just because it solves a very different need. So it solves that need, which I think is obviously what you were getting at, which is the ability to basically tie different databases together. But it's not really about tying different databases together. It's about embracing the fact that your application is not a little island.
00:36:00
Speaker
It's not just you and your database and that's it, unless you're like some tiny little company. It's you and a whole bunch of other systems. Most companies have
00:36:12
Speaker
Tens, hundreds, thousands, tens of thousands of different systems that need to somehow operate together in a way that looks joined up to a customer or an internal user or what have you. The reality is that one database can never do it all because it's not one application. It's really hard to share data across applications, particularly with a database. You end up having to embrace this anyway, and that's where event streaming comes from.
00:36:41
Speaker
And it takes a lot of the elements of event sourcing. But when we say event sourcing, when I say event sourcing, it tends to just mean the application of events to store data at the level of an application.

Event Streaming vs. Event Sourcing

00:36:57
Speaker
Whereas event streaming, for me, is using the same toolset and the same thinking
00:37:05
Speaker
to move data across different applications, different microservices. And it's that kind of fabric that joins it together. So event streaming is just much more powerful because it helps solve this very real, fundamental issue: you have to coordinate data across a variety of different applications. And you're not going to do that by sharing a single database, for a whole bunch of different reasons.
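The distinction drawn above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory `log` list stands in for something like a Kafka topic, and `basket_view` and `demand_view` are hypothetical names. The first function shows event sourcing (one application rebuilding its own state from its events); the second shows event streaming (a different application reading the same log for its own purpose).

```python
from collections import Counter

# Hypothetical in-memory event log; a real system might use a Kafka topic.
log = []

def append(event):
    """Record an event as a (kind, item) tuple."""
    log.append(event)

def basket_view(events):
    """Event sourcing: the basket application rebuilds its own state."""
    basket = Counter()
    for kind, item in events:
        if kind == "added":
            basket[item] += 1
        elif kind == "removed":
            basket[item] -= 1
    # Keep only items that are still in the basket.
    return {item: n for item, n in basket.items() if n > 0}

def demand_view(events):
    """Event streaming: another application reads the same log
    and counts how often each item was ever added."""
    return Counter(item for kind, item in events if kind == "added")

append(("added", "trousers"))
append(("added", "glasses"))
append(("removed", "glasses"))
append(("added", "trousers"))

print(basket_view(log))
print(demand_view(log))
```

Both views are derived from one shared log, so neither application has to read the other's database.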
00:37:35
Speaker
Event sourcing itself as a pattern, and you know this, I know you do, but in my mind, it's a really nice pattern to build an application with, but I actually think that you're better off using a bi-temporal database. That's my take. I think you have to go into that argument a bit deeper.

Bi-temporal Databases and Their Impact

00:37:58
Speaker
Yeah, well, bi-temporal databases, I mean, that's like a whole different thing. But I'm a real fan of bi-temporal databases. It's a fairly niche pattern. You can build a bi-temporal database on Postgres. Actually, there was a proposal to add a language extension to do it, to provide proper support in Postgres for bi-temporal data. But
00:38:24
Speaker
I hear from James Henderson that it's being argued into the spec, or already has been argued into the spec. Was it? Okay. There was definitely some motion on it about a decade ago, and then it just kind of stalled. It was a bit of a shame. But for me, it's kind of event sourcing done right. It doesn't have all of the attributes of event sourcing, and it actually has very little to do with event streaming.
00:38:55
Speaker
But it has basically most of the really good stuff that you want. So you can't have a conversation with event sourcing people about it because it becomes all about, you know, the dogma that surrounds event sourcing. And it's got nothing to do with that. It's just to do with the utility. Like, why do I actually want event sourcing in the first place? Well, normally, because I want to make sure that I have a record of what really happened. But then I also want to have this lightning
00:39:21
Speaker
efficient way of viewing the world. Effectively, I want to be able to have a table of orders, or, well, I want to have my shopping basket and I want it to look like my shopping basket so I can just show it to the user. But I also want to have this structure that tells me exactly what happened, and it's maintained over time, it has that audit, and it's all built into the fabric of the way that the data is stored. And I can also
00:39:52
Speaker
iterate the log and port it to another machine. And bi-temporality does that really just by
00:39:58
Speaker
having two indexes on a table for two different times. So one is your world clock time, that's your event log, and then you basically layer a temporal index over that, which gives you business time, as the terminology normally goes. That's just like a view, which basically turns the event log
00:40:24
Speaker
into that table and it just maintains both all the time. There aren't many specific bitemporal databases, but the reason that you want a specific bitemporal database is that
00:40:46
Speaker
you want something that's going to maintain that view very efficiently. It's actually pretty difficult to compute. If you just build it in Postgres, the actual query that builds those two views with the event log view and the tabular
00:41:04
Speaker
everything's-reduced-by-key view, they're expensive queries. You end up using timestamps: you end up with a greater than or equal to this time, less than or equal to that time, or less than that time, on every single one of your queries. Sometimes query optimizers can end up doing table scans and it just gets pretty painful. In a bespoke database, you can build something that's a bit more efficient
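The two time dimensions and the range predicates described above can be sketched with Python's built-in `sqlite3` module. This is an illustrative sketch under assumptions, not how any particular bi-temporal database works: the `price` table, its column names, the integer timestamps, and the `INF` sentinel are all hypothetical. `valid_from`/`valid_to` play the role of business time, and `tx_from`/`tx_to` record when each row was believed to be true.

```python
import sqlite3

INF = 10**9  # hypothetical sentinel meaning "still open"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE price (
        item TEXT, amount INTEGER,
        valid_from INTEGER, valid_to INTEGER,  -- business time
        tx_from INTEGER, tx_to INTEGER         -- when the row was recorded
    )""")

rows = [
    # Recorded at tx=1: trousers cost 30 from business time 1 onwards.
    # That belief was superseded at tx=5, so its tx range is closed at 5.
    ("trousers", 30, 1, INF, 1, 5),
    # Recorded at tx=5: the price was 30 only until business time 3 ...
    ("trousers", 30, 1, 3, 5, INF),
    # ... and has been 25 since business time 3.
    ("trousers", 25, 3, INF, 5, INF),
]
conn.executemany("INSERT INTO price VALUES (?, ?, ?, ?, ?, ?)", rows)

def price_as_of(item, valid_t, tx_t):
    """Return the price for business time valid_t as known at tx_t."""
    row = conn.execute(
        """SELECT amount FROM price
           WHERE item = ?
             AND valid_from <= ? AND ? < valid_to
             AND tx_from <= ? AND ? < tx_to""",
        (item, valid_t, valid_t, tx_t, tx_t),
    ).fetchone()
    return row[0] if row else None

# What was believed at tx=2 about business time 4?
print(price_as_of("trousers", 4, 2))
# What is believed at tx=6 about business time 4?
print(price_as_of("trousers", 4, 6))
```

Every read carries both pairs of range comparisons, which is the per-query cost the speaker describes when this is layered onto a general-purpose database.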
00:41:31
Speaker
and can take advantage of the fact that you know that you're going to have queries of this particular type. But as I said, there aren't really many of them. But yeah, it solves a lot of those event sourcing problems in a really neat way. But it doesn't give you everything that event sourcing gives you. Probably like 90% of it.
00:41:56
Speaker
Okay, so is that your dreamed of future where we have a bi-temporal database acting in a lot of applications as their core database, and then streaming it out as an event log to speak between departments? Yeah, I mean, I think getting an event log out of the bi-temporal database is kind of natural anyway, so I think you kind of have that. I think event streaming is still separate,
00:42:26
Speaker
but, you know, particularly on the OLTP side, where you're sourcing data,
00:42:41
Speaker
If every single database... because a bi-temporal database looks like a regular database. All of its tables, by default, look like normal tables. You select stuff from your basket and you get your basket. Inside the basket, Kris has chosen
00:43:03
Speaker
three pairs of trousers. He's got three pairs of trousers in his basket. It doesn't say Kris added a pair of trousers and removed a pair of trousers, and then he added some, whatever, some glasses or something, and removed them. It doesn't give you that event by default; obviously, it's there, but it gives you the nice stability. So it feels and operates just like a normal database. But because it maintains that log, you have that audit trail, you have that ability to easily create, you know,
00:43:31
Speaker
an event stream of it. And databases do this anyway, because they do it in their transaction log, which is exactly the same thing. But it's about kind of wrapping that up in a way that works really efficiently. And if every database had that functionality out of the box, it would make event streaming a lot easier, because, you know, the storage model is actually
00:43:57
Speaker
designed in such a way that it's maintaining this view. You don't just need to have a connector that's going to be there at the right point in time to pull the event log out. You're not throwing this event log data away, which is what pretty much every single
00:44:24
Speaker
source database does. So you're just in a much better place. And I think that that's going to be the future. But I've been saying that for a long time, and it hasn't taken off. So maybe my next venture will be a bi-temporal database company. I don't know. I'd like to see that. I'd like to see how you do it.
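The idea of a database that looks like a normal table but keeps its change log, rather than throwing it away, can be sketched in Python as follows. `LoggedTable` and its methods are hypothetical names invented for this illustration. It is a toy under stated assumptions and omits the time indexes a real bi-temporal store would maintain.

```python
class LoggedTable:
    """Toy key-value table that retains its change log."""

    def __init__(self):
        self.current = {}  # the "normal table" view, reduced by key
        self.log = []      # append-only record of every change

    def put(self, key, value):
        self.log.append(("put", key, value))
        self.current[key] = value

    def delete(self, key):
        self.log.append(("delete", key, None))
        self.current.pop(key, None)

    def changes_since(self, offset):
        """Let a downstream reader replay changes from any earlier
        point, without a connector having been attached at the time."""
        return self.log[offset:]

basket = LoggedTable()
basket.put("trousers", 1)
basket.put("glasses", 1)
basket.delete("glasses")
basket.put("trousers", 3)

print(basket.current)           # the basket as the user sees it
print(basket.changes_since(2))  # what a late subscriber can still read
```

Reading `basket.current` gives only the final basket, while `changes_since` exposes the history that a downstream event stream could be built from.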
00:44:49
Speaker
Okay, so final question then. In this theoretical future where you go and build a bi-temporal database company, which language are you going to pick? Oh. Well, it wouldn't be JVM-based, I'm pretty sure. Oh, why not? Oh, it's really hard to manage data on the JVM, it's painful. I mean, yeah, it works well for Kafka.
00:45:18
Speaker
It works for, like, streaming use cases. You know, I think that you can definitely do it, but you just have to fight harder. So, you know, I mean, ultimately, bringing a lot of data into the JVM and manipulating it is painful. Are we talking about disk management or memory management or something like that? Yeah, it's basically because your options are either you bring it onto the heap,
00:45:48
Speaker
In which case, it then has a bunch of extra... Java creates a bunch of extra overhead and there's a level of abstraction which makes it hard to manage large data sets as well on the heap, not to mention garbage collection. Or you can manage it off heap.
00:46:08
Speaker
which is slightly better. But then you still have to do garbage collection. You know, ultimately, most problems require garbage collection. So if you do it off heap, then you've got to manage the garbage collection yourself, which could be more efficient, but it's also quite hard work. And then, like, if the
00:46:25
Speaker
Yeah, and if you do manage that, then you've got to deal with serialization issues every time you bring data back on and off again. So it's just quite a lot of work. If anything requires storing a lot of data, particularly if you want to bring that data into memory and manipulate it, then the JVM is kind of tricky. You're probably better off with something else. Yeah, I mean, these days,
00:46:52
Speaker
Rust looks pretty good. But does it have the library support? Yeah, I don't know. I think the jury's still out. Okay, fair enough. I thought you'd go for something in the C-ish family. Yeah, I mean... Which I'm including Rust and Go in, the C-ish kind of way of doing things. Yeah, I think that's
00:47:19
Speaker
Rust, maybe Go. Yeah, definitely. I mean, or even just, you know, C. I mean, there's this Seastar library which ScyllaDB created, which looks fairly painful to program with, to be honest, but, you know, ScyllaDB is like a C++ rewrite of Cassandra, which is an LSM-based database built on the JVM.
00:47:49
Speaker
Yeah, it probably does have some performance improvements. I think if you're going to start again these days, yeah, you're probably going to start with something that is not JVM-based if you're going to build a database. Fair enough. I shall leave you to enjoy your new garden, mow it, and contemplate how to build your time series database.
00:48:13
Speaker
Will do. All right. Well, great. Great to see you. Thank you. Thanks very much for having me on the show. Thanks for joining. It's a pleasure, Kris. And yeah, Ben Stopford. See you again. Cheers. Bye. Thank you very much, Ben. Now, I don't know if Ben is actually going off to work on his Rust database of the future. But if he does, I'm really hoping he calls it Ben DB.
00:48:37
Speaker
I think it's got a good ring to it. And the mascot just designs itself, right? Bendy B. Perfect. I'm not sure he's going to thank me for saying that, but if you'd like to thank me for making this episode, please take a moment to share it, tweet it, rate it, click the thumbs up icon, like, subscribe, all that stuff. You know how this works by now. Did you know that you can rate podcasts on Spotify, but only from the mobile app?
00:49:04
Speaker
True fact that you Spotify listeners might want to do something with. And of course, if you want to get in touch with me for any reason, including inviting yourself on the show, my handles for Twitter, LinkedIn and Mastodon are in the show notes. But until next time, I've been your host, Kris Jenkins. This has been Developer Voices with Ben Stopford. Thanks for listening.