Challenges with Real-time Updates in Databases
00:00:00
Speaker
Databases are great at storing data, storing what happened, when it happened, who done it, that kind of thing. Traditionally, they're not so great at saying, this is what's happening right now. If you've got breaking news, your user wants to order a taxi right now. Your customer is trying to send you money right now, but that process is failing. Those are really useful things to know as they happen.
00:00:26
Speaker
But most databases aren't built with liveness in mind. They're generally built on a theory that says, first you store the data, later you can query it out. Very useful, but missing that piece of nowness.
Introduction to Debezium for Real-time Data Capture
00:00:42
Speaker
And that's where today's topic comes in. We're going to take a look at Debezium. It's a tool that can tap into a huge number of existing databases and capture that sense of what's happening to them right now and turn it into live notifications, real-time analytics, even just ferrying the data to other systems in a hurry.
00:01:03
Speaker
You can use it from Mongo to Kafka, or from Oracle to Kinesis, or MySQL to Google PubSub, all kinds of options. And one of the things I think makes it particularly interesting is it's minimally invasive. You can have an existing data model and an existing database, maybe even a legacy database you can't touch, and yet you can still use Debezium to tap into that stream of data and make it valuable.
00:01:32
Speaker
So, I think it's worth learning about, and to learn about it, we've got in an expert.
Interview with Gunnar Morling on Debezium's Architecture
00:01:36
Speaker
We've got Gunnar Morling. He's the former project lead for Debezium, and he's now a Debezium and Flink expert over at Decodable. And he's going to take us through what Debezium can do for you, what it demands of you, and how the whole architecture fits together. So let's get stuck in. I'm your host, Chris Jenkins. This is Developer Voices, and today's voice is Gunnar Morling.
00:02:14
Speaker
Gunnar, how's it going? Hey, Chris. Yeah, all is good. Thank you so much for having me. We're really looking forward to our conversation. Me too. Me too. Your name. Well, the name of your project was dropped in a recent podcast. We had an aside talking about Debezium. So I thought, let's get you in for a proper deep dive into what the heck it is. Okay. Yeah, that sounds good. You've probably got as good qualifications as anyone on the planet for this. You were the project lead for Debezium for a number of years, right?
00:02:43
Speaker
Right. While I was at Red Hat, I was the project lead for about five years or so. Yes, starting in 2017, all the way up to last year. Yes, I spearheaded the project and experienced quite a few interesting things during the time. We're getting into all of that. Maybe that sounds more mysterious than it is, but yeah, it was a fun time. You know, sometimes programming isn't a glamorous job. We have to do everything we can to evoke some mystique. Exactly. I can tell you about things, you know.
00:03:13
Speaker
Don't ask me. I cannot talk about everything. Let's get down to brass tacks about this. First up, what is Debezium? Why does it need to exist?
Technical Overview of Debezium's Change Data Capture
00:03:31
Speaker
Well, what it is, it's a solution for change data capture. So in the simplest terms, it's about getting notifications out of your database, like Postgres or MySQL or SQL Server or any, you know, quite a few more. Whenever something has changed, like something gets inserted, something gets updated or something gets deleted, you want to be notified about this event and then
00:03:54
Speaker
make sense out of that information, right? So you would like to maybe update or invalidate the cache, put the data into a search index. But in a nutshell, it's about reacting to changes in the database and telling the outside world about it. Okay, first dumb question, why isn't that just a database trigger? Why is that just not a database trigger? I mean,
00:04:17
Speaker
And yes, triggers can actually be one way of implementing change data capture. And we can talk about, you know, all the pros and cons. One of the things to keep in mind, the trigger would sit on the write path of your transactions, right? So that would be some sort of performance implication. Whereas what Debezium does is what's called log-based change data capture. So this is an asynchronous process, which essentially tails the transaction log of the database.
00:04:43
Speaker
So it's, you know, very reliable, very robust. It doesn't sit in the write path. And then of course, by going through something like Kafka, it gives you connectivity with all those kinds of sink systems. And I guess my question would be, how would you do this with a trigger, right? So how could you send a REST request there or send a message to Kafka? I guess it would be a bit of a stretch to do that with triggers.
00:05:07
Speaker
Yeah, I kind of asked the question in a provoking way, but I think one of the issues... And I totally fell for it. You did, you did. Maybe I'll throw in... I mean, the write path is a big thing. It does add to how long it takes the transaction to complete. The other one is, I think triggers really work when you're staying within the database. But when you want to get outside, that's a big issue. Exactly right. But I'm cheating. I can't throw you questions and then half answer them for you.
00:05:37
Speaker
Okay, I'll try better next time. We'll ramp it up as we go. Why are people trying to capture changes that happened? Because that doesn't seem like a normal thing to do in a relational database model.
00:05:53
Speaker
Okay, so that's a good one. I mean, I think there's tons of motivations for doing that. And I mean, we can talk about use cases all day long, but I would say in the most common one, I would say it's just replication in the widest sense. And maybe not, you wouldn't necessarily use it if you want to replicate your data from one Postgres instance to another Postgres instance. Then I guess you would just use the tools which come with your database. But
00:06:20
Speaker
Maybe you want to take your data out of your operational database into a data warehouse, something like Snowflake, or a real-time analytics store, something like Apache Pinot, or a search index. So you want to cross the vendor boundaries. Or maybe you have a production Oracle database, and now you want to put this data for offline analysis into a Postgres instance. So replicating across all those vendor and system boundaries, that's definitely
00:06:45
Speaker
I would say a super common use case, just taking data from one database to another database or data warehouse, a sink of sorts. So that's replication in the widest sense. But then there is cache updates, cache invalidation. So maybe you would like to have a read view of your data, which sits close to the user, and maybe it has some sort of
00:07:04
Speaker
denormalized representation of a data model. So you need to keep this in sync, of course, with your operational database. And then reacting to those change events is the perfect way for keeping that cache in sync with your operational database.
00:07:18
Speaker
Do you know, that reminds me of a project I worked on that had exactly this issue that we wanted to... And we ended up doing SQL polling, which was not ideal. There was a mechanism in Postgres, I remember, that would let you watch a table and get notifications. But if your connection to that session dropped, you lost your place in the stream.
00:07:38
Speaker
Yeah, I think you're referring to the LISTEN/NOTIFY API. Yes, I am. And yes, exactly. It kind of lets you do that thing. But as you say, if your client isn't running, you wouldn't be notified about anything which happens during that time.
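As an aside, this is roughly what that LISTEN/NOTIFY approach looks like from a Java client, using the PostgreSQL JDBC driver's PGConnection API (available in recent versions of the driver). The channel name, table, and connection details are hypothetical, and the comments mark the limitation being discussed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

public class ListenNotifyDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; the NOTIFY would typically be
        // fired by a trigger on the table you care about.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/app", "app", "secret")) {
            try (Statement st = conn.createStatement()) {
                st.execute("LISTEN order_changes");
            }
            PGConnection pg = conn.unwrap(PGConnection.class);
            while (true) {
                // Blocks for up to 5 seconds waiting for notifications on this session.
                PGNotification[] notes = pg.getNotifications(5000);
                if (notes != null) {
                    for (PGNotification n : notes) {
                        System.out.println("change on " + n.getName() + ": " + n.getParameter());
                    }
                }
                // The catch discussed above: if this process is down or the session
                // drops, notifications sent in the meantime are simply lost --
                // there is no stored position to resume from.
            }
        }
    }
}
```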
00:07:56
Speaker
Whereas, you know, with this log-based approach, all this is fully reliable, fully safe. So essentially, if Debezium were to restart after such a downtime, then it would continue to read the transaction log from exactly the point in time where it left off before. So it stores essentially the position in the transaction log: how far did I get? And then if it gets restarted, it will continue from there and you will not miss any change.
00:08:22
Speaker
Right, so maybe we should just quickly dive into that to make sure we're on the same page. So a modern database, every time it makes a change, it writes that data to a transaction log, partly for recovery but also to hook in standby databases.
00:08:38
Speaker
Right, exactly. Those are exactly, I would say, the two reasons for a database to have a transaction log. So, A, transaction recovery: you know, if something crashes, the server, I don't know, loses power while some transaction is running, then it will be able to go back to a consistent state after restarting. And then, yes, you could also use those transaction logs to keep your replicas in sync in a database cluster.
00:09:04
Speaker
In that sense, Debezium is acting like kind of a replication client. It gets hold of this replication stream of data from the transaction log and then keeps its own view of the data, you could say.
00:09:20
Speaker
So is Debezium sneaking around the side and reading the same binary file? Or is there some API that, say, Postgres or MySQL are providing that you hook into? Right. So it differs a lot between the different databases which Debezium supports. So unfortunately, I would say there is no standardized API or standardized way, you know, so we could implement Debezium once and be done. Instead, it's a bespoke effort for each and every kind of database.
00:09:49
Speaker
So in case of MySQL and Postgres, for instance, yes, there are essentially, you know, remote protocols, let's say. So you establish a connection to the database, and then this will give you a callback, or will notify you, whenever there's a new event in the transaction log. So you can just be any kind of remote client. For other databases, let's say SQL Server,
00:10:11
Speaker
this gives you essentially a view of the transaction log in the form of what they call CDC tables, so change data capture tables. In that case, Debezium goes to those tables and queries them. For Cassandra, again, we need to actually have a component which runs on the actual Cassandra host, so we can go to the actual transaction log files on the file system and get hold of them. So it differs between all the connectors, but then the
00:10:40
Speaker
format which Debezium exposes, that's one unified and abstract format, so the users don't have to care about all those nitty-gritty details. You're in one of those situations where: this is horrible, every vendor has a different standard, we shall create one new standard and then do all the work.
00:10:59
Speaker
Right. Yes. Exactly. I mean, that's exactly the challenge. And, you know, I would say it's a bit of a long tail, of course, of connectors. So every now and then, you know, people come to the Debezium community and they say, hey, we would like to have support for DB2 on the mainframe. Can you make it happen? Okay. And then, you know, the project needs to weigh that, of course. So is this like a common thing, and how feasible is it, in
00:11:25
Speaker
terms of DB2 and the mainframe? So how would you even go and test this? Because, I mean, I don't know who in the Debezium community has a mainframe under the desk. So how do you even develop this kind of thing? But yeah, that's, that's the challenge. But then on the upside, you know, the project does this sort of unification effort. So as I mentioned, the format which it exposes to users, this is abstract and generic. I would say it's kind of a de facto standard which it managed to establish. Okay.
00:11:54
Speaker
It's funny how, in our industry, it's only the de facto standards that actually seem to work. And the ones where three large vendors get together and announce it, apart from SQL maybe, seem to flounder. Right. Yeah. I mean, there were some initiatives around having a CDC standard, but it didn't really go anywhere so far. And I mean, with Debezium, it kind of naturally happened, just when vendors would,
00:12:20
Speaker
you know, announce their support for the Debezium change event format. So let's say ScyllaDB, you know, they developed their CDC connector for their ScyllaDB database using the Debezium connector framework. And now they also support the Debezium change event format, and then also all kinds of,
00:12:43
Speaker
you know, consumers of those change events, they also support this format. Something like Flink SQL or, let's say, Materialize, they actually natively support the Debezium change event format. So this de facto standard thing just kind of happened. Yeah, de facto, I guess. OK, give me an idea of what that standard looks like then. You know, if I do an update on a user's email address in Postgres, what's going to happen on the Debezium side?
00:13:13
Speaker
Right. So in terms of the event format, mostly it is about, so, the schema resembles the schema of your tables. So let's say, you know, you have this customer table with your email address column and you have five other columns. Well, then your change events, they would have those six fields representing those columns, by default. I mean, you can, you know, override, configure or filter everything, but by default it would be that. And then in case of an update, well, you would have the
00:13:42
Speaker
previous state of this row and the new state of this row. So you would see, okay, email changed from, I don't know, chris at example.com to chris.jenkins at example.com. So you would see the old and the new value. Again, depending on how you have configured your transaction log, maybe, you know, you want to save some space there, so you don't have the old value, but that's the gist of it.
00:14:03
Speaker
So that's the actual structure, it resembles your table structure. And then there's a large chunk of metadata also part of it. So like timestamps, what kind of event it is, the position in the transaction log, the offset. In case of MySQL, for instance, it can also tell you what was the query which was triggering this change. So you could identify that and use it for all kinds of analytics as well.
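To make that shape concrete, here is a hedged sketch: the payload below is an illustrative, hand-written Debezium-style envelope for the email update just described (real events also carry a schema section and connector-specific source fields), and the Jackson snippet pulls out the operation, the before and after state, and the source metadata.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ChangeEventPeek {
    // Illustrative payload only -- field names follow Debezium's documented
    // envelope (before, after, source, op, ts_ms), values are made up.
    static final String EVENT = """
        {
          "before": {"id": 42, "email": "chris@example.com"},
          "after":  {"id": 42, "email": "chris.jenkins@example.com"},
          "source": {"connector": "postgresql", "table": "customers",
                     "lsn": 123456789, "ts_ms": 1700000000000},
          "op": "u",
          "ts_ms": 1700000000123
        }
        """;

    public static void main(String[] args) throws Exception {
        JsonNode event = new ObjectMapper().readTree(EVENT);
        System.out.println("operation : " + event.path("op").asText());   // c, u, d, or r
        System.out.println("old email : " + event.path("before").path("email").asText());
        System.out.println("new email : " + event.path("after").path("email").asText());
        System.out.println("from table: " + event.path("source").path("table").asText());
    }
}
```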
00:14:29
Speaker
Yes, right. And how your database is used. Yeah, exactly. Yeah, you could. That's interesting. I didn't know that. Okay.
Debezium's Integration with Various Streaming Systems
00:14:36
Speaker
And what's the, in that format, if I've got like 10 columns in my table and I just update the email address, do I get the whole of the previous row and the whole of the new row? And do I get something saying, but it was just this column that changed?
00:14:51
Speaker
So you would by default get the whole row for most of the connectors. So for instance, in case of MongoDB, and I think Cassandra as well, then you actually would just get the modified columns, but otherwise you would get the whole, at least the whole new row.
00:15:10
Speaker
Postgres, for instance, typically doesn't expose the entire old row, because of course it would take too much space in the transaction log. Okay. But does it simply tell me which of those fields in the new row changed, or do I have to infer that?
00:15:25
Speaker
You would infer that just by comparing old and new values. Yes. Okay. Okay. So I think actually it makes sense. Again, I feel like you're trying to trigger me, but I think it does its job in a way.
00:15:41
Speaker
I feel it makes sense to have this as a default because not every sink system supports what we would call partial updates, right? So I guess you have to work from the assumption, okay, I can only write entire records on the sink side, at least as a safe default. And then of course, if you have a smarter system, it will be able to just apply partial updates. But I feel,
00:16:03
Speaker
in the grand scheme of things, giving you the entire event, this kind of makes sense. Also, if you just look at it from the point of view of having data just in Kafka, and then you would be able just to go to the latest change event for a given record, and this would tell you, okay, this is the state of the world at this particular point in time for this record.
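Inferring the changed columns is then a simple diff of the before and after fields; a minimal sketch, assuming flat row structures:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ChangedColumns {
    /** Returns the field names whose values differ between the before and after rows. */
    static List<String> changed(JsonNode before, JsonNode after) {
        List<String> diff = new ArrayList<>();
        Iterator<String> names = after.fieldNames();
        while (names.hasNext()) {
            String name = names.next();
            if (!Objects.equals(before.path(name), after.path(name))) {
                diff.add(name);
            }
        }
        return diff;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper m = new ObjectMapper();
        JsonNode before = m.readTree("{\"id\":42,\"email\":\"chris@example.com\",\"city\":\"Leeds\"}");
        JsonNode after  = m.readTree("{\"id\":42,\"email\":\"chris.jenkins@example.com\",\"city\":\"Leeds\"}");
        System.out.println(changed(before, after)); // prints [email]
    }
}
```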
00:16:28
Speaker
That raises a much bigger issue I'm going to get to in a minute. But you've just mentioned Kafka for the second time. Does using Debezium imply you'll be using Kafka?
00:16:38
Speaker
No, that's a good question. I guess I should have mentioned that. And so it does not require it. Most of the time people use Debezium with Apache Kafka as the messaging fabric or messaging layer. So, you know, they can connect it to all kinds of sink systems. And this is also, historically speaking, how Debezium started. So essentially it's a family, I mean it's more than that, but at the core it's a family of Kafka Connect source connectors.
00:17:05
Speaker
So they are about taking data from the external systems, all those databases, into Kafka. But then we realized over time, or we just got the feedback, that people liked the functionality which Debezium provided, so all the CDC goodness, but then not everybody necessarily is on Kafka, right? So maybe they're using something like AWS Kinesis or Google Cloud Pub/Sub
00:17:31
Speaker
as managed infrastructures, or maybe they use Apache Pulsar, or NATS, all kinds of things, Redis Streams. And they still would like to use the Debezium change stream capabilities. And this led essentially to the introduction of what we call Debezium Server.
00:17:48
Speaker
And this one is, you know, a runtime; in the overall architecture, you could compare it to Kafka Connect. So it's the runtime for the connectors, but now this Debezium Server project gives you connectivity with everything but Kafka. So you can use this to send events to all the things I mentioned and quite a few more.
00:18:07
Speaker
So that's the second way of using Debezium. And then there's even a third one, and this is about embedding it as a library into your Java application. So if you're a Java developer or a developer on the JVM, and you have very bespoke needs, then you can use this Debezium engine component and essentially just register a callback method, and then this will be invoked whenever a change event comes in. And you can do
00:18:32
Speaker
whatever you want to do, right? And this is typically what integrators of Debezium do. Flink CDC is a great example. So they take the Debezium embedded engine, and then they expose this straight into a Flink pipeline or a Flink job without having to go through something like Kafka. Okay. So if I was in that situation where I'm trying to invalidate a Java cache based on changes in the database, would I probably just go that route?
00:18:59
Speaker
Right. I think, again, I guess it depends, but let's say you have an in-memory cache. Yeah, I guess that would be the simplest way of doing that, right? I mean, now the question is, okay, is the application clustered, is it, you know, scaled out, then do you have this invalidation callback in each of your nodes, or do you need to sync this invalidation message yourself? But yeah, that would be one way of doing that.
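For the embedded route just mentioned, here is a hedged sketch of what a cache-invalidating callback could look like. The builder shape follows Debezium's documented engine API as best recalled, but the connection properties and the "cache" are placeholders, so treat this as an outline rather than production code.

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

public class CacheInvalidator {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder configuration -- consult the connector docs for the
        // exact properties your Debezium version expects.
        props.setProperty("name", "cache-invalidation-engine");
        props.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");
        props.setProperty("database.hostname", "localhost");
        props.setProperty("database.port", "5432");
        props.setProperty("database.user", "app");
        props.setProperty("database.password", "secret");
        props.setProperty("database.dbname", "app");
        props.setProperty("topic.prefix", "app");
        props.setProperty("table.include.list", "public.customers");

        DebeziumEngine<ChangeEvent<String, String>> engine =
            DebeziumEngine.create(Json.class)
                .using(props)
                .notifying(record -> {
                    // The key identifies the changed row; evict it from a
                    // hypothetical local cache (printed here for illustration).
                    System.out.println("evicting cache entry for key " + record.key());
                })
                .build();

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(engine); // runs until the engine is closed
    }
}
```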
00:19:24
Speaker
You can always tell when you're speaking to an expert programmer, because they start with, it depends. I feel like I should prefix every second answer with it. We'll take that as read. That seems like an awfully huge amount of work,
00:19:41
Speaker
right, for that project. And one of the things I instantly wonder is: are you doing things like, you've got a connector to Cassandra that has to be written in Java, and you've got another connector for Postgres that has to be written in C? How many languages are involved?
00:19:59
Speaker
So it's mostly really Java. So just by virtue of being based on Kafka Connect, the connectors are implemented in Java. There's a little C component indeed for Postgres, because in Postgres the way it works there, there is this notion of logical decoding and logical decoding plugins, which essentially control the format
00:20:23
Speaker
in which you receive those change events from Postgres over this remote connection. And so for the longest time, and it's still the case, Debezium had its own logical decoding plugin, which emits those change events in the Google protocol buffer format, so an efficient binary representation.
00:20:42
Speaker
So that's the little C code there is. By now, I also should say, as of Postgres 10, there's a default plugin, pgoutput, which is, you know, coming with Postgres, and it's available on all the cloud providers, like RDS, for instance. So now I would recommend people to use this logical decoding plugin, pgoutput. But yes, still, there is this C-based component called decoderbufs in Debezium.
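For Postgres specifically, that plugin choice surfaces as a single connector property; here is a small hedged fragment, with property names as used by Debezium's Postgres connector and values purely illustrative.

```java
import java.util.Properties;

public class PostgresPluginChoice {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        // "pgoutput" ships with Postgres 10+ and is available on managed services
        // such as RDS; "decoderbufs" is Debezium's own C plugin emitting protobuf
        // and must be installed on the database server.
        props.setProperty("plugin.name", "pgoutput");
        // ...remaining connection properties as in the embedded-engine sketch above.
        System.out.println(props);
    }
}
```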
00:21:07
Speaker
OK, so mostly Java with a tiny bit of C. Exactly right. That sounds... I'm just trying to get a... Well, actually, I should also say TypeScript, because there's the Debezium UI as well, so there's also a web-based UI for it, so you can set up things there. That is a fair chunk of TypeScript. OK, what does the UI do?
00:21:31
Speaker
Essentially, it's about giving you a configuration and management experience. So instead of, as you would do it with Kafka Connect, configuring everything with JSON, this gives you, I would say, a rather intuitive UI. And it guides you through this process of setting up those connectors. One example, for instance, where it's really useful is there's this notion of
00:21:57
Speaker
inclusion and exclusion filters, which describe which kinds of tables or schemas in your database you actually would like to capture. Because in the transaction log everything is contained, right? So all the changes to all the tables are there, but maybe you are just interested in
00:22:13
Speaker
capturing changes out of this one table out of your 10 tables, or out of 10 tables out of your 100 tables. And the way you configure this is with filters, which are essentially regular expressions.
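Concretely, those filters are connector properties holding regular expressions; here is a hedged fragment with hypothetical schema and table names (property names such as table.include.list follow Debezium's documentation, but check the options for your connector).

```java
import java.util.Properties;

public class TableFilters {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Patterns are regular expressions matched against fully-qualified
        // table names (schema.table); a typo here silently captures nothing,
        // which is exactly the "no changes coming out" symptom described next.
        props.setProperty("schema.include.list", "inventory");
        props.setProperty("table.include.list", "inventory\\.customers,inventory\\.orders");
        // Alternatively, capture everything in a schema except a few tables:
        // props.setProperty("table.exclude.list", "inventory\\.audit_log");
        System.out.println(props);
    }
}
```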
00:22:29
Speaker
Now, one common thing we got asked about again and again was, hey, I don't receive any changes from Debezium, what's going on? I know I make changes in my database, but nothing comes out of it. And very often the answer was, well, your filters were just wrong. You just excluded all the tables, so naturally the connectors wouldn't emit any change event; everything would be discarded. Now with this UI,
00:22:53
Speaker
it gives you actually a preview: you specify your filter expressions, and then it will tell you, OK, based on this configuration, this is the set of tables you actually will capture. And this is a huge usability improvement. OK. That makes me think, because presumably the standard changelog has all the changes to the internal tables, too. Yes. Right. Yeah, so what's the question? So generally, I guess you don't want to hook into
00:23:27
Speaker
those? Yes. So most of the time we would like to ignore any internal tables, and the Debezium connectors just do that by themselves. So I know for Postgres, sorry, for Oracle, there are specific schemas, I guess, which you would just like to ignore.
00:23:47
Speaker
So they would never be sent out to the user. As you were asking about DDL events, again, it depends a little bit on the connector, what it does and what the database supports. So not all of them, unfortunately, give you an event if there was a DDL change, like an alter table or create table. So for instance, in case of,
00:24:13
Speaker
Well, actually Postgres, it also tells us that, so it is like a special kind of event. But yeah, it depends a little bit on the connectors. Okay. I'm just thinking, I'm wondering about setting all this up, right? Right.
00:24:31
Speaker
So you're saying that if I'm connecting to, for example, let's pick MySQL, right?
Complexity of Custom Change Data Capture Systems
00:24:37
Speaker
What you're essentially doing is hooking into MySQL's changelog API that they have for their replicas. Well, there's actually a library which we use for that in case of MySQL. It's called the binlog client. So the transaction log in MySQL is called the binlog or binary log. And there's a binlog client library which Debezium uses, yes.
00:24:59
Speaker
I'm just thinking, classic question: I mean, we've all fallen victim to not-invented-here syndrome. If I now know that the trick to Debezium is that it's kind of behaving like a special replica of the master database, why would I not roll my own and get my own preferred format for the data?
00:25:18
Speaker
All right. I mean, you totally could, if you don't have anything else to do, I guess you could do it. I mean, there's a few things to consider. So, A, it's very easy to underestimate the effort, I would say. So, you know,
00:25:33
Speaker
there's things like a very rich type system, for instance, in databases like Postgres. So you would have to implement support for all those potential data types which there are. There is this entire notion of snapshotting, which we didn't discuss at all so far. So if I go to my transaction logs,
00:25:51
Speaker
I won't have the transaction logs from last year, right? Or last month, maybe, because the database, as you mentioned, uses those for transaction recovery and replication. So, at some point, it will discard old chunks of the transaction log, because otherwise your database would just run out of disk space.
00:26:10
Speaker
So I've got every change, but I haven't got any starting state to apply it to. Exactly right. And let's say you would like to put this data into Snowflake so you can do analytics. You want to start with a complete view of your data, right? You don't want just to have the data which happens to be changing right now. You would typically like to start with an entire backfill of what's there. And so you would have to implement this by yourself.
00:26:35
Speaker
Then, you know, there's tons of corner cases in terms of failovers and restarts and making sure you never miss any kinds of events. Again, coming back to Postgres, you need to make sure like,
00:26:50
Speaker
you acknowledge the transaction log you have consumed, otherwise the database would hold onto it forever. There's corner cases there which you can get wrong. So you could sort it all out, I mean, it's software, it's doable, but I would really not recommend it. And the other thing is, of course, you would miss out then on this notion of the de facto change event format, right? So you would come up with your own
00:27:16
Speaker
event format, but then all the sink connectors which support it now, they wouldn't support your format, right? So you would have to address that concern as well. Yeah. Yeah. I can see that'd be a lot of... I'm just trying to get a sense of the scope of a project like this. And I think I got it the moment you mentioned implementing the whole type universe. Right.
00:27:38
Speaker
Right. And then, you know, there are so many things also to consider, like, A, to make this work efficiently, monitoring, logging. I mean, there's, you know, stories to tell about all of those things. So, I mean, yes, doing a simple POC, which, you know, would print out some change events you get from a database on the command line.
00:28:00
Speaker
That would be quite easy, but then really packaging it into a production-worthy system with the complete feature set, that's a huge task. I mean, the team has been working on this for many years, and it's not a small team either. How big is it?
00:28:18
Speaker
Yeah. Let me see. So when I took over the project, it was essentially two guys working on it. So it was myself and another Red Hat engineer. So by the way, I should mention it: Red Hat is the main sponsor of the project. So, you know, the core engineers, they are employed by Red Hat, but then there are also other companies working on connectors under the Debezium umbrella. So folks like Stripe, or Google who did their connector for Cloud Spanner, they do this as part of the Debezium
00:28:48
Speaker
community, or Instaclustr, for instance, they work on the Cassandra connector nowadays. So there are quite a few companies standing behind it. But let's say in the core team, I would say it's like seven or eight engineers by now, something like that.
00:29:05
Speaker
I'm curious. I mean, this is kind of an aside, but since you mentioned it, I want to know why Stripe? Why a payments service? That's a good question. I don't know to be honest what their use case is, but if I had to guess, it would be feeding data to their analytics store. That's definitely a common thing.
00:29:27
Speaker
I know organizations like SugarCRM, they work on their CRM tool. They have SQL Server and they have tons of CDC use cases. So for instance, they made a huge contribution towards Debezium
00:29:44
Speaker
so it could enable their deployment of hundreds of Debezium connectors. And just by means of this scale of use case, the core team, you know, we don't really have the capability to test with a hundred databases, so getting this feedback and also the contributions from those companies is super valuable. Okay. Yeah. I can see their use case, and I can see they're large enough for that use case to justify dedicating an entire person to supporting it.
00:30:13
Speaker
Yes. And then they have this weird edge case: yes, if we run Debezium on 500 databases and then we do this particular thing, then it doesn't work. Okay, so let's fix it. But there's no way in the world the team could get on top of that just by themselves. Yeah. 500 machines is becoming tenable, but it's still a lot to ask of an open source project. Exactly right. Yes. Okay. Okay. In that case, let's get
00:30:40
Speaker
Let's get into the idea of snapshotting, which you mentioned, because I could see my first use case for Debezium would be capturing things to send email notifications, right? And that I just need to know what's happened. But the analytics case, yeah, I do need to worry about getting the whole world, which I can't get out of the changelog. How do you solve that?
00:31:03
Speaker
Right. So the way it essentially works, in the simplest case, is, you know, first of all, the connector, upon startup, will get hold of the transaction log and it will just, you know, get a reference to the current position in the transaction log. That's called the offset. And then it will start a snapshot transaction,
00:31:28
Speaker
So to make sure we scan all the tables, which are there. I mean, again, coming back to your filter configuration, right? So it would scan through all the tables you want to capture. You also can customize this. So let's say you work with a notion of logically
00:31:44
Speaker
deleting your data or soft deleting your data. So maybe you just want to export your non-deleted data into your analytics store. So you could exclude all those soft-deleted records. So you can do that. And then once this snapshot process has completed, then it would start to read the transaction log from this offset which it memorized before.
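To pin down the ordering just described, here is a purely conceptual sketch; every helper below is a hypothetical stub (this is not Debezium code), and it only illustrates memorizing the log position, snapshotting the captured tables, and then streaming from that position.

```java
import java.util.List;

/**
 * Conceptual sketch of the "initial snapshot, then stream" ordering.
 * All methods are hypothetical stubs; only the sequence of steps matters.
 */
public class SnapshotThenStream {
    public static void main(String[] args) {
        long offset = currentLogPosition();          // 1. remember where the log is now
        for (String table : List.of("public.customers", "public.orders")) {
            snapshotTable(table);                    // 2. consistent read of each captured table
        }
        streamChangesFrom(offset);                   // 3. replay everything that happened since step 1
    }

    static long currentLogPosition() { return 0L; }  // stub
    static void snapshotTable(String table) {
        System.out.println("SELECT * FROM " + table + "  -- within the snapshot transaction");
    }
    static void streamChangesFrom(long offset) {
        System.out.println("tailing the transaction log from offset " + offset);
    }
}
```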
Innovations in Snapshot and Streaming Methods
00:32:07
Speaker
So essentially this is to make sure that there is nothing lost
00:32:11
Speaker
in terms of changes which happened while you did this snapshot. Okay, so you're kind of pinning the transaction log, going to another window, logically, and saying select star from users, and then coming back. Okay. Can those two run asynchronously? It's not blocking anything.
00:32:37
Speaker
Yes. Right. So this isn't blocking anything. So for instance, coming back to Postgres, I feel like I'm saying Postgres all the time. So let me say Postgres another time. We're definitely guilty on Developer Voices of saying Postgres to mean generic good-quality relational database.
00:32:53
Speaker
Yes, exactly. And they just get so many things right. And also in this case, so they have this notion of what they call exporting snapshots. So essentially, the way it works, you create what's called a replication slot. So this is essentially a handle to the transaction log, and this is a pointer to how far you have read the transaction log. So you start this replication slot,
00:33:18
Speaker
this gives you back its starting position, its starting offset. Now you can, optionally, also say, I would like to export this offset as a snapshot, which means you can start another transaction at this particular offset. And you could even do multiple ones. So, with Debezium,
00:33:40
Speaker
I'm not quite sure whether they do it by now, which shows a bit that I have stepped down from the project, but at least in theory, and it totally should be the case, you could have multiple snapshot transactions all at the same offset. So you could read multiple tables or even multiple chunks concurrently just to speed things up. And then once you're done, you would start to consume from this
00:34:02
Speaker
replication slot, and you would just be sure you haven't missed anything. And also, of course, any new changes can come in during that time. So yes, that would all be async. Okay. So you're doing almost like a point-in-time query that you can then synchronize on both sides. Exactly. We should link to this in the show notes, but I saw you had a talk at Current, which I watched and which was very good, about how that process is now being overhauled.
00:34:32
Speaker
Right. I'm going to say a bit about that. Oh, yes. Let's definitely talk about this notion of incremental snapshotting, because, I mean, Debezium always had this notion of, let's call it, initial snapshotting, as I roughly described it. And then there were a few issues with that. So for instance, you couldn't easily change those
00:34:53
Speaker
filter configurations. So let's say, you know, you have set up your table filters in a way that you capture 10 out of your 100 tables. And now you realize, oh, actually there's an 11th table which I also would like to capture.
00:35:07
Speaker
So you would change the filter configuration. But then in the traditional snapshotting approach, there was no way for you to actually also capture now this 11th table and then continue to stream the changes for all the 11 tables. So this was something which just wasn't doable very well. You mentioned, yes, the transaction log gets pinned during this initial snapshot. And that's exactly right because, well, as we don't consume from the transaction log while this snapshot is running,
00:35:36
Speaker
It would use more and more disk space while this traditional snapshot is being executed. So you're saying freeze that point in the transaction log while I select star from 10 tables. Exactly. And that can get big. Yeah. And that can be big. I mean, the snapshot can run for multiple hours or many hours.
00:35:53
Speaker
if you have a large amount of data. So that's maybe not desirable. Then, you know, sometimes people just want to re-snapshot a specific table. Maybe they've deleted their Kafka topic accidentally, yeah, that shouldn't happen. Or maybe they just want to backfill a new sink system they stand up. So they would like to re-snapshot a specific set of tables. And this wasn't something which was doable. All this is now supported with the notion of incremental snapshotting.
00:36:23
Speaker
I'm not sure whether we should go or can go into details. But essentially, it interleaves the events you get from the snapshotting phase and the stream reading phase. So they happen concurrently. So all those problems are kind of solved. Also, you can trigger them interactively. So you can interactively say,
00:36:47
Speaker
I want to capture this table customers and it will go and do this while it continues to read from the transaction log. So, yeah. So it's still very active. I mean, you could think it was just, okay, read that file and turn it into the format. Right. And then spread that out to many databases, but you're still actively improving the processing.
00:37:09
Speaker
Oh, yeah, absolutely. I mean, there are so many things which still can be and are being done. Like, I mean, new connectors are added all the time. This snapshotting process is being reworked. I'm sure there's tons of things which can still be improved in terms of performance, like, you know, really making sure everything happens in parallel, all this kind of stuff. And then, of course, well, Debezium,
00:37:33
Speaker
traditionally speaking, it just concerned itself with getting data out of a database into something like Kafka or Kinesis.
Enhancements with the New JDBC Sink Connector
00:37:42
Speaker
But then, of course, you don't do this for just the sake of putting your data into Kafka. You want to do something with that. So you need to take this data
00:37:50
Speaker
and put it elsewhere, as we discussed. And I mean, it just wasn't, and I would say it still isn't really, the scope of the project, but still people came to the Debezium community and said, okay, so how can we now take this data and put it into another database? And this is why they just recently added a JDBC sink connector. So now there is also a sink connector which allows you to take the data from Kafka
00:38:14
Speaker
and put it into all kinds of databases which you can talk to via JDBC. So like all of them, I guess. And this is very nice in terms of usability, because things like the Debezium UI, they can give you a complete end-to-end experience. Also, those connectors, they work very closely with each other. So, I don't know, if you were to use
00:38:35
Speaker
others, I should say, Confluent has a JDBC connector, right? And there are other JDBC sink connectors. You need to make sure they are configured the correct way so that they work together. Whereas now here, as Debezium is providing source and sink connectors, all this gets very seamless. It can reason about schema changes and apply them to a sink database. So having this sink story, I think this is a huge value to the project. And this by itself creates lots of new
00:39:05
Speaker
things to work on. Even if it were exactly the same quality as all the others, it comes with the same assumptions on the write end as the read end, and that's going to make things easier. Exactly, yes. Does that then get into what you mentioned earlier, where we were saying you might want to replicate Oracle to MySQL?
00:39:31
Speaker
Right. Exactly. It could be used for that scenario.
00:39:37
Speaker
I could hear you holding your breath as you answer that question. Do you think, is it a sane thing to do? If that was my task, and I wanted to get Oracle into MySQL, would Debezium be a good choice? Yeah, definitely. I mean, people do it, right? So maybe, I mean, I used to work in environments where you just wanted to have a database with, I don't know, the data from yesterday. So you could run some ad hoc queries or develop
00:40:03
Speaker
like your new functionality against it and, you know, get execution plans and all this kind of stuff, not from the live database, but from this other database. And yeah, sure. I mean, maybe you don't want to use this super expensive database for that purpose, so you use another one. I could see that, and people definitely do that. Okay. Okay. Cool.
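As a rough illustration of that replication path, here is a hedged sketch of what a Debezium JDBC sink connector configuration might look like for the Kafka-to-MySQL leg. The property names follow the Debezium JDBC sink connector documentation as best recalled, and all topic and connection details are made up; this would normally be posted to Kafka Connect as JSON, and is shown as properties purely for illustration.

```java
import java.util.Properties;

public class JdbcSinkConfigSketch {
    public static void main(String[] args) {
        // Hypothetical topic and connection values -- verify property names
        // against the JDBC sink connector docs for your Debezium version.
        Properties props = new Properties();
        props.setProperty("connector.class", "io.debezium.connector.jdbc.JdbcSinkConnector");
        props.setProperty("topics", "oracle.inventory.customers");
        props.setProperty("connection.url", "jdbc:mysql://mysql-replica:3306/inventory");
        props.setProperty("connection.username", "replica");
        props.setProperty("connection.password", "secret");
        props.setProperty("insert.mode", "upsert");          // apply changes idempotently
        props.setProperty("primary.key.mode", "record_key"); // key the upsert on the event key
        props.setProperty("delete.enabled", "true");         // propagate deletes too
        System.out.println(props);
    }
}
```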
00:40:24
Speaker
That sort of relates to one of the reasons I started thinking about, let's do an episode on Debezium. We did, as I mentioned, we did this episode on real-time data streaming and why we care, with Thomas Camp. And we were considering, like, if you've got an existing batch-based system, stuff feeds into, let's say, MySQL, through to your website, you know, standard architecture. And you want to say,
00:40:50
Speaker
I would like to get into real-time eventing. I can see some advantages to it, but change is expensive and new projects are risky. What's the minimum step I could take? Would that be just turning your database back into a real-time stream of events using Debezium? Is that a sane first step into the real-time world?
00:41:15
Speaker
So I'm not quite sure whether I fully understand what you ask. Do you have a batch-based architecture? Just imagine a traditional stack where there is MySQL at the heart of it, and you're doing transactional updates to stuff, and you would like to start doing real-time notifications to clients or something. But you don't want to overhaul the entire thing and say, okay, let's have an 18-month project to do event systems, which is actually going to take three years and is going to fail.
00:41:43
Speaker
Right. Is Debezium your minimal on-ramp into the world of real time? Right. No, I think, yes, I think that's a fair assessment. And people very commonly do that. And actually often there is this scenario: yes, they don't want to touch the source code of the application. Sometimes they cannot even touch the source code of the application. I mean, I remember at a past job, we had this WAR or JAR running on our application server and nobody had the source code any longer.
00:42:13
Speaker
So it was running just fine, but you couldn't change it because the source code was gone. So, you know, there's definitely this kind of world and sometimes people just shy away from it. I mean, of course you need to be careful about it, right? Because if you are in that sort of situation, typically your data model also is maybe in a bit of a non-ideal state. And now the question is, do you want to propagate that to like your new world, right? So maybe I know
00:42:42
Speaker
maybe you want to use this for migrating off of a large monolithic application to microservices. That's definitely a common use case. But then maybe you want to shield those new microservices from all this weird legacy modeling. Let's say you have a database where column names can only be 30 characters long. I mean, there is such a database. So, you know,
00:43:06
Speaker
you oftentimes want to have some sort of transformation layer in between. And this is what people typically do with Kafka Connect for simple stuff, or maybe Kafka Streams or Apache Flink. So, you know, they can give a nice and clean view of the data. So maybe they would like to limit stuff, they would like to rename things or change the types. I know, maybe everything is a string in your old database and you would like to expose proper types, yes.
00:43:39
Speaker
So having some sort of transformation layer between your old legacy world and the new world often makes sense. And that's the thing I would advise people to consider: not just to use CDC to expose the weird oddities, but make sure, you know, to put some consideration into this, to provide properly crafted data contracts.
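As a tiny illustration of the kind of transformation layer and data contract being described, here is a hypothetical sketch: the legacy column names, the target record, and the mapping are all invented for the example.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.Map;

public class LegacyToContract {
    /** Clean, typed shape exposed to the new services (a hypothetical contract). */
    record Order(long id, LocalDate orderDate, BigDecimal total) {}

    /** Map a legacy row -- cryptic names, everything a string -- onto the contract. */
    static Order fromLegacy(Map<String, String> row) {
        return new Order(
            Long.parseLong(row.get("ORD_NO")),
            LocalDate.parse(row.get("ORD_DT")),       // e.g. "2024-03-01"
            new BigDecimal(row.get("ORD_TOT_AMT")));
    }

    public static void main(String[] args) {
        Map<String, String> legacyRow =
            Map.of("ORD_NO", "1001", "ORD_DT", "2024-03-01", "ORD_TOT_AMT", "149.90");
        System.out.println(fromLegacy(legacyRow));
    }
}
```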
00:44:03
Speaker
Yeah, data contracts is the thing, isn't it? I guess what you're saying is that... I'm looking at a blog post of yours about that right now.
Transforming Batch Systems with Debezium
00:44:10
Speaker
You are a prolific blogger and we will link to your blog first. Oh yeah, that's amazing. I guess what you're saying is that Debezium is a sane way to turn a batch system into a series of real-time events, but then there's also the semantic journey. A series of JSON packets that say insert and update is not the same as an event system.
00:44:33
Speaker
Yes, exactly. There are so many things to say about it. First of all, there's of course the question about the granularity of events. What CDC does, it's about table-level or row-level events. But maybe, I don't know, if you were to model your application in terms of domain-driven design, you have something like aggregates, and in a relational database, they would be persisted in multiple
00:45:00
Speaker
tables, right? So let's say, just as a simple example, a purchase order, and it has, you know, multiple order lines. So in your relational database, this would be stored in two tables, one for the headers and one for the lines, and there's a one-to-n relationship between the two.
00:45:18
Speaker
So now you would get those table level events, but maybe what you actually would like to have is an entire event, which describes the entire purchase order. And now the question, of course, is how do you achieve this task? And again, stream processing can help with that. So you could use something like Flink SQL for joining those raw streams and exposing
00:45:39
Speaker
a higher-level stream, which then has a nested data structure. There are things like the outbox pattern, which can help with that, but which then also would require you to modify your application and actually emit those outbox events. So maybe in a legacy kind of use case, it's not desirable. The pain's got to go somewhere. Exactly right. That does raise the question of how Debezium handles transactions. If I've got that order header and the individual order rows,
00:46:08
Speaker
am I going to get that as one transaction in the Debezium log output, whatever you call it? Right. Great question. I love that. It's kind of my favorite thing. So no, by default, what you get right now is you get those events. One
00:46:24
Speaker
by one. So let's say you have your one purchase order and it has three order lines. You would get those four events, typically, by the way, in Kafka on separate topics, because there's a correlation between the table name and the topic name by default. And also there wouldn't be strong ordering guarantees. So it could happen that you receive one of the order line events, then you consume the order header event, and then you get the other two order line events.
00:46:51
Speaker
So, you know, you get them one by one, but still you can correlate them. And this is possible because each of the events contain the transaction ID where they are sourced from. So you will see for all of your four events, okay, they are originating from transaction one, two, three.
00:47:08
Speaker
And now what Debezium also can give you is another topic which tells you about the transactions which have been executed. So on that separate topic, you would receive a message, or an event, sorry: transaction
00:47:25
Speaker
123 has started, and then you would receive another event which tells you, okay, transaction 123 has committed. And by the way, in this transaction, there is one event for order headers and three events for order lines.
00:47:40
Speaker
So you have this information, and now you could use all this to implement some sort of buffering logic, either in your sink system, so you could put the data into some sort of staging area and essentially just propagate the events once you know, okay, I've received all those four events which I expect from that transaction. Or again, you could use something like Flink to implement this in a streaming fashion.
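Here is a hedged sketch of that buffering idea in plain Java: hold the row-level events per transaction id and release them once the transaction-metadata event reports the expected count. The event shapes are simplified placeholders rather than Debezium's exact metadata format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of buffering change events until a transaction is known to be
 * complete, using per-transaction event counts like those on Debezium's
 * transaction metadata topic. Event shapes here are simplified placeholders.
 */
public class TransactionBuffer {
    record Change(String txId, String table, String payload) {}

    private final Map<String, List<Change>> pending = new HashMap<>();
    private final Map<String, Long> expected = new HashMap<>();

    /** Called for each row-level change event. */
    void onChange(Change change) {
        pending.computeIfAbsent(change.txId(), k -> new ArrayList<>()).add(change);
        maybeEmit(change.txId());
    }

    /** Called when the transaction topic reports an END event with a total count. */
    void onTransactionEnd(String txId, long totalEvents) {
        expected.put(txId, totalEvents);
        maybeEmit(txId);
    }

    private void maybeEmit(String txId) {
        Long total = expected.get(txId);
        List<Change> buffered = pending.getOrDefault(txId, List.of());
        if (total != null && buffered.size() == total) {
            System.out.println("transaction " + txId + " complete, emitting " + buffered);
            pending.remove(txId);
            expected.remove(txId);
        }
    }

    public static void main(String[] args) {
        TransactionBuffer buffer = new TransactionBuffer();
        buffer.onChange(new Change("123", "order_lines", "line 1"));
        buffer.onChange(new Change("123", "order_headers", "header"));
        buffer.onTransactionEnd("123", 3);                            // expect 3 events in total
        buffer.onChange(new Change("123", "order_lines", "line 2"));  // now it emits
    }
}
```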
00:48:04
Speaker
Yeah, so there's a bit of legwork and almost detective work to reassemble what those atoms of change mean semantically. Yes, exactly. I mean, the problem is, and people sometimes ask about it, so hey, can't we get a single event for an entire transaction from Debezium? And the answer is yes, in theory, this could be possible, but transactions can be arbitrarily large, right? So
00:48:29
Speaker
you could do a bulk delete and delete 50 million records from your database in a single transaction. And there's just no way we could give you any single event which describes 50 million records. It would be just way too large. Also, if we just go back to the simple order example, there's no guarantee you've got one transaction per order. It could be batched up with lots of other things that happened in that transaction. Yes, exactly right. Totally. So you would have to reason about that as well.
00:48:58
Speaker
Why don't we, as a slight aside, because I don't know that much about Flink and I know it's now part of your day job at Decodable, give me the overview of how you might recapture that order event in Flink. Yeah, that's a good question. How would it work? Well, I would essentially...
00:49:19
Speaker
First of all, listen to this transaction topic, and then based on that, I would implement some sort of buffering logic so I can have state stores in Flink. First of all, I would
00:49:34
Speaker
probably wouldn't be able to do it right now in Flink SQL, right? So I would have to do some Java programming. But so I could implement some buffering logic, which essentially compares those events and sees, OK, this event is for a specific transaction. I haven't received all of them yet. So I would need to put this into a state store. And then more events come in. And then you would be able at some point to see, OK, I've gotten all of them. So now I can actually go and emit
00:50:05
Speaker
whatever the aggregated structure from that should be.
00:50:08
Speaker
Okay. So you're writing a bit of Java that has to recreate that logic of grouping them back up. Exactly right. And then of course, if we wanted to abstract things a little bit, we could say, you know, we could think about having some sort of specific query operator in Flink SQL, which would say, I don't know, select transactionally from whatever. And then, you know, let's say there was a managed platform, like Decodable, which could do that. And this could be a functionality which we could provide.
00:50:38
Speaker
Okay. Do I get a slice of the royalties for giving you that idea? Well, this idea might have been around before. I'll have to keep working on that then. No, definitely. I mean, I can tell you this is absolutely a very common need and scenario, because, just to make things tangible, people oftentimes are in this sort of scenario when they want to put data into something like Elasticsearch, because what happens there is,
00:51:08
Speaker
coming back to this purchase order example, you would take the entire data from this purchase order and all its lines, and you would like to put it into a single document in an Elasticsearch index, right? So you don't want to do query-time joining, if you even can, I don't know. So you would like to have an entire order and all its associated data in a single document.
00:51:28
Speaker
And now the question is, if you implement this as is with Flink or Flink SQL, well, data would come in on those two source streams, orders and order lines, and then this join would run, but it would run whenever a new event is coming in. So essentially you would materialize this join when you have a single order line.
00:51:49
Speaker
And you would put this into Elasticsearch. The next line comes in, so you would materialize it again, and you would have an order with two out of three order lines. So you would have like a partially complete view on the world. Now, let's say the user comes at this point in time and they go to Elasticsearch and do a query.
00:52:07
Speaker
You know, depending on the specifics of timing and so on, they would get incomplete search results, right? So ideally, what those people want to do is they would only write to Elasticsearch once they know, okay, now this document actually is complete, and this is, you know, the complete view of this data.
00:52:25
Speaker
So this is a very common requirement. And yes, using those transaction markers which we have in Debezium, this is a way of solving it. And again, by the way, the Stripe guys, they spoke at Flink Forward about how they implemented that. So it's already happening. Okay. I'll have to link to that in the show notes and seek that one out. Yeah, definitely. Again, we're coming back to the old chestnuts of transaction boundaries, normalization and denormalization.
00:52:53
Speaker
Yes, I mean, it's the same, same, but different, right? Exactly. So I think we should probably end on, what tends to be the simplest use case that people start with, and what's the most ambitious or crazy use case you've seen? Okay. Some real-world things to take us out. Right. I mean, the most simple,
00:53:20
Speaker
definitely, I would say, is feeding data to OLAP stores or data warehouses. I feel like that's the most common thing people would do. By the way, also, we didn't really touch on latency and why it is so interesting to do this all in real time. So I know of people who take the data from their operational MySQL clusters
00:53:42
Speaker
and put them into Google BigQuery so they can do analytics there and they have an end-to-end latency of less than two seconds. So it's really like kind of instantaneous. And you know, this opens up many interesting possibilities, right? So to be able to go to your data warehouse and have a live view on the data so you could drive dashboards off of that and all this kind of stuff.
00:54:03
Speaker
I would say that's the most common and I would say also simplest use case. Well, in terms of most complex, that's an interesting one. We have an interesting one at Decodable where we actually use it for propagating changes from our control plane database to our data planes. When people create new things in Decodable, like a new connection, we would like to react to that so we can then materialize the resources in our data plane.
00:54:32
Speaker
So that's, I guess, a bit more on the advanced side. Because you're doing like a software-as-a-service on Kubernetes. Exactly. Yes. You submit a request, it turns into running resources. It should be reliable and all this kind of stuff. So we use Debezium for that.
00:54:48
Speaker
But then also there's this entire notion of, yes, we touched briefly on it, like migrating from a legacy application to maybe microservices or a new world of applications. I think this is more advanced because it has a strong notion of stream processing and like massaging the data, denormalizing it, putting it into new shapes. Sometimes people choose to use just something like Kafka Streams or Flink SQL
00:55:12
Speaker
to continuously update materialized views. So, I don't know, maybe they want to have the revenue per category in specific time windows, and this should be updated in a continuous fashion. I guess it's also a bit more on the advanced side of things.
00:55:32
Speaker
Okay. I'm going to cheat and break my own rules. I'm allowed to, because I know you did talk about this as well, and you hinted about it, and we really shouldn't leave without discussing this. Going from a legacy system to a microservices system. Yes.
00:55:49
Speaker
I can begin to see how that fits in, but you did a whole talk about it, and I can't quite see it all. Give me a teaser on how Debezium fits in when we're migrating to
Gradual Migration from Monoliths to Microservices with Debezium
00:56:01
Speaker
microservices. Oh, okay. Yeah. Yeah. I'm so glad you asked. So there's an interesting pattern there, which is called the strangler fig pattern.
00:56:11
Speaker
Strangler fig. Right. So the idea is, right, the idea is, you know, there's this strangler fig plant and it kind of wraps around and strangles its host tree. At some point this old tree dies off. Not the nicest picture, I know. But so it, you know, strangles around that old thing and grows from there. And that's kind of the picture for that migration approach. So the idea is
00:56:40
Speaker
you don't want to do a huge big-bang migration where, as you say, you work on this for three years and it takes another three years and then it fails. Rather, you want to go gradually, and, you know, you want to reduce the risk and you want to go step by step. So the idea is you take
00:56:55
Speaker
one chunk of your functionality. Maybe you start with a single view in your web application, I don't know, like the order history view. So that's the thing you want to start with: a new, you know, shiny microservice written in Clojure, I don't know,
00:57:12
Speaker
something like that. And so you start to extract this functionality from the monolith into your new microservice. It's its own thing. And of course, this should now have its own state storage, should have its own data.
00:57:27
Speaker
So you don't really want to share data stores across service boundaries. So it would have its own data store. And then you would use CDC to propagate all the writes, which at this point still go to the old monolith. So you would propagate all the changes from there over to this new database of the new microservice. And then in front of everything, you would have a routing component, which essentially says, OK,
00:57:55
Speaker
this is a read request for the order history view, so I send this over to the microservice and let it be served from there, and everything else should still be served from the monolith. All the other reads, all the writes for that part of the domain, they still go to the monolith. But I have this read functionality which already is served from the new service. And you use CDC to gradually
00:58:18
Speaker
extract data from the old database to the new one, keep them in sync, and then you would keep going, right? So you would extract more functionality. At some point you would say, okay, everything which is related to purchase orders, this should now be owned by this new Clojure service, or, I don't know, Elm, or Zig, what's your favorite language these days? Let's say Haskell. We don't mention Haskell. All right, so this is the new Haskell service.
00:58:43
Speaker
Does anybody build web services in Haskell? I don't know. Okay. So you have this new Haskell service for purchase orders. And, you know, this now receives reads and writes for that part of the domain. But then it could still be that you have functionality in the old application context which needs to know about purchase orders, right? So essentially now you could say, okay, I also propagate changes
00:59:07
Speaker
from purchase orders back to the old world, so we can, you know, I don't know, reference the data there and so on. And so you would kind of do it bidirectionally, but you would just have to make sure, you know, that for each part of your domain, for each table essentially, or each aggregate, I don't know, there's one system which is in charge, right? So what you should avoid is having writes to, I don't know, customers or purchase orders
00:59:33
Speaker
on both sides of this architecture, because then you would end up propagating them back and forth in cycles, and it would be kind of weird. Yeah. Yeah. And reconciling multiple master databases is always horrible. Exactly. So you shouldn't do that. But otherwise, Haskell for your website: we did a whole
00:59:53
Speaker
episode on that, because it's very nice. Oh yeah, I will watch it. Maybe I'll do a coding walkthrough or something. But okay, we'll link to the full talk. That gives me the picture, and I like incremental migrations. Big-bang projects are often disastrous. Yes, absolutely. It's a nightmare, right? Whereas with this
01:00:11
Speaker
approach, you can go at your own pace, you can pause. Maybe at some point you realize, okay, I'm happy if I've just extracted those three things into their own services and the rest can remain in the old system so you can do that. So it's really, it's about minimizing risk, right? So you don't want to change everything at once and then nothing would work maybe. So it's about risk management. Yeah. And is that the kind of way you'd
01:00:38
Speaker
Is that the best way to get into using Debezium in a system? Would you say like, let's just choose one small piece and pick it off? I would say so. Yeah. I mean, I'm always a big fan of just taking something, be it Debezium or whatever technology and apply it for one part of your problem space and see how it works, get a feel for it. And then you'll realize how amazing it is. And then you use it for everything.
01:01:06
Speaker
I think it's the perfect tagline to end the episode. You know, thanks very much for joining us. Absolutely. It was a great pleasure, Chris. Thanks so much for having me.
01:01:17
Speaker
Thank you, Gunnar. Now we dropped quite a few references in that discussion, so I'll compile them all together and you'll find the links in the show notes as usual. I'm going to add in another link there. We kind of touched on in that discussion why you can't just use SQL polling to get real-time data out of a database. You do have to look at the transaction log if you don't want to lose data.
01:01:40
Speaker
If you'd like some more detail on that, there's a really good talk by a former guest of ours, Francesco Tisiot, and he can tell you some horror stories from just relying on JDBC, so check that out if you want the full argument.
01:01:55
Speaker
As well as that sort of back reference, I'll give you a forward reference as well. We need to have an episode on Apache Flink, and it is in the pipeline. So if you want to catch that, now's a great time to click subscribe and notify. And if you've enjoyed this episode, please take a moment to like it, rate it, maybe share it with a friend, you know where to find the buttons, so I leave you to it. Until next week, I've been your host, Chris Jenkins. This has been Developer Voices with Gunnar Morling.
01:02:25
Speaker
Thanks for listening.