
The State & Future of Apache Kafka (with Anatoly Zelenin)

Developer Voices

I’m joined this week by one of the authors of Apache Kafka In Action, to take a look at the state of Kafka, event systems & stream-processing technology. It’s an approach (and a whole market) that’s had at least a decade to mature, so how has it done? What does Kafka offer to developers and businesses, and which parts do they actually care about? What have streaming data systems promised and what have they actually delivered? What’s still left to build?

Apache Kafka in Action: https://www.manning.com/books/apache-kafka-in-action

Pat Helland, Data on the Inside vs Data on the Outside: https://queue.acm.org/detail.cfm?id=3415014

Out of the Tar Pit: https://curtclifton.net/papers/MoseleyMarks06a.pdf

Martin Kleppmann, Turning the Database Inside-Out: https://martin.kleppmann.com/2015/11/05/database-inside-out-at-oredev.html

Data Mesh by Zhamak Dehghani: https://www.amazon.co.uk/Data-Mesh-Delivering-Data-Driven-Value/dp/1492092398

Quix Streams: https://github.com/quixio/quix-streams

XTDB: https://xtdb.com/

Support Developer Voices on Patreon: https://patreon.com/DeveloperVoices

Support Developer Voices on YouTube: https://www.youtube.com/@developervoices/join

Anatoly’s Website: https://zelenin.de/

Kris on Mastodon: http://mastodon.social/@krisajenkins

Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

Kris on Twitter: https://twitter.com/krisajenkins

Transcript

Introduction to Log-Centric Databases

00:00:00
Speaker
Back in 2015, a guy called Martin Kleppmann gave a brilliant conference talk called Turning the Database Inside Out. And the core idea of that talk, if I can squeeze 40 minutes into 15 seconds, was this: sooner or later, every database, Oracle, Postgres, Mongo, whatever, you name it.
00:00:20
Speaker
Sooner or later, every database's architecture evolves to the point where there's a write-ahead log at the heart of it. And it uses that log for replication, durability, building indexes, a whole bunch of stuff.
00:00:33
Speaker
At the heart of any well-architected database, you're going to find an append-only log. So, asks Martin, instead of evolving databases towards that architecture, what if we started with a log and built the database up from there?
00:00:50
Speaker
And it's a brilliant talk because he very eloquently manages to crystallize the whole argument for things like Apache Kafka and event-based systems, for event-based systems being equivalent to, and more powerful than, traditional approaches,
00:01:07
Speaker
and for the whole real-time streaming data world. It's 2015 and he really puts a stake in the ground. But that was 10 years ago. It's been 10 years since he gave that talk.
00:01:19
Speaker
More than 10 years since Kafka became an open-source Apache project. That's enough time. You have to wonder, has his vision come true? Have we as an industry turned the database inside out and built up a glorious new tech stack from the solid, simple foundation of a log file?
00:01:38
Speaker
Is the event-based streaming data world there yet? How well did it work out in practice?

Interview with Anatoly Zelenin

00:01:45
Speaker
So I put those kinds of questions to Anatoly Zelenin. He is the co-author of Apache Kafka in Action, and he's a contractor who works with companies to build out Kafka deployments.
00:01:56
Speaker
So he knows the theory, he knows the business case, and he knows how well it's working in practice. And he is pleasingly frank about that. The answer is good and bad.
00:02:08
Speaker
There are successes. There are works in progress. There are still gaps in the market. But I think he gives a great point-in-time snapshot of the state of Kafka and the state of data streaming.
00:02:20
Speaker
So let's find out what it is. I'm your host, Kris Jenkins. This is Developer Voices. And today's voice is Anatoly Zelenin.
00:02:40
Speaker
I'm joined today by Anatoly. How are you doing, sir? Good. Thanks for having me. Pleasure. You're coming in from Europe, which is always nice on the latency for the internet connection. I always like that.
00:02:54
Speaker
And you're a very busy man, right? You've just finished translating your book, Apache Kafka in Action, from German to English? And back to German.
00:03:05
Speaker
And back to German. Did you do it all yourself? No, luckily not. So I had the pleasure of writing the book together with my co-author Alex, who's been a friend of mine for a very, very long time, since university.
00:03:21
Speaker
Ah. Has that strengthened the friendship or made it worse? I think it's independent of it. Good, good, good. I'd say that kind of project could put a lot of stress on the relationship. It does, it does, definitely. Yeah.
00:03:36
Speaker
Okay, so Apache Kafka in Action. I want to spend the majority of this conversation talking about the "in action" part. But we have to get everyone on the same page. So start off with Apache Kafka.
00:03:50
Speaker
What's your definition of it? Why do people use it? Why do people care about it? I think we need to look at this from two different directions. We can think about it from our direction, the developer perspective. And I think the more important thing is to look at it from the business perspective.
00:04:10
Speaker
So from a business perspective, I think that Apache Kafka is very interesting because the world is happening in real time, but our IT systems most often are not.
00:04:23
Speaker
And when I started in IT, which is not that long ago compared to you, I believe. That's the most polite way of saying I'm getting old.
00:04:36
Speaker
No, I've been around a while. Yeah, me not. Also a while, but not that long. We're still using batch systems everywhere. We're still waiting for stuff that people from my generation, and probably even younger, expect to happen almost immediately, and it takes days, a week, or even months to happen.
00:04:58
Speaker
Yeah, the overnight job is still very much a part of our landscape,

Kafka's Role in Real-Time Data Processing

00:05:01
Speaker
right? Yes. Not only the overnight job, but also the job at the end of the month, the end of the quarter, the end of the year.
00:05:09
Speaker
And I think Kafka is one way to enable companies, especially larger companies, to do that stuff in real time or near real time. Although I don't like the term real time when talking about Kafka, because it's not real time in the technical sense.
00:05:26
Speaker
Yeah, there are a couple of different definitions of real time. But if fast is overnight, then Kafka is definitely real time. Yes, as in it takes maybe seconds or milliseconds to do something instead of hours, months, or even worse.
00:05:42
Speaker
Yeah, very fast. Let's call it relatively very fast time. Yeah, near real time. Yes. I've heard the term business real time. Business real time. I'm going to use that.
00:05:54
Speaker
Okay. Okay. And from the technical perspective, there's one thing you need to understand when you're talking about Kafka from a technical perspective: it is just a log.
00:06:08
Speaker
I don't know how much we want to go into the depths of what a log is, but a log is basically just a list of stuff that you can append data to, and then read sequentially from the very beginning to the end.
00:06:24
Speaker
It sounds very easy. It is very easy. It sounds like, what is the purpose of this? But the idea of a log, of having a history of stuff that happened, a history of events, is very, very powerful when you think about it.
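To make that concrete, here is a minimal sketch of the log idea in plain Java. It's an illustration of the append-and-read-sequentially contract only; Kafka layers partitioning, replication and durable storage on top, but the reader-facing idea really is this simple.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of the log idea: append-only writes, sequential reads.
// Illustrative only; this is not how Kafka is implemented internally.
class Log<E> {
    private final List<E> entries = new ArrayList<>();

    // Append a record to the end; return its offset (position in the log).
    long append(E event) {
        entries.add(event);
        return entries.size() - 1;
    }

    // Read everything from a given offset onwards, in order.
    List<E> readFrom(long offset) {
        return entries.subList((int) offset, entries.size());
    }
}
```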
00:06:43
Speaker
When you think about that in terms of putting the history of everything that happened in your company, in your organization, in the center of your IT architecture, I think this is very, very interesting.
00:06:56
Speaker
I think you've raised one or possibly two other business cases here, right? Because I've heard a lot of people in the Kafka world talking about real time being very interesting. The other one that comes up almost more often, or almost more immediately, is connecting different systems together, right? It's not about making data get around faster than a day, but making it get between department A and department B at all.
00:07:26
Speaker
But this is something we've had for years and years. So I don't think this is a differentiator of Kafka. We have enterprise service buses that are older than me.
00:07:38
Speaker
We have messaging systems that are older than me. We've had them for dozens of years, but I don't think that this is the key differentiator of Kafka. Because...
00:07:50
Speaker
If you go to a business person and you try to sell them an ESB, you can say, yeah, we have this ESB, enterprise service bus. I know you have an SAP ERP system here, and you have your CRM, Salesforce, and they don't want to talk to each other.
00:08:06
Speaker
I have here a ready-made solution for you. You just need to pay me money, and then it just works. Hmm. With Kafka, I cannot sell it like that. I can sell them Kafka, or tell them, yeah, you can use Kafka for that. But then you need to build connectors for SAP. You need to build connectors for your Salesforce.
00:08:32
Speaker
And you need to do that yourself or buy it from somewhere. This is not a nice story. Yeah, that's fair. Unless you find... because you've got Kafka at the center and it's just a nice, neat place to dump data.
00:08:48
Speaker
If you find you've already got that piece, the reader piece from the source system and the writer piece out, then you've got a chance of making that story work, haven't you?
00:09:00
Speaker
Yes, but then you maybe have a chance of making the story work from system A to system B. But if you need to talk between system A and system C, then you will need to write everything, the transformation logic, again.
00:09:19
Speaker
And the ESB will tell you, no problem. You want to connect another sort of system? By the way, we have a plugin, and it's done. You just pay us the money. With Kafka, you need to build it yourself.
00:09:33
Speaker
And probably you will have twice as much data for that. So I know this is what happens, and I think it's valid. It's a valid use case of Kafka to connect systems, but I don't think this is the main selling point of it.
00:09:49
Speaker
That's interesting. Okay. Okay. So you're really sold on the real-time angle as being what people in the real world are after.

Is Kafka a Database?

00:09:57
Speaker
That's one point. I think the other one is the log.
00:10:01
Speaker
Because, to the best of my knowledge, apart from systems that build on that concept of Kafka, or something like Pulsar, which is, I don't know, a sibling maybe of Kafka? How would you call it? Pulsar?
00:10:15
Speaker
A sibling? Maybe? Yeah, a sibling, maybe a cousin. Cousin? Twins in the family tree. Yeah. Classical messaging systems forget their stuff.
00:10:28
Speaker
But Kafka can remember the information you shared with it. Yes, that's the thing, isn't it? Do we throw away the data once we've processed it? That's a defining difference, yeah.
00:10:40
Speaker
And me being, in a way, an old-school relational database person, I dislike the idea of throwing away data habitually. Me too. Fair enough. And whenever I'm talking about Kafka, what's very important to me is to say that Kafka is not a replacement for a database.
00:11:01
Speaker
A database and Kafka are two different systems that are both very important to us in the IT architecture. Okay, I was going to ask you, because it's one of my favorite questions to ask Kafka people: is Kafka a database? Yeah.
00:11:16
Speaker
And everyone has a different answer. It depends how you define a database. That's my answer. Go on. If a database is a system where we can store data, then of course Kafka is a database.
00:11:34
Speaker
But this is probably the most boring definition of a database. Okay.
00:11:40
Speaker
But if you say a database should be more relational, maybe should have some strong consistency guarantees, then Kafka is not a database.
00:11:52
Speaker
Okay. But...
00:11:55
Speaker
What I also like to think about is: what questions does a database answer, and what questions does Kafka answer? And I think they answer two fundamentally different questions.
00:12:10
Speaker
Okay, go on. What do we store in a database? If we look at the data from a philosophical... so let's go to the philosophical side. Yeah. Let's look at the data we have in a database.
00:12:21
Speaker
And I believe what we have in a database is the description of the world that we know.
00:12:29
Speaker
Current state of the world. Yeah, this is the next step. Oh, okay, sorry. The information about the world as we know it. And then the question is, at what point in time?
00:12:42
Speaker
Now, a relational database describes the current state of the world as we know it. Yeah. Is this useful to have?
00:12:56
Speaker
Yeah, of course. Otherwise, you wouldn't pay Oracle millions of dollars, or pounds, or euros, every year. Yeah. This is very useful. And I believe that almost every IT system needs a place where it stores the current state of the world.
00:13:15
Speaker
Do I see that relational databases will go away in the near future? No, because they're very useful to have. But if we look at Kafka and remember that Kafka is a log, is it the current state of the world?
00:13:30
Speaker
No. Maybe you can look into Kafka and find out the current state of a single
00:13:38
Speaker
entity. Maybe you know the current description of a person: their current name, the address, credit card information, something like that, yes. But it's not the world, just the person.
00:13:51
Speaker
So if a database stores the current state of the world, I think the main question the database answers is: what is happening now? What is the current state?
00:14:07
Speaker
What do I know about the world now? And if I look at a log, where I see all the history of what happened in my company, I think the question that log, or Kafka, answers is: what happened?
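That distinction is easy to see in code. A hedged sketch, using a hypothetical AddressChanged event type: the log holds "what happened", and the database-shaped answer, "what is the current state", is just a fold over it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event type, for illustration: "what happened".
record AddressChanged(String userId, String newAddress) {}

class CurrentState {
    // "What is the current state?" is a fold over "what happened".
    static Map<String, String> replay(List<AddressChanged> log) {
        Map<String, String> current = new HashMap<>();
        for (AddressChanged event : log) {
            current.put(event.userId(), event.newAddress()); // last write wins
        }
        return current; // the database-shaped answer
    }
}
```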
00:14:21
Speaker
Yeah. Is this useful?
00:14:26
Speaker
Yes, I think so. Is it always useful to have a system that can answer the question, what happened? No. No. Depends on your needs, right? Depends on your needs.
00:14:39
Speaker
If you're a financial auditor, the answer is always yes. Yes. If you just want to find out someone's current email address, not so much. Yeah. If you're just a small startup and you don't have a lot of data, or you're a manufacturing company doing very small batches, and somebody ordered a new piece of metal and you need to know the content of the order, you don't care so much.
00:15:05
Speaker
And I would even say that, of course, you can implement this question and those answers in a database. You can have history tables, you can have

Kafka in Microservices Architecture

00:15:16
Speaker
version tables, something like that.
00:15:18
Speaker
This works. But at some point you will say, no, this is not enough for me. And then a central log like Kafka starts to become interesting.
00:15:32
Speaker
You've done plenty of work with companies on Kafka, right? So is that
00:15:41
Speaker
is that how they're adopting it, as a way of doing real-time data or as a way of saying, actually, we need to know the way things used to be as well as the way things currently are?
00:15:54
Speaker
I think both. I think Kafka started out as this thing of, yeah, we need to move huge amounts of real-time data from point A to point B. This is how LinkedIn used and is using Kafka. This is how every fancy startup is using Kafka.
00:16:14
Speaker
It's very interesting from a technical perspective, but from an architectural perspective, it's just this huge pipe in the center where you just push data into it and then get data out of it
00:16:25
Speaker
A bit boring, isn't it? There's something beautifully boring about an append-only log. Yeah, it's just a pipe. You put stuff into it and it works. Yes, of course, you need to think about scalability, about pricing, about reliability of the system.
00:16:42
Speaker
It's just a pipe. Yeah. Yeah. So I like to look at it more from an architectural perspective, and this is where my customers mostly are.
00:16:54
Speaker
And some customers even say, yeah, looking at the amount of data we have, maybe we wouldn't need Kafka. We are talking about thousands of events per hour.
00:17:06
Speaker
For us this is a lot, maybe millions of events per hour. But for Kafka, this is still boring. It's bored by these amounts of data. But still, it's very interesting from an architectural perspective, the ideas that Kafka enables, the things Kafka enables companies to do.
00:17:27
Speaker
To say, okay, we can now have independent teams. Like in microservices, we have different teams that work on their stuff, and we need to define an API between them.
00:17:42
Speaker
Maybe we define a REST API with all the pros and cons of synchronous communication, or we can say, yeah, but we would like to do asynchronous communication and then Kafka comes into play, which again has its pros and cons.
00:17:56
Speaker
And then you don't even care about real time. As long as your data sticks to the schema, you just put it into the bucket.
00:18:11
Speaker
It's ordered, it's nice. And whenever somebody else cares about it, they can grab the data. Whether it's now, as in real time, or tomorrow, or in a month or two, you don't care.
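That fire-and-forget style looks roughly like this with the plain Kafka producer API. A hedged sketch: the topic name and payload are hypothetical, and a real setup would typically enforce the schema via a schema registry with Avro or JSON Schema rather than a raw string.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch of asynchronous publishing: as long as the event sticks to the
// agreed schema, we put it in the bucket and move on. No response awaited.
public class OrderEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keyed by customer ID, so all events for one customer stay ordered.
            producer.send(new ProducerRecord<>("orders", "customer-42",
                    "{\"orderId\": 1001, \"item\": \"metal sheet\"}"));
        } // close() flushes outstanding sends; consumers read whenever they like
    }
}
```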
00:18:26
Speaker
I wonder... see, it makes me think there's a definite parallel between microservices and Kafka, and REST and WebSockets, let's say, in that the industry has massively adopted the idea of microservices, but very much in a request-response, REST-centric way.
00:18:50
Speaker
The industry has not in the same way devoured the idea of WebSockets or real-time stuff. It's there, but it's not become a core technique. In the same way, I would say Kafka is there, but it's not become a core technique.
00:19:06
Speaker
And they're structurally very similar. Do you send stuff and wait for a response, or do you just fire and forget and expect the right answer to come back eventually?
00:19:18
Speaker
And my question is, is there some fundamental reason, in your opinion, why one is more useful or more adopted than the other?

Kafka vs. REST APIs

00:19:27
Speaker
Or is it just habit? Have we not got used to doing things asynchronously?
00:19:37
Speaker
I would say it depends. So when you're saying, yeah, we've adopted REST as basically the interface for everything, I would agree with that.
00:19:48
Speaker
And when I'm thinking about that, I'm thinking first about something like GitHub; every API you find on the internet is REST. Yeah. Now, is that a habit, or fundamentally some advantage?
00:20:01
Speaker
It's easy. It's simple. We have so much technology that just works with REST.
00:20:10
Speaker
Doing the same stuff over Kafka would be so much more complicated. Mm-hmm. Imagine we have OAuth and all the authentication protocols; imagine doing that with Kafka.
00:20:25
Speaker
Do you want to allow any other third party to access your Kafka directly?
00:20:34
Speaker
You're laughing, yeah? I mean, I'm laughing for two reasons. I'm laughing because I agree with you, but also because I think we have the same problems in REST. We're just used to solving them.
00:20:46
Speaker
Someone says, oh, just use OAuth. Problem solved. I'm used to doing OAuth. Yeah. We probably hate it because it's very annoying and complicated.
00:20:57
Speaker
But no one gets to complain about it, because it's one of the things you have to be able to do to play the game, right? Yeah. But I don't think there's anything that's just not possible with Kafka.
00:21:12
Speaker
Actually, maybe it would work. It might even be easier with Kafka than with REST for certain approaches. But we're not there yet.
00:21:24
Speaker
And I'm not sure whether it's important that we become able to do that.
00:21:34
Speaker
Okay, so you don't think it's necessarily important to have authentication in Kafka? No, no. What I mean is: is it important to be able to open Kafka-like APIs to the world?
00:21:55
Speaker
Probably not. Probably not. Because if you're just using REST, it's much easier and you can just say, this is my API.
00:22:07
Speaker
And yeah, you know that it's awful and annoying. And yeah, you will not get real time results, but it's good enough for you.
00:22:20
Speaker
What I see is that if you have a one-to-one relationship with another company, or with two companies or three companies or five or ten companies, then you can ask the others, could you give us the data in real time with Kafka? And they will just say, yeah, of course, we can do that.
00:22:41
Speaker
Okay, so you're seeing Kafka as almost a B2B solution. With possibly the modifier that sometimes, in a large company, business-to-business is actually department-to-department. D2D, let's coin that one.
00:22:56
Speaker
Yes, definitely. But there's still the whole familiarity thing, I think, which is important to developers. Even going from department to department, Kafka is weird and unusual compared to an HTTP baby microservice.
00:23:17
Speaker
But I think as soon as you're in the company, you can
00:23:23
Speaker
use other arguments here. Okay. You go to a business, or to a head of department, and ask them: do you want to rely on the others being able to have their services up and running when you need them?
00:23:49
Speaker
And often they will say, I'm not that sure about that. So, you're saying, if you're using HTTP, you need to rely on the other department managing their services correctly and reliably?
00:24:05
Speaker
I don't want that. Yes, because that's almost the insight of the actor model, isn't it? If you need to rely on a response, then you have introduced a tightly coupled reliability problem.
00:24:20
Speaker
If all your services rely on request-response, then they are all dependent on all the others all the time. And we are in the territory of inside data versus outside data.
00:24:33
Speaker
Oh, explain that one. Do you know that paper? I've heard of it. I'm not sure I've read it. But for the sake of me and the listeners, unpack it. There's an amazing paper from Pat Helland, who I think is at Salesforce

Data Models and History

00:24:49
Speaker
right now. Yeah. And it was 2004. That's years before Kafka.
00:24:58
Speaker
Okay. And he thinks about how, when we're looking at data, we need to distinguish the data we are using within a service and the data that we are using between services.
00:25:13
Speaker
When we're inside a service, we want to use data structures that are tightly coupled to our service. And tight coupling is a good thing there, because if we are coupling things tightly, we can get the most out of it.
00:25:31
Speaker
So for example, if we use a relational database, relational databases are awesome. We can have selects, we can have joins, we can have filters, we can have triggers, we can do a lot of good and bad stuff with that data.
00:25:46
Speaker
And this is perfect as long as you stay in your service boundaries. Okay.
00:25:52
Speaker
But if you start to rely on other teams, other departments, other services, third-party providers, you actually want to decouple the connections to them, but also the data you share with them.
00:26:11
Speaker
You want to have data that is basically outside of your service. And Pat Helland describes how the form of that data must be different from the form of the data on the inside.
00:26:25
Speaker
The form of the data on the inside needs to be optimized for your service. But the form of the data on the outside should be optimized for the communication, for the independence of the services.
00:26:39
Speaker
The best way you can achieve that is if you have a contract. You basically say: this is a contract, this is what my data looks like. It is not optimized for you; it is not optimized for me.
00:26:50
Speaker
But it is something in between that we can work with. In the best case, I think, the data stands for itself. You know, when you look into a relational database and look at a single row in a single table, more often than not you cannot make sense of it on its own.
00:27:09
Speaker
Because there are so many pieces missing that are in other relations, in other tables, just IDs, stuff like that. But you don't care, because you like the atomic...
00:27:23
Speaker
the atomicity of the data, you like the normalization of the data, the deduplication of the data. But if you're on the outside and you just have this piece of information with a user ID, an address ID, an email ID, a product ID, you cannot do anything with that.
00:27:44
Speaker
So what you're saying is if I've got an order to buy something in my system, I might just have a user ID in that order, which I'm happy to link to. But somewhere in the organization, if that data gets used elsewhere, there should be a centralized thing that is both the order and the user details in one place.
00:28:04
Speaker
Yes. OK. OK.
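A sketch of that contrast, with illustrative field choices rather than any standard schema: the inside form leans on foreign keys into our own tables, while the outside form carries everything a consumer needs to make sense of the event on its own.

```java
// Inside data: normalized, optimized for our own service's database.
record OrderRow(long orderId, long userId, long addressId, long productId) {}

// Outside data: a self-contained contract, denormalized so that consumers
// need no access to our tables to understand it. Fields are hypothetical.
record OrderPlaced(
        long orderId,
        String customerName,
        String shippingAddress,    // the address *at the time of the order*
        String productDescription,
        String currency,
        long amountCents) {}
```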
00:28:10
Speaker
Because? Because... why is that? We need to think about the data in time. If you have just this piece of information, there is an order for a certain user ID, do we know the state of that user at the point in time this information is talking about?
00:28:37
Speaker
Imagine the user ordered something to address A, and a second after that changes their address. If you have a dumb system, you might send the old order to the new address.
00:28:54
Speaker
But the user ordered it to the address that was their address at the time when they placed the order, not later. And doing these joins over time is very, very hard, or nearly impossible,
00:29:11
Speaker
to do correctly. Yeah, you'd need some kind of... what's the term? There are things like XTDB, historical databases that can do point-in-time queries, which are even less mainstream than Kafka these days. Yeah. Okay, you are envisaging a world with maximum data reuse between departments, but maximum data...
00:29:40
Speaker
format optimization within a department? And not only me. You know, I think that in every community in IT, there are a few papers or blog posts or ideas that are shared.
00:30:00
Speaker
Yes, the focusing lenses. Yes. Yeah, I was in the Clojure world before Kafka, and there's this paper, Out of the Tar Pit, that describes how horrible the state of IT was then, and is now, and probably will be for the next 10 years at least, or forever.
00:30:20
Speaker
And I think, for me personally, there are two very influential pieces in the Kafka world. There's Data on the Inside Versus Data on the Outside by Pat Helland, from 2004.
00:30:34
Speaker
2004. That's a very long time in computer years. Yeah, but it's still up to date. And yes, of course, he's talking about service-oriented architectures and stuff like that. We don't like to talk about that anymore.
00:30:46
Speaker
And the second one is the piece by Martin Kleppmann, who describes turning the database inside out. Yes, yes. Whenever two people get together to talk about Kafka long enough, Martin Kleppmann will come up.
00:31:01
Speaker
Yes, I believe so. I did want to talk about that because... tell me if I'm wrong, but I would summarize Martin Kleppmann's central idea as: every relational database has eventually evolved to the point where the core of it has to be a write-ahead log, for durability, reconstruction and replication.
00:31:27
Speaker
Yes. So the idea is, why don't we start by building a database with the write-ahead log and then build the pieces up on top? Exactly. Now, this leads me to my big question.
00:31:40
Speaker
That journey with Kafka as the write-ahead log and building the pieces up on top to get a full database must have begun over 10 years ago.

Challenges in Distributed Database Systems

00:31:52
Speaker
I don't think that journey is complete. No, me neither. What have we achieved? What are we missing? Oh, small question. I think what we've achieved is that we have a very good log.
00:32:07
Speaker
I'd agree. But the rest? Oh, you're that bleak. Okay. I think what we are also quite good at, having this log, is building systems, building these views on top of it.
00:32:24
Speaker
But we're doing it manually. So we have a consumer that fetches the data from the log, transforms it, and stores it in a relational database. Okay. Which is totally fine. This is what Martin Kleppmann described. Yeah, you can just use a relational database as a materialized view over the log.
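That manual pattern looks roughly like this with the plain Kafka consumer API. A sketch only: the topic name and the upsert helper are hypothetical, and a real version would use JDBC or similar on the database side.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Sketch of the "manual materialized view": consume the log, transform
// each event, upsert it into a relational table.
public class MaterializedViewBuilder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "customer-view-builder");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("customer-events")); // hypothetical topic
            while (true) {
                for (ConsumerRecord<String, String> record :
                        consumer.poll(Duration.ofSeconds(1))) {
                    // e.g. INSERT ... ON CONFLICT (key) DO UPDATE in Postgres
                    upsertIntoDatabase(record.key(), record.value());
                }
            }
        }
    }

    static void upsertIntoDatabase(String key, String value) {
        /* JDBC upsert elided */
    }
}
```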
00:32:44
Speaker
Yeah. Okay. I think we had approaches to try to do what Martin Kleppmann described, like ksqlDB by Confluent. Yes.
00:32:56
Speaker
Sadly, it's now not officially dead, but de facto dead. Yes, de facto dead. You know, two years ago, I asked the people from the teams, what about ksqlDB? And they said, yeah, it's not dead, we're still working on it.
00:33:10
Speaker
At the last Current in London, though, there was not a word about ksqlDB. It wasn't mentioned anywhere. It's still not dead.
00:33:21
Speaker
Some people are working on it and continue working on it, fixing bugs. But I don't think... I think it will be retired before it gets a new feature, because I don't think it's going to get any new features.
00:33:35
Speaker
Yes. That's my opinion. But just to ground ourselves quickly: you've got this log, which is a recording of all the things that happened, and you can rehydrate that into the current state of the world by going to a database.
00:33:50
Speaker
Or you could try and run a SQL query over it with something like ksqlDB, if that weren't dead. There are some other interesting answers. Flink is another big answer. I want to mention Quix Streams as a solution.
00:34:04
Speaker
Yeah. But Quix Streams, it's a good thing, but isn't it just Kafka Streams for Python? Yes. Yeah, I think that's a fair summary.
00:34:16
Speaker
It's like we have these different ways of running over that log and producing something that looks like a query. There is no obvious clear winner in that space at the moment. And as you say, it all feels rather manual.
00:34:31
Speaker
Compared to Postgres: how do you do queries across a Postgres database? There is one obvious answer, and it's easy. We're not there. And even if you say, yeah, we have Flink, and Flink is good and it's awesome, the queries are not simple. And if you write a simple-looking Flink query, it is very hard to predict the performance of that query.
00:34:56
Speaker
Have you used Flink in production?
00:35:00
Speaker
No, but I had a few customers that tried to use Flink in production. Some of them successfully; some of them said at some point, actually, for us it's much easier to just write Java code. And yeah, we know it's a lot of code, but we know how Kafka Streams works.
00:35:16
Speaker
And for us, it's even simpler than debugging and operating Flink. So, this is what I'm trying to get to: Martin Kleppmann's core idea, which I think was good, was start with a solid log, which we have, and then build up to a full database.
00:35:39
Speaker
Do you think there's a solution to that? Do you think there's a good solution? Do you think we are still scrambling around in the dark? I think we are still... The problem is that this is not a database, or not a relational database. What Martin Kleppmann described, and what maybe we are trying to build as a community, is very, very hard to do, because now our views are not views in a monolithic system. Instead...
00:36:11
Speaker
we have a view here in this department, a view in that department, a view in another department. These are different teams talking different languages: business languages, programming languages, or even human languages.
00:36:25
Speaker
And Martin Kleppmann also says there is no obvious way to do that. Because you're trying to optimize for different use cases.
00:36:38
Speaker
Maybe we have a department that needs a very fast cache for the homepage. And then they will use Redis or... what's the... yeah, Valkey or whatever,
00:36:52
Speaker
as a cache system. Another department wants to do analytical queries over it and will use, I don't know, Pinot or the Iceberg format or stuff like that.
00:37:04
Speaker
A third department just wants to run SQL queries over it, transactionally. So they will use Postgres. So the problem we have is that we've opened up the choice of technology we allow people to use.
00:37:20
Speaker
So are you saying that we shouldn't aim to build one big database as we build this picture back up? That the fact that it still feels like a bit of a free-for-all is a good thing?
00:37:35
Speaker
Yes and no. I think we are missing... I don't think that there's one solution; there won't be one solution to handle all this complexity.
00:37:46
Speaker
I think it should be much easier to build stuff like that. So we are missing so many tools to make it easy to build this thing.
00:38:01
Speaker
Yes, we have Kafka. Kafka is great. It's just a log, nothing more. We have Kafka Connect, but Kafka Connect is a bit... stupid, I would say.
00:38:13
Speaker
Not in a bad way. It's good. It does one thing and does it well, but it is just one thing. Kafka Connect can only move data from point A to point B. Doing transformation in Kafka Connect is awful.
00:38:28
Speaker
Yeah, okay, so let's call it simple and focused rather than stupid. It's less judgmental. Yeah, maybe. Okay, I agree with that. Okay, so you've got storage of long-term ordered data. You've got getting it in and getting it out.
00:38:45
Speaker
But transformation is something that's very, very hard in this system. Yeah, because you will often want to do something simple, like rewriting Avro data into JSON data for certain departments.
00:38:59
Speaker
That's not too bad. That's probably the upper limit of what I want to use Kafka Connect for. Yeah, with Kafka Connect, that would work. But, for example, say I just want to join the order data with the customer data.
00:39:19
Speaker
Yeah, that word "just" is a much larger word than it sounds.
00:39:25
Speaker
But it's very, very complicated from a technical perspective. If you're very happy with Java and Kafka Streams, you could argue that's a solved problem.
00:39:38
Speaker
That's very hard to do. So yes, it is a solved problem, but it's very hard to implement. I mean, in Postgres, you just say: select stuff from orders, inner join or left join customers.
00:39:52
Speaker
And it works. And it's relatively performant. And if it's not performant, you just add an index. Yeah, there's nothing of that simplicity. Yeah. If you do that with Kafka, you need to deploy a Kafka Streams application. You need to think about your state stores, about your topics, the changelog topics.
00:40:14
Speaker
You need to think about how much memory your instances will need, how much traffic they will produce, how to operate that stuff. Yeah. And then you haven't even talked about the performance characteristics of that.
00:40:27
Speaker
Yes, I can join two tables in Postgres by the time I've started writing public static void main, you know, Kafka Streams application. Yeah. So do you think we'll get there? Do you think do you think the future looks like something where the developer experience of Kafka is much, much better? Or are we heading somewhere else?
00:40:48
Speaker
I'm not sure about that. I think Flink is trying to do that. But again, Flink is so complicated. Of course, you can say, yeah, Flink is not that complicated if you just use a managed Flink service.
00:41:02
Speaker
But I'm often in industries that don't like managed services, because they run in the cloud environments of the big three American companies, and not everybody likes them.
00:41:17
Speaker
I've definitely found... I'm not going to claim to be any Flink expert, but my experience of it is that a managed service will get rid of a class of problems to do with deployment, but it's still quite complicated and hairy just from a query-writing perspective, I felt.
00:41:37
Speaker
Yeah. So even if you remove the operational overhead, it's still not as simple as doing an in-process query. I think there are people that have ideas for how to improve that.
00:41:53
Speaker
I've talked with a guy who is building an index over Kafka. Okay. Where the idea is: relational databases are so powerful and so performant because we have indexes.
00:42:11
Speaker
We can build indexes over the data in the database. So actually, we would also need something like that for Kafka if we want to do ad hoc queries.
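The source doesn't name that project, but the idea can be sketched: scan the log once, remember where the latest record for each key lives, then answer point lookups by seeking straight to that offset instead of rescanning everything. A hypothetical in-memory version:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an "index over Kafka": latest offset per key.
class OffsetIndex {
    private final Map<String, Long> latestOffsetByKey = new HashMap<>();

    // Called for every record while consuming the topic in order.
    void observe(String key, long offset) {
        latestOffsetByKey.put(key, offset);
    }

    // An ad hoc query becomes: look up the offset, then fetch one record
    // (e.g. with KafkaConsumer#seek) instead of replaying the whole log.
    Long lookup(String key) {
        return latestOffsetByKey.get(key);
    }
}
```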
00:42:23
Speaker
But again, this is another piece. Yeah, we have Kafka. Okay, now we have Kafka, we have Kafka Connect, now we have an index, we need to have a query engine. And we build all of that in a distributed way.
00:42:36
Speaker
It is much, much harder than... just a relational database. Yeah, I mean, "just a relational database" pretends to be one thing. It's got all those pieces inside, but you don't have to think about them.
00:42:50
Speaker
Not that much. Your DBAs will think about that. And of course, it also has complications. For me, it kind of sounds like we are...
00:43:06
Speaker
We're criticizing Kafka too much. No, it's an amazing product, an amazing project, and a great community. But relational databases had a head start of 60 years.
00:43:17
Speaker
Or 40 years. Yeah, yeah. Decades, easily. Decades, definitely. We are now in the distributed world. So everything in a distributed system is much harder than in a non-distributed system.
00:43:32
Speaker
And we are building it back up from first principles, like a log. But that would suggest, I put it to the man who wrote a book called Apache Kafka in Action, that maybe that's a futurist book: give it a few more decades to catch up, and then we'll want to put it in action.
00:43:54
Speaker
I don't think that's what you're saying, but that's what it sounds like you're saying. I think... maybe we need to clarify a few things. Kafka as it is now is very, very useful.
00:44:06
Speaker
But I think what we are talking about, and also what Martin Kleppmann proposes, is the future we want to head towards, but we are not there yet.
00:44:18
Speaker
And it would be great if we had this future today, but we are not there. Yes, that's the more optimistic way of putting it. We have great things now, and look how bright the future can be.
00:44:29
Speaker
Yeah. So let's try and ground that in the near future then. What do you think the Kafka ecosystem has recently started doing well?
00:44:42
Speaker
And what do you hope the next solution is? I think what we are very good at is, again, doing this log stuff. Getting people to understand it, to play around with the log.
00:44:55
Speaker
There are so many companies using Kafka right now that didn't do that five years ago, that hadn't wrapped their heads around the ideas you need to understand what Kafka is.
00:45:12
Speaker
What we're also good at is progressing Kafka technically. I mean, Kafka is just a log, and to this day Kafka is just a log. But as a community, we are optimizing for how to do it cheaply.
00:45:28
Speaker
How can we store the data? So when I started out with Kafka,
00:45:34
Speaker
everybody said, don't store data in Kafka for a long period of time, this is not persistent storage. Then, a few years ago, tiered storage came. Then the one thing we don't want to talk about today, but maybe in the next podcast episode or so, is: can we do it cheaper?
00:45:54
Speaker
Can we have a solution where the Kafka brokers don't store data, and where we can scale storage and compute independently of each other?
00:46:06
Speaker
So basically, can we have diskless Kafka?

Financial Implications and Data Querying

00:46:09
Speaker
I think this is a very, very important and interesting KIP, Kafka Improvement Proposal, that we have now.
00:46:17
Speaker
Yes. And the reason I don't want to cover that is we are planning a deep dive into that very topic, because it's very technically juicy. Yes. I think the technical perspective is very interesting, but if you look at it from the financial perspective, it gets much more interesting. And not just because, okay, you can store the data in S3 and stuff like that. But it changes...
00:46:41
Speaker
the calculation of Kafka, the financial calculation, because currently you need to think, okay, will I need the data in the future? And how much will it cost me to store the data for a long period of time?
00:46:56
Speaker
And my favorite example for this is not, yeah, maybe we will need it for AI training or stuff like that. It's very simple. If you have a service that relies on this data,
00:47:08
Speaker
And you know that in a year or two or three months, you will have another service that will want to rely on this data you have in Kafka. If you're in a bank, it's just the transaction log of all the credit card transactions you had, for example.
00:47:27
Speaker
This is something you need in a bank every time, everywhere. Yeah. But if you're building a system that relies on a real-time log of your transactions, how do you think about the past?
00:47:41
Speaker
How do you get your historical data? Currently, what you would need is two data streams. You need to think about, okay, how do I get my initial data load?
00:47:53
Speaker
And how do I get the real-time data stream?
00:47:57
Speaker
Yeah. And it gets very expensive, because you have two absolutely independent ways of thinking about the data: the initial data load and the real-time data load.
00:48:11
Speaker
But if you store all the data in Kafka forever, whatever forever means, you can just read the data in one way. And yes, you need to pay for the storage, but you don't need to pay for the engineering months. Yes. Yes.
00:48:29
Speaker
Yeah, one of the nice things about Kafka is there's not really a difference between reading through the historical data and reading the latest data. Yep, conceptually. But as you say, it gets very expensive. I mean, I always have in my mind, as a benchmark for important data, seven years. I think I've picked that up from legal regulations on how long you have to keep financial data in the UK.
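That "one way to read" point shows up directly in consumer configuration: bootstrapping from history and tailing live data are the same loop, just a different starting offset. A sketch, with a hypothetical topic name:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Sketch: reading years of history and tailing live data is one code path.
public class ReplayThenTail {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "replay-then-tail");
        // For a consumer group with no committed offsets, start at the
        // beginning of the log: the "initial load" is just the same read.
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("card-transactions")); // hypothetical topic
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                    // The same handler sees years-old records first, then live ones.
                    System.out.printf("%s -> %s%n", r.key(), r.value());
                }
            }
        }
    }
}
```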
00:48:51
Speaker
Seven years is how long I want my data to be permanently around. And that gets super expensive with some Kafka solutions. Super expensive. With almost all Kafka solutions.
00:49:05
Speaker
But it shouldn't be. Seven years' worth of log data shouldn't be a huge disk, right? No. It will be, but compared to everything you care about, if you have millions of customers, it's a few hundred terabytes.
00:49:22
Speaker
That's not that much. No, if you're making enough trades to fill up that disk, then you ought to be making more than enough profit to account for it, right?
00:49:33
Speaker
Yeah. Apparently, as an industry, we are not there yet. Yeah. It's too expensive, and I'm looking forward to the time when this gets cheap. So this is how I see the evolution of Kafka: it went through a phase of becoming the log, reliable, durable, scalable storage.
00:49:57
Speaker
And then it went through a next stage of becoming really production-ready: monitoring, making tiered storage and object storage cheap, things like that, and making the ideas mainstream.
00:50:16
Speaker
And I think we've spent probably the past five years or so trying to crack the next layer above, which seems to me to be querying. And I don't think we've cracked it.
00:50:26
Speaker
We've just kind of... But if you say that we are at the level of making it production-ready, I think, yes, we are there, but we are just there for the basic stuff, for the monitoring and operations. I think this is solved.
00:50:39
Speaker
Even if you're on-premises, projects like Strimzi make it so easy to operate Kafka. But the price is a problem.
00:50:51
Speaker
And yes, the next one is querying. And querying is something Flink is trying to solve, but...
00:50:59
Speaker
But I'm not 100% sure whether Flink is the final answer to that. Do I want to go on record? Okay, I'm going to go on record saying this. My feeling about Flink has always been that it's a very good implementation of an outdated solution.
00:51:24
Speaker
That's interesting. As in, it's how we might have built a really high-quality query engine before distributed data was the norm.
00:51:37
Speaker
And I just have this feeling that if you were born into a world where distributed data systems were the norm, you wouldn't have architected Flink the way it is. And now I'm going to have to edit that out because it's too controversial, but give me your take.
00:51:53
Speaker
Okay.
00:51:56
Speaker
I would like it if you left it in the recording, because I think it's a very interesting take. I don't think I have the experience with Flink to judge it from this perspective.
00:52:14
Speaker
But maybe you're right. I still think we are looking for... the developer-friendly...
00:52:29
Speaker
No, I'll put it this way. When I first encountered Kafka, I thought, this is the answer. This shows where we've been going wrong trying to make distributed databases
00:52:41
Speaker
by kind of replicating a central transaction log. No, that's not going to work. This will work: just easily appended-to logs, and then build up from there. I thought, yes, there's a fundamental insight there that makes the core of it simple.
00:52:59
Speaker
When I look at Flink, I do not see a fundamental insight that makes the core of it simple. And I'm always waiting for that.
00:53:09
Speaker
Is it your next project? No, my next project is something completely different. But I just... Do you see anything on the horizon that fits that kind of description?
00:53:23
Speaker
Do you have hope for one? Or am I barking up the wrong tree and not looking for the right thing? I'm seeing so many people focusing on the analytical side. But it seems to me like if you go to a Current, a Kafka Summit, whatever, and go to the...
00:53:43
Speaker
booths, do you see a lot of companies thinking about transactional data?
00:53:52
Speaker
I mean, no. No. How do we help developers build the transactional systems that power businesses?
00:54:04
Speaker
Not analytical data, which is, yeah, we need to crunch some data and then have nice reports or stuff like that. But how do we replicate, basically, Martin Kleppmann's idea? And I don't see that currently. My instinct says that distributed transactions are just too hard, and what we will end up doing is taking something from the actor model, which is to say, okay, real-time asynchronous streams of stuff, but every time you need a transaction, that is a subsystem.
00:54:40
Speaker
It's an actor, or it's a Postgres database, or something through which all the important transactional stuff goes and happens at once. Yeah. But still, it is very hard. If you have a very distributed data stream, how do you put that into this
00:54:57
Speaker
actor, into this relational database?
00:55:00
Speaker
And, you know, I had a customer that said, yeah, we love the idea of Kafka, of doing everything in real time. And we have here this Oracle database. For a certain purpose, we have this small thing, just a few tables and lines of code.
00:55:23
Speaker
How do we implement that with Kafka Streams?
00:55:30
Speaker
What did you say?
00:55:35
Speaker
You cannot do that. It's simply impossible. For Kafka, this is just the wrong way of thinking. We need to think totally differently about that.
00:55:48
Speaker
We have still these systems that are very normalized. And the question is, how do we bridge the gap between the normalized relational databases and the denormalized real-time streams?
00:56:01
Speaker
Because both are very useful, but it's very hard to reason about one with the ideas of the other.
00:56:14
Speaker
And do you have an answer to that question, or do you just feel like that is the problem? I think this is one problem. Okay.
00:56:24
Speaker
And I don't have an answer yet. Yeah, there it is.
00:56:31
Speaker
I mean, in a way, it's that fundamental problem of how we talk between departments without forcing our way of talking on them, and without being tightly coupled to other people.
00:56:48
Speaker
Yeah. And I think the fundamental insight there has to begin with data, because data is the only truly decoupled thing. Data without thought of how it's going to be used. Yes, that's the point.
00:57:02
Speaker
Data that is independent of the system. Or, in Pat Helland's words, you need to start thinking about the outside data. Because the inside data is the easy stuff. The outside data is hard.
00:57:17
Speaker
And not because Kafka is hard, or because Kafka is a bad system that's not thought through, or it's just too young, something like that. But because the problem is just hard.
00:57:30
Speaker
Because we are talking about how we model data that is accessible to different people in our organization. And this is hard. Now, playing computer science historian slash devil's advocate, I'm going to have to say, well, this is exactly the problem that Codd was wrestling with when he invented the relational database, right?
00:57:55
Speaker
He was like, we're making a bad job of storing data the way it wants to be written. We should store it in a way that doesn't know how it's going to be read. Have we just recreated that problem in the log-based world?
00:58:12
Speaker
Isn't this the same question? So can you repeat his question again? I think this is very interesting. He said, if I can paraphrase Codd, his fundamental insight was: our problem is that we're storing data the way we want to write it.
00:58:32
Speaker
We need to be storing it in a way that doesn't know how it's going to be read back, and shouldn't need to care. Then I would say this is not what databases do.
00:58:44
Speaker
Yeah. But how do I phrase it correctly? So, of course, databases store the data in a very, very independent way.
00:58:54
Speaker
Hmm. But...
00:59:00
Speaker
the structure of the data that is stored in a database is very tightly coupled to the way we read the data from the database.
00:59:11
Speaker
And it is very specific to the applications that have access to the database. And if one application tries to change something in the database, it has an impact on everybody else who has access to the database.
00:59:32
Speaker
Really?
00:59:34
Speaker
So I think, with a well-normalized database, within one node, and that is a big limitation, but within one node, I find I can query out the same fundamental data in lots of different interesting ways that might be appropriate to me. Yeah, but you have different teams accessing the same data, and over time you will have a lot of intermingling, a lot of data that belongs to this person and a lot of data that belongs to that person, and they don't know about each other.

Data Mesh and Decentralized Governance

01:00:13
Speaker
And then you need to start changing the data you have in the database, and every change in the database often has an impact on other people. This is what's happening in every large IT organization in the world.
01:00:30
Speaker
Yes. Now, I think, again, correct me if I'm wrong, but I think what you're getting at is that in the perfect world, we would have our nicely normalized data that worked for everyone.
01:00:44
Speaker
The reality is that the data structure must change over time. In the perfect world, we would also have a very nicely architected monolith, where teams can work independently of each other. And this monolith is a non-distributed system. It is easy to deploy, easy to manage, easy to develop with.
01:01:05
Speaker
And we wouldn't need to build microservice systems that introduce so many problems with distributed data and distributed components. We could just use a monolith.
01:01:17
Speaker
But we're not in the perfect world. Yeah, yeah. We distribute systems for two reasons. One is scale of the technology. And that's the boring reason, I would say. Yeah, well, it's certainly boring in the sense that it's the easier one to solve. Yeah. The juicier and more complex one is coordinating people; their different needs, and their changing needs, are really hard. Yep.
01:01:44
Speaker
So we split things up. And we need to do the same for the data. Yeah, yeah, I see that. And Kafka, and also the data mesh, basically, if we want to introduce another buzzword today.
01:01:59
Speaker
Okay, put it on the table and unpack it. Okay, whoa, whoa, whoa, data mesh. So the idea of the data mesh is... I think the headline is: data mesh is to data what microservices are to services.
01:02:21
Speaker
How do we design data systems that work independently of each other, and where different teams can work together? And basically the question is, okay, how do we do that? And I forgot her name. I have it here.
01:02:42
Speaker
Look, I have the book here. Wow. Within arm's reach, you've got your reference books. Nice. I'm not sure whether I pronounce the name correctly, but Zhamak Dehghani.
01:02:54
Speaker
She has these four core ideas of a data mesh. I'm not sure whether I'll get them all. So I think one is: we need to think about data as a product.
01:03:07
Speaker
And currently, this is what every analytical department struggles with. They just get some data and it's not optimized for them; it is optimized for the source, for the ones who produce the data. What we need instead, and this is what we've talked about for the last 45 minutes, is to model data that is usable for other people in our company.
01:03:31
Speaker
Yes. How do we write it in a way that isn't the way we want to write it, but the way that's going to be used in multiple different ways? Yes. We've talked about it for 45 minutes, and the industry has probably talked about it for four or five decades.
01:03:44
Speaker
Yeah. And now the idea is, okay, we have this log, and how do we optimize the data in the log? And an answer would be: we need to have data products, where we have somebody who's responsible for them.
01:03:59
Speaker
We need to have life cycles. We need to have governance for that, and so on and so on. One thing we need is to be able
01:04:10
Speaker
to explore the data, to think about it, to look at the data and see: oh, actually, we have this piece of data already in our data mesh, and this piece of data, and if we join them together, we will have all the information we need.
01:04:28
Speaker
So, are we there yet?
01:04:32
Speaker
We have Kafka UIs, but are Kafka UIs the answer to that? I'm not sure. Being able to see all the data in an organization is a good start, but yeah, we haven't reached the end game yet.
01:04:45
Speaker
Yeah, we have AsyncAPI. But it's not the end game.
01:04:52
Speaker
Yeah. Do we have data products? I don't think the question is whether we do; this is more often purely organizational. Data products are not a technical problem.
01:05:04
Speaker
This is an organizational problem. Okay, data products, then decentralized governance is one of the other core principles. And the idea is: what we have learned with the enterprise service buses and the messaging systems is that if we put all the responsibility for the data in the system on a single team, it will not work.
01:05:28
Speaker
Yes. Teams will be overloaded. I've worked with those systems; every change takes three to six months. This is not possible. You cannot do that. So basically the idea is we need to decentralize our governance.
01:05:43
Speaker
Are we there yet? I doubt it. Again, this is not a technical problem; this is an organizational problem. Ah, data exploration, data products, decentralized governance.
01:05:56
Speaker
And what is the fourth point? Domain ownership. Domain ownership. We need to have people that are responsible for their domains.
01:06:09
Speaker
Basically, we need to have a team that is responsible for their data products. This is very much tied to the idea that we don't have centralized ownership, centralized governance. I think all four of these pieces are very tightly coupled.
01:06:24
Speaker
Yeah. And, okay, you can kind of do data products without anything else. But as soon as you start to say, yeah, we would like to have decentralized governance, you're already into the idea of data products and domain ownership.
01:06:39
Speaker
Hmm. And self-service is also quite important, which I would put into this decentralized governance field, which is to say: we need to make it very easy for developers to use the data and to produce data into it.
01:06:59
Speaker
I had a customer who basically produced the same data to IBM MQ and Kafka. And then they had an internal customer, and they said, okay, we have two choices.
01:07:13
Speaker
We can use the MQ or we can use Kafka. We know how MQ works, but the problem with MQ is there's just one team managing it. And to get the approval for the data takes six weeks.
01:07:29
Speaker
If we want to get the data out of Kafka, it will take maybe four hours to get the approval, because somebody just needs to click a button. Right, yeah. This is not a technical problem. It's a purely organizational problem. Yeah, yeah, yeah.
01:07:44
Speaker
And it's the organizational problems that eventually strangle you. Exactly. So is this the current state of the promise of Kafka: that as well as doing the technical things it does, it bypasses some of the organizational problems of sharing data around? And that in itself is just phenomenally valuable.
01:08:09
Speaker
It is. Definitely.
01:08:13
Speaker
Yeah. So we keep hoping for things on top of it, but you are a lot more optimistic about where we are than perhaps you've sounded in some of the past hour.
01:08:26
Speaker
Probably, yes.
01:08:29
Speaker
I think, yeah, we are getting there slowly. We are not perfect. We are working on it. Things are moving.
01:08:41
Speaker
Give me one prediction then, to wrap up. When you work on the next edition of Apache Kafka in Action, what do you hope you'll be talking about in optimistic terms?
01:08:55
Speaker
I would hope that it would be much, much simpler to work with Kafka in a transactional way. So basically, to use Kafka as a transactional data store and to query it simply.

Future Aspirations for Kafka

01:09:10
Speaker
Yeah. I think that's two things. I am pessimistic about the future of distributed transactional systems,
01:09:21
Speaker
but very optimistic about the potential future of queryable transaction logs. By transaction, I don't mean transaction as in ACID transactions. Okay, what do you mean?
01:09:33
Speaker
I mean: I want to do something. So basically, to use Kafka more like a relational database, even knowing that this thing is not ACID-compliant, but eventually consistent.
01:09:55
Speaker
Right. So basically building applications with Kafka.
01:10:01
Speaker
That sounds to me like you're looking for an easier way to do that reconciling transactional actor somewhere in the middle. Maybe.
01:10:12
Speaker
Maybe. Which, again, I think would relate to making the usability of the querying and processing layer much richer. I agree with that.
01:10:24
Speaker
Okay. That's the hopeful future of Kafka, but the present, coupled with a good book, let's say, is a good one. Definitely. I think we have a fantastic community. We have a fantastic product.
01:10:39
Speaker
And I think there are so many projects that benefit from Kafka right now, without all this future we talked about.
01:10:52
Speaker
And I think Kafka is a great product for so many different use cases, even today. On that note: we've got a good thing to use and work with, and perhaps good research projects to go with it.
01:11:07
Speaker
Anatoly, thank you very much. That was more philosophical and more chewy than usual, and I really enjoyed it. Cheers. Thanks for having me, Kris. Pleasure. Thank you, Anatoly.
01:11:18
Speaker
As always, you can find links to everything we've discussed in the show notes, including a link to Martin Kleppmann's talk, which I would thoroughly recommend to anyone, and a link to Anatoly's book, Apache Kafka in Action, which he co-wrote with Alexander Kropp.
01:11:33
Speaker
So shout out to Alexander wherever you are today. If you've enjoyed this episode, please do take a moment to like it, rate it, share it with a friend. While you're doing that, I'm off to speak at a conference in Miami next week. It's one that's being put on by Modern, who've been longtime supporters of this channel.
01:11:52
Speaker
And I'm going to be recording a few live episodes out there. So if you're at Code Remix next week, come and say hi. And if you're not, make sure you're subscribed; there are live episodes coming down the pipeline soon. Until then, I've been your host, Kris Jenkins.
01:12:08
Speaker
This has been Developer Voices with Anatoly Zelenin. Thanks for listening.