Introduction to Apache Kafka
00:00:00
Speaker
Apache Kafka is our topic this week and it's an interesting tool because it's sort of like so many different things in programming. Sometimes it looks like a database. Sometimes it looks like a message queue.
00:00:13
Speaker
Sometimes it solves the problem of overnight batch jobs taking too long to run. Sometimes it looks like a data integration suite of tools. And if you're dealing with large amounts of data, or large numbers of departments trying to cooperate,
00:00:30
Speaker
I'm going to go out on a limb and say it's essential to at least know about Kafka and to know what it does. Even if you don't use it, it's something you want in your mental toolbox. So I've brought in a friend of mine, he's a Kafka expert and a consultant, Neil Buzing, and we talk about all those different ways of understanding Kafka and looking at it,
00:00:49
Speaker
all the things it can actually do for you. We talk about when it's the wrong solution to your problem, and when it's the right one, how do you actually introduce it into an organization successfully? So if you have data problems, integration problems, or processing speed problems, let Neil add some knowledge to your toolkit. I'm your host, Chris Jenkins.
Neil Buzing's Background and Kafka Fundamentals
00:01:12
Speaker
This is Developer Voices, and today's voice is Neil Buzing.
00:01:29
Speaker
With me today is Neil Buzing. Neil, how you doing, man? Doing great. How are you? I'm very well. I'm very glad to see you. We've crossed paths many times in the past, but I've never got you on record before, so this should be fun. And maybe a little scary. We'll have to see. A little bit of fear never harmed anyone. Just puts an edge on things, right? Exactly. So you are a consultant and developer in the world of Kafka,
00:01:58
Speaker
which is how we met. I've got a background in Kafka too. But for those that don't know, let's start here. What's Kafka? Why do we care about it? Kafka is a very fast distributed system to basically allow you to build real time jobs to do work. And the core of Kafka that I like to get into is
00:02:22
Speaker
It's simple in regards that Kafka itself is pretty dumb. It is an immutable log of events. So it is a record of what's happened. And the first thing I usually try to tell people about it when I'm going over Kafka is.
00:02:38
Speaker
Especially those that are already developers working in the Java space or in a coding language that I can speak to effectively is you're trying to record what has happened. So give me an event. Tell me something you did versus giving me a command and telling me what to do. So from a Kafka standpoint, I want to record events that I have done and let other people react to it.
00:03:06
Speaker
and do their work independently of me. And Kafka does this very well.
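To make that event-versus-command idea concrete, here is a minimal sketch of a producer publishing a fact ("an order was created") rather than issuing an instruction to another service. The topic name, payload, and broker address are invented for illustration; this is not code from anything discussed in the episode.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderCreatedPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");        // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // An event states something that already happened ("order 42 was created"),
            // not a command telling another service what to do. Keying by customer id
            // means all events for the same customer land on the same partition, in order.
            String key = "customer-1001";
            String value = "{\"type\":\"OrderCreated\",\"orderId\":42,\"customerId\":\"customer-1001\"}";
            producer.send(new ProducerRecord<>("orders", key, value));
        } // close() flushes any buffered records before the program exits
    }
}
```

Whoever needs to react to the event, a shipping service, a pricing service, or nobody at all, consumes it on their own schedule; the producer neither knows nor cares.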
Kafka's Capabilities and Integration Tools
00:03:10
Speaker
Kafka liberates the ability to write applications to where what I have done, what my program has completed is completely independent from what other people have to do as a result of it. And by making
00:03:30
Speaker
the infrastructure in Kafka to be high throughput, low latency, durable, which means if something goes down, things still work, or data isn't lost, available, which means if part of the system goes down, the data is still available. As an application developer, I don't have to worry about any of that. So Kafka liberates me from having to build infrastructure
00:03:57
Speaker
to send messages, send events to others, and I get to focus on the business logic of building that. And the paradigm shift, like I like to tell people, is: tell me what you've done, not what to do. Write an application that says, whatever you've done, let other people do something or not do something as a result.
00:04:20
Speaker
And there's a lot to Kafka. It's very hard to summarize and put together in like a one-minute segment. And so I will ramble, and maybe flounder on the way there. But at its simplest, I think it's what Jay Kreps said when he developed it: it's an immutable log.
00:04:42
Speaker
And if you think of it that way and you don't put anything more to it, then it's pretty easy to grasp and understand. The complexity comes with all the integration that you're going to do with it. There is stateful processing with Kafka streams or other streaming platforms. There's integrations with Kafka Connect. Those are things built on top of the core Apache Kafka components.
Debate: Is Kafka a Database?
00:05:07
Speaker
But at the heart, it's an immutable log where you record stuff that's happened and you don't have to worry about if someone read it or didn't read it, that's up to them.
00:05:19
Speaker
I'm going to ask you my favorite Kafka question, which usually divides the room. Let me guess, I think I know what the question will be. Let's see if you can pre-empt it. So you've said you're storing large amounts of data with high availability and durability. This is really sounding like a database. Yep, I knew that was your question. Yeah, is Kafka a database?
00:05:43
Speaker
My answer is no, but it has the same building blocks that many databases have, and it can give you characteristics of a database when you need it. So it can emulate options to a database, but
00:06:00
Speaker
I think that's more from a practicality standpoint versus a theoretical one. I think you could argue that Kafka is a database theoretically. It meets the ACID-type requirements that a database usually strives for. But it's not going to be a database where I do relational queries. I'm not going to query Kafka on its own to find out events that have happened. I need to build something from that. I'm not going to be able to
00:06:30
Speaker
use it for any type of search, any type of business use case by itself. So is it a practical database that I can use on its own? No. But is it a distributed system that has database aspects that ensure that I don't have to build certain things myself? Yes. I don't have to worry about durability. Kafka takes care of that. I don't have to worry about availability.
00:06:59
Speaker
Kafka takes care of that. So, from that standpoint, I can see why that conversation comes up. But as a consultant, if a client comes to me and says, can I replace MySQL with Kafka? I would say no. What's your use case? What's your business problem? Maybe we can replace MySQL, because you're not using it the way a relational database is typically used. You're using it for
00:07:26
Speaker
a low-level event queuing mechanism, then yeah, we could replace it. But if you're searching for users who bought a product three days ago, no, we're not replacing that with Kafka. Okay. Okay. This is perhaps something you're world-class qualified to answer then, because it's very abstract. Let's pin it down. When someone calls you in to talk about Kafka, what problems are they actually trying to solve
Kafka in Industries and Real-Time Applications
00:07:54
Speaker
with it? What are they building?
00:07:58
Speaker
As a consultant in this space for many years, I've been brought in to bring in Kafka for a variety of verticals. I would say if you're brought into a company that hasn't used Kafka before, so they're trying to get to real-time streaming, the typical use case is we have data.
00:08:23
Speaker
that we're refreshing in a system that happens overnight or is two hours old, four hours old. We can't react to it in real time.
00:08:34
Speaker
So typically, we want to move to Kafka or a similar technology, but 99% of the time it's Kafka. That's the tool of choice in this space by most companies. And the thing is, I need to take this effort that I'm doing and speed it up. And that's in verticals in healthcare, in financial, in retail and others, but that's typically it.
00:09:03
Speaker
Once you get Kafka in place, which may take a while, you use it as a conduit. So your first task typically is moving data. Now you have that data accessible in Kafka as it's streaming through. So the benefit you now have is you've exposed that data for other services that could benefit from it.
00:09:31
Speaker
So a typical use case would be taking data from legacy databases and streaming it through Kafka into, say, an analytical database, a Pinot, a Druid. And that allows people to do queries on the data, analytical-type queries. So how many people are buying iPads at noon at this brick-and-mortar store in Minnesota?
00:09:59
Speaker
Minnesota's in the US here. We were just talking about that, but I realized we started recording after that conversation. So we've got this analytics use case, and now we're flowing data through Kafka to get it to this other analytical database. But now developers can say, I could build my own microservices listening to those events, to maybe determine:
00:10:24
Speaker
maybe there's a price issue in a product. All of a sudden, this product started to have more purchases. I could build alerting systems, event-driven systems, reacting to that data as it's flowing through Kafka. That's typically after clients have had experience using Kafka and integrating it with their business requirements.
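As a rough illustration of that kind of reactive analytics, here is a sketch of a Kafka Streams topology counting purchases per store and product in hourly windows. It assumes a recent Kafka Streams version, and the topic name, application id, and key format are all made up for the example.

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.TimeWindows;

public class PurchaseCounts {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "purchase-counts");      // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Assume purchase events keyed by "storeId|productId" flow through a "purchases" topic.
        KStream<String, String> purchases = builder.stream("purchases");
        purchases.groupByKey()
                 .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofHours(1)))
                 .count()
                 .toStream()
                 // A real pipeline would feed an alerting topic or an analytical store;
                 // printing is just enough to show the shape of the result.
                 .foreach((windowedKey, count) ->
                     System.out.printf("%s bought %d times in window starting %d%n",
                         windowedKey.key(), count, windowedKey.window().start()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```

In practice the counts would land in a topic consumed by an alerting service, or in an analytical database such as Pinot or Druid, rather than standard output.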
Building Systems for Kafka and Microservices
00:10:51
Speaker
That type of question
00:10:54
Speaker
is part of the consulting work, and that's usually the fun part, is you now get to expose them to new use cases that they can do with Kafka that they couldn't do before. Typically, I do that in Kafka streams, which I mentioned is part of Apache Kafka, but it's not the core Kafka that people tend to talk about when they ask, what is Kafka?
00:11:19
Speaker
So I didn't go into that, which I'm sure we probably will. I think we'll touch on that. So that's a lot of what we get brought in to do. There are clients, and the one client I like to talk about, because we did a talk at Kafka Summit so I know it's public, was work at Centene with Brian Zelly. Brian Zelly and I did a talk in 2019, at the San Francisco Summit, where we built an event system
00:11:46
Speaker
with Kafka. So we were building a routing ecosystem because we wanted to make it easier for people to put their data into Kafka. It's a lot easier to tell people to come to Kafka to get your data if we already have it.
00:12:01
Speaker
Teams will say, hey, you have data that I'm interested in. I'm going to write consumers. I'm going to learn about the Avro data model that you have on your topics. I am going to ingest that and do business logic on it. You go to a team that says, we need your data so we can let other divisions, other parts of the company do something.
00:12:23
Speaker
What's in it for me? So don't make it hard for them. Make it easy. So we built a system that made it easier for them to bring data in to Kafka. And then over time, people will then start using the product, getting access to the data. It's all about exposing data so people can do their, their business. And if it's in Kafka streaming, they can do it in real time.
00:12:49
Speaker
So the heart of this is, if you can publish your business data and then subscribe to it in another place, you quickly find there are lots of other reasons to subscribe to the same data for different purposes. Yes. Yeah. Good way to put it. Some people will wonder, and I'm sure you can answer this: if it's publish-subscribe, is it a message queue?
Kafka vs Traditional Message Queues
00:13:18
Speaker
Not yet, there's the KIP, a Kafka Improvement Proposal to sort of solve that problem. It is not a message queue. It is like a message queue. In a lot of ways, it can be. Typically, when I built IBM MQ systems back 10 years ago, the idea was for the one retail business problem that I solved,
00:13:46
Speaker
It was one application produced the messages to the queue. Totally async, didn't care about any response from it, and another one picked it up. I could have replaced that with Kafka in a heartbeat. The queuing was pretty much an event system. I submitted orders that I created, and now I want someone to fulfill them, for example. That is a paradigm. With queuing, though,
00:14:15
Speaker
There are certain cases where I need global ordering of processing, which you can't get with Kafka unless you really, really want it. Kafka is built for throughput. So its goal is: you don't need global ordering. You typically just need ordering of individual messages. Like if you update your address, I update my address. You update your address again, I update my address again. I need to make sure my events are processed in order.
00:14:45
Speaker
Your events are processed in order, but hopefully mine and yours don't have to be ordered relative to each other. Yeah. So Kafka has that. Yeah. You and I exist on different timelines. Yeah. And the same for the shopping basket that I'm trying to fill, right? Correct. Okay. And queues, the other part that a lot of people use queues for is they want the response mechanism.
00:15:12
Speaker
In other words, and there are temporary queues for this, I'm gonna publish something, but I want you to publish a message back to me after you've completed it. That's very hard in Kafka. It's meant to be totally decoupled. When there is a sense of coupling that comes back into the problem, that's where it usually becomes not the best pattern to use Kafka. I've done a talk on it, and actually, it's one of the
00:15:41
Speaker
most controversial talks I think I did, which was called Synchronous Kafka. And it's typically not the pattern you want to do, but there are frameworks that add the ability to get a response back to the actual producer. So I want to produce a message, I want you to create an order for me. I want it to be asynchronous, but I want you to tell me, the person who published it, that the order was created.
00:16:06
Speaker
That's not how Kafka typically works. And that type of queuing design is a harder one to implement in Kafka.
00:16:14
Speaker
I sometimes think that, not always, but sometimes it's a sign that you're not really architecting your application correctly, because you've opted for a queue or Kafka or some mechanism that says, let's be asynchronous, let's fire and forget, but you haven't really embraced the fire and forget. You're pretending it's still a function call with a return value. So yeah, that can be the case. And typically you try to solve it
00:16:42
Speaker
without Kafka, or by different means, but you'd be surprised. You're brought in, they've invested in Apache Kafka, and part of investing in new technology is, what can I remove from my technology stack as a result? And you're saying, you shouldn't remove your JMS queues, your RabbitMQ queues should remain, or something like that.
00:17:06
Speaker
And you're like, so I'm spending more, I'm investing in more technology and I can't liberate myself from something else. So there are times when, when you have a small set of use cases, like you should have rabbit or you should have JMS, but it's one thing.
00:17:23
Speaker
There are ways to do that in Kafka. And I will work with clients to decide, do we want to do that? What's the risk of doing that? Or do we want to continue using a technology that was more designed for that? That's architecting, trade-offs. That's part of that. But it's doing that with the right questions other than you can do everything in Kafka. Kafka's great. Let me show you how to do it, which it is, by the way.
00:17:50
Speaker
I know you're a great fan of Kafka, but you've opened yourself up to a very juicy question there. When do you get people calling you in saying we want to use Kafka and it's the wrong choice?
Kafka's Role in Modern Applications
00:18:04
Speaker
It is. A lot of times around the microservice, the infrastructure is still very rest oriented, very much request response.
00:18:17
Speaker
And you've built a very command-centric set of microservices, or macroservices, or a monolith, depending on terminology and where people are in their journey. If you're in a 'create an order for me and let me know when it's done'
00:18:37
Speaker
type of thinking, that's gonna be a hard shift into Kafka versus I created an order, I'm gonna tell you about it. If you're shipping, that's on you. If you're pricing, maybe that's on you, that's a different service. If you had adapted that well, then you're better off. And there are times where you need to be synchronous. I mean, usually we develop something that is front and center to users on their phone or on their computer.
00:19:06
Speaker
And they need a response back on what they're doing. So there is a synchronous nature to it. If you look at a ride share application, all that data is typically flowing through Kafka to make sure that that user gets that ride in a reasonable amount of time, but they're not using Kafka from their phone. Their phone is doing a.
00:19:27
Speaker
and HTTP requests, server send events involved, maybe even WebSockets if you're lucky, but it's not Kafka.
Development Patterns and Java Integration
00:19:34
Speaker
That is behind the scenes. So you're going to have that other integration that needs to happen there. And so if people try to move Kafka to the front end and they're having to couple those systems, that may be a case where they're not ready to bring in Kafka yet. They haven't necessarily got to a
00:19:55
Speaker
more event-type thinking. And that's usually when they either aren't ready, or they need to pivot more in their architectural design to bring Kafka in and use it effectively. That's often the hardest thing with technology. And I think it's one of the reasons why we tend to see a lot of new things that are incremental improvements on old things, because it's not adopting new technology that's hard. It's adopting new ways of thinking about solving problems.
00:20:23
Speaker
It's hard. So it always reminded me a lot of functional programming. That's exactly where I was going to go. Kafka Streams is built on Java, which is an object-oriented language with functional aspects to it. Kafka Streams is very functional and it uses Lambda functions of Java very beautifully to where it is
00:20:49
Speaker
pretty elegant. But it is very much the idea of functional design, reactive thinking, and that paradigm shift, for me, was hard. Even though I did learn functional languages in college, in the industry I've never done things functionally,
00:21:08
Speaker
because those aren't the technologies that were prevalent when I started, and still aren't. I mean, there are functional aspects to things, but for the dominant languages, functional wasn't their primary design. I mean, Python wasn't functional initially, or people could argue with me, maybe. Java certainly wasn't. So the paradigm shift of Kafka is kind of the same way, and that is the challenge, that is the discussion of working
00:21:36
Speaker
with people in the enterprise to figure out how to think differently. I think the idea of microservices, which is now a decade old type concept longer, definitely makes it easier for people to think and leverage Kafka. If it wasn't for that, I don't think people would be able to grasp and use Kafka effectively. That's interesting.
00:22:00
Speaker
So it's not that they were co-designed, but you think the two approaches are very sympathetic. I think so. Kafka, for those that are new to it, has this concept called, from the consumer side, a consumer group. And a consumer is a read-only connection. We should start there. Correct. Yeah. Consumers are a read-only connection, except that they have to keep track of where they've read. So there are some writes that go on, for those that
00:22:30
Speaker
want to know how it works. But if I have two consumers, they can share the work, two consumers in the same group. Where it matters is if you start building those into a more complicated application. So if I'm reading from orders and I'm reading from inventory, and I have two consumers doing them independently, but I built them in the same app, if I do an upgrade to one component and not the other,
00:22:59
Speaker
I still have to bring both down to bring up that macro service. And it becomes more of a challenge of orchestrating the work that you're doing on top of these systems. If microservices weren't there, I think that would be more of a challenge to the people's way of thinking. The idea is I write small applications.
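For readers newer to this, a minimal sketch of one consumer-group member might look like the following; run two copies with the same group.id and Kafka splits the topic's partitions between them. The broker address, group name, and topic are placeholders for this example.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderWorker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // assumed broker
        props.put("group.id", "order-workers");                    // same group => share the partitions
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                // Each instance with the same group.id is assigned a share of the topic's
                // partitions; the consumed offsets it commits back to Kafka are the small
                // amount of "writing" a consumer does.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition %d, offset %d: %s%n",
                        record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```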
00:23:22
Speaker
One of the things that comes up a lot in my design of microservices for clients is I don't use frameworks anymore if I have a choice. I write Java code. Kafka Streams is a framework. To me, it does all those things that most frameworks need to do that people go to them for. But I don't use frameworks because my applications are very tiny.
00:23:47
Speaker
Most of my streams apps do not make any external calls. They don't make any rest calls. They don't provide any input requests from consumers, from web access. So it can be a couple hundred lines of code. Why do I want to bring in a framework that gives me their opinions that I now have to learn and their
00:24:11
Speaker
infrastructure in order to bring up an application.
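As a hedged sketch of how small such an application can be, here is an entire framework-free Streams "microservice" that reads one topic, drops empty payloads, and republishes the rest. Topic names and the application id are assumptions made up for the example.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class OrderFilterApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-filter");       // assumed name
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // The whole service: read raw orders, drop empty payloads, republish.
        // No web server, no dependency injection, nothing beyond the Streams library itself.
        builder.<String, String>stream("orders-raw")
               .filter((key, value) -> value != null && !value.isBlank())
               .to("orders-clean");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```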
Challenges in Writing and Formatting Data for Kafka
00:24:14
Speaker
So you write smaller code, very independent; you truly can write microservices. When I wrote microservices 10 years ago, it was me pretending to write microservices. They were small monoliths, or they were macroservices; I tried to put everything into them. So I needed a framework to do dependency injection, to do security. And now, for most of that, I don't. When I do, if I need to build
00:24:39
Speaker
a front end that serves web requests. I'm probably going to bring in a framework because I don't want to write the security around HTTP, the cross-site scripting checks. I'm going to bring in something to help me do that. But if I'm using Spring or I'm just doing a simple Kafka producer and consumer, I'm just writing in the least amount of framework code that I possibly can. Because it should be a fairly straightforward thing to write some stuff.
00:25:09
Speaker
as a series of events and read it back as a series of events. That should be lightweight code. But that harks back to something you said. You said you built a demo and gave a talk because if people could write things in easily, it would open up Kafka to them. What was it that was hard about writing stuff into Kafka?
00:25:35
Speaker
Um, they had to, that's a good question. I don't consider it hard to write into Kafka. Um, but it is, I have to create a connection to a Kafka broker that is not HTTP. It is a proprietary connection that requires setting up infrastructure of, of open ports and security around that. Kafka client library does all the work for you.
00:26:05
Speaker
But there's still something new that a person needs to bring in. The data format, how you're storing the data, and that put becomes, there's no gatekeeper to validate their data. They're publishing directly to Kafka. You usually then write other streams, applications to validate, check the data. Then someone has to build that. So most of these clients were already doing RESTful
00:26:31
Speaker
production of their data to RESTful services. So we built a RESTful front end that then pushed it to Kafka, validated, they would publish JSON, we would convert it to Avro. For those that don't know Avro, Avro is a binary, a serialization of bytes that is smaller than writing JSON. And it is well supported in many libraries.
00:26:59
Speaker
The orchestration around it, maybe not so much, but the binary format, I can do it in any language. And it saved about 30% in storage, and it was strongly typed. In other words, I knew it was a decimal versus an integer, whereas in JSON you don't know, for a number, what kind of number it is. So you have more data checking.
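To illustrate the strong-typing point, here is a small sketch using the Avro Java library. The record and field names are invented for the example; the point is that the schema, unlike a bare JSON number, says exactly what kind of number a field holds.

```java
import org.apache.avro.Schema;

public class AvroTyping {
    // A hand-written schema for illustration; names and fields are made up.
    private static final String ORDER_SCHEMA = """
        {
          "type": "record",
          "name": "OrderLine",
          "fields": [
            {"name": "orderId",   "type": "long"},
            {"name": "quantity",  "type": "int"},
            {"name": "unitPrice", "type": {"type": "bytes", "logicalType": "decimal",
                                           "precision": 10, "scale": 2}}
          ]
        }""";

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(ORDER_SCHEMA);
        // Every field carries an exact type: quantity is an int, unitPrice is a
        // decimal(10,2); a record that violates the schema won't serialize.
        System.out.println(schema.getField("quantity").schema().getType());       // INT
        System.out.println(schema.getField("unitPrice").schema().getLogicalType()); // attached decimal logical type
    }
}
```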
The Consultant's Role in Kafka Projects
00:27:23
Speaker
So we did all that for them.
00:27:25
Speaker
because we needed to make sure that the data was useful to those that were consuming it. So it wasn't that it was necessarily hard. It just wasn't their problem. It wasn't these other teams' responsibility to get their data to others in a streaming way. So we did it for them and tried to give them the interface that they were most comfortable with.
00:27:50
Speaker
Right. So that, that raises the question, like how much of your work is like getting, how much of it is the architectural side of this is how you build this kind of real time streaming app versus, okay, this is how in detail you write the code that does it versus like being a kind of DBA. This is how to actually live with it in production.
00:28:18
Speaker
Every week is different, every client is different. So the work for Centene, the part that we talked about, that was a four-to-six-month pilot project: let's get it up and running. So I was very much hands-on writing code, and the architect was Brian,
00:28:37
Speaker
a great guy, and he had that vision, and it was taking his vision and putting it on paper, and paper meant in Kafka Streams, in Kafka, and developing that. So I wasn't really the architect from a building-an-application standpoint; I was the Kafka architect. I was the one determining how we wrote the producers, how we wrote the consumers, how we leveraged Avro, which
00:29:02
Speaker
keeps coming up because it was a pain point on many things with that one: the data modeling, the data governance of it. For other clients that are pretty much new to Kafka, it's, we have a business problem of trying to get from 12 hours to minutes, help us do that end to end. Then it's finding a team below you to do the work,
00:29:24
Speaker
the daily development work in Kafka, because there's a lot to do, while you're trying to figure out how many applications we need to build, how do we deploy those applications. You'd be surprised how much a Kafka developer at that level needs to become a DevOps person, just to make sure that we can get our applications deployed. Because I'm going from some clients that deploy
00:29:47
Speaker
a handful of applications to now hundreds, or twelve instances of a consumer, so it could be running five to ten different consumers, when before they would write maybe two applications to do that. So it tends to... Is that your preference for how you architect it, very, very granular, or did it naturally end up that way? I think it naturally leans that way.
00:30:14
Speaker
That's a good question. From my standpoint, the more successful projects are the ones that are more granular. I think that usually indicates that people have decoupled the problem set better, which makes it easier to adapt to Kafka. Right. So if you find that you're deploying a lot more services and
00:30:43
Speaker
they're communicating asynchronously with each other, it sounds like your deployment management and your monitoring and your observability suddenly become much bigger issues than perhaps they were before. Yes, and getting DevOps involved, or becoming a DevOps person to help get that involved, becomes
00:31:04
Speaker
a big priority. So I'm actually repeating, well, I shouldn't say repeating, I'm doing another talk this fall at Current, the Kafka Summit, on Kafka Streams metrics and doing observability of your Kafka Streams application, so you can make sure it's working well. Three years ago, during the pandemic, I did a virtual version of it. I am now trying to modernize the talk
00:31:34
Speaker
and give it again, because the observability piece is huge, and people don't know what they need until they don't have it. They need to invest in it, and they don't realize how much time it takes to really do that. And every organization is different in their journey of observability. What tools do they use? How easily are developers able to bring tools into that and observe things?
00:32:00
Speaker
But yeah, it's a critical piece. A lot of the journey is about getting visibility into the behavior of the apps, and that's about more than just actually writing the apps themselves. Okay, so is that the price you pay for adopting Kafka, or is that the price you pay for adopting a genuinely distributed real-time application?
00:32:30
Speaker
I think it is a price you pay for building a distributed application. I do think Kafka adds more. I mean, because there's not necessarily standards around doing that and because it's
Monitoring and Observability in Kafka
00:32:44
Speaker
a distributed system, how you do that, you have to invest in doing it the Kafka way too. But you would be, if you were using another technology, you would have to do theirs as well.
00:32:58
Speaker
So I mentioned KIPs, Kafka Improvement Proposals. There's one out there, it's in the 700 range, I can't remember exactly where, that I think is very important. I wish I had the number now that we're talking about it, but it's the ability for consumers to push their metrics to the brokers and give the brokers visibility into how the clients are doing. So for example, many people use cloud-provided services for Kafka,
00:33:27
Speaker
and there are monitoring tools for that, or they built Kafka on-prem or self-managed and built their tooling very well around monitoring it: they have all the connections set up, they have the dashboards. Now you go to them and say, we need to monitor the health of our clients too. Like, isn't monitoring Kafka enough?
00:33:49
Speaker
Like, no, it isn't. If a consumer is lagging, you don't know why, and it's not even easy to find out. So you have to monitor each application. And that's usually where people go, oh my gosh, shouldn't I just be able to monitor Kafka? So there's a KIP out there that allows clients to push their metrics to the brokers, so the brokers could then make them available to the monitoring tools.
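To make the "monitor each application" burden concrete, here is a rough sketch of what pulling one lag metric out of a single consumer looks like today, before any broker-side aggregation exists. Topic, group, and broker details are placeholders, and a real setup would usually expose these via JMX or a Prometheus exporter rather than printing them.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerLagProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker
        props.put("group.id", "lag-probe");                  // assumed group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            consumer.poll(Duration.ofSeconds(5));

            // These numbers live inside each client JVM; something (a JMX exporter, a
            // Prometheus scrape, custom code like this) has to get them out per application.
            Map<MetricName, ? extends Metric> metrics = consumer.metrics();
            metrics.forEach((name, metric) -> {
                if ("records-lag-max".equals(name.name())) {
                    System.out.printf("%s (%s) = %s%n", name.name(), name.group(), metric.metricValue());
                }
            });
        }
    }
}
```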
Kafka Streams Overview and Benefits
00:34:16
Speaker
Now I don't have to write Prometheus scrapers, if I use Prometheus and Grafana, to go and scrape the metrics from each consumer, potentially doing each one differently. I can just expand what I'm looking at in the broker metrics themselves. But that requires a change to the client libraries to allow clients to push their
00:34:38
Speaker
"how well am I doing" metrics to the brokers, so it then becomes easier for them to display what's going on. So we need that KIP. I'm a big advocate of that KIP. If monitoring is easier, people do it better. If monitoring
00:34:56
Speaker
doesn't require all these extra steps, people do it. And then you're not being called at 3am as a result. Which would be very nice, if we could just get to that world where everything was cheap and easy to distribute and nothing woke us up at 3am, except for small children. So what you're saying is, I mean, to put it in an equivalent space, if I had some kind of queuing mechanism, I would monitor the health of the queue,
00:35:20
Speaker
but I also end up needing to monitor the readers to see how they're progressing through the queue. It would be nice if the readers could report back to the queue and then I've just got one thing to monitor everything. Yeah. Okay, we've talked a bit about DevOps and monitoring. Let's talk a bit more about programming. Tell me about Kafka Streams and what's that for and what headaches people have with dealing with it.
00:35:48
Speaker
Well, Kafka Streams is the best Java library out there, if people ask me. And since no one's going to ask me, I'm going to state it. If you disagree with me, please come onto the podcast and discuss your favorite Java library. Exactly. It is a pure Java library that allows you to do stateful processing with Kafka. And there are plenty of alternatives out there.
00:36:15
Speaker
You can use Flink, you can use Spark, you can use Apache Beam. These are all technologies to do stateful processing of events with Kafka. And stateful usually means I need to join data, like credit card fraud. Am I going to
00:36:32
Speaker
enrich it with the user data? Am I going to compare it to other credit card uses at the same time to do alerts? Was this credit card used in the UK and France at the same time and the physical location? I need to alert on that. But that requires knowing both events that didn't happen at the same time. So I need state. That state needs to be stored somewhere for days, hours, or forever for me to enrich my data and do something meaningful.
00:37:00
Speaker
Most of the technology in the space requires infrastructure to set up and do that. So I have a set of servers running Flink that I schedule my jobs to, or Spark, or even Apache Beam, Google Dataflow. There's tons of options out there.
00:37:18
Speaker
Kafka Streams' approach is, it's just a Java library, 100% self-contained. You spin up Java code today; if you're at a client where you do Java work, you can use Streams. It uses the RocksDB database internally for the state that it needs. So we talked about whether Kafka is a database, and the answer is not by itself, but it has those components. I can't search a topic for an event.
00:37:44
Speaker
I need to store it somewhere where I can search it. That's what RocksDB does. RocksDB is the availability of that state; Kafka is the durability. Kafka Streams will put the data in what's called a changelog topic. So if my JVM crashes, my pod dies, someone physically pulls out my hard drive and throws it away, I can bring up that Java application on that pod and it will rebuild its state from Kafka.
00:38:13
Speaker
So Kafka is the durability; Kafka is that part of the database that you need to make sure your data is not lost and can be rebuilt quickly. RocksDB is the availability aspect, making sure that I can search for what I need. And Kafka Streams state is part of the Java library. That makes it fast, in that I'm not having to make any calls out
00:38:39
Speaker
to do work; all the work's done in there. So if you write Streams effectively, you're letting Kafka data come to you, basically. You're not going to Kafka; the data's coming to you to do your enrichment, to do your joins. That's the paradigm difference of Kafka Streams from the other technologies. And as a Java developer of 20-odd years,
00:39:07
Speaker
it was very easy and natural for me to use, adapt, and bring in to clients who are already in that space as well, pretty easily. Yeah, I can imagine that. Going into new clients, if you don't have to set up extra infrastructure on top, that already is a huge advantage. But a library like that, for it to be useful, it needs to worry about being distributed and highly available. Correct.
00:39:33
Speaker
And Kafka is that distribution and highly available aspect behind it as well. So if you run jobs on a Flink farm and you need to do a lot of work, you may spin up 10 CPUs and they would distribute work between them. Kafka streams distribute work through Kafka. So if I have an order event,
00:40:00
Speaker
And I need to join that to a user event to enrich that order with user information. And that user is on a different streams pod. Because of how I built the app in, in my old world thinking 10 years ago, I would say, Hey, even the JVM that has the order, go grab the user data.
00:40:22
Speaker
Go grab the user data and let me enrich my order. That's the restful thinking. In the Kafka Streams world, it's totally different, and it's so elegant at
Adopting Kafka: Challenges and Mindset Shift
00:40:34
Speaker
it. That's one of the reasons I love Kafka Streams, is I have the order, but the other worker has the user.
00:40:42
Speaker
So it's not me going to grabbing the data from the user. It's me rekeying the order so the one that has the user will get the event instead. So I'm moving my order to where my user is and letting that one do the work.
00:41:00
Speaker
From a high-level standpoint, basically, Kafka Streams uses Kafka to distribute the work across all its workers. So if I have 12 Streams apps running, they each have one-twelfth of the orders, one-twelfth of the users, and they're shifting work to where it needs to be, not bringing the data to where I am.
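A sketch of that rekey-and-join pattern, with assumed topic names and plain JSON strings standing in for real Avro payloads:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class OrderEnricher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-enricher");     // assumed names
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Users, keyed by userId, become a table backed by RocksDB locally and a
        // changelog topic in Kafka, so the state can be rebuilt if this instance dies.
        KTable<String, String> users = builder.table("users");

        // Orders arrive keyed by orderId. selectKey rekeys them by userId, which makes
        // Streams repartition: each order travels to whichever instance owns that user's
        // partition, instead of this instance reaching out to fetch the user.
        KStream<String, String> orders = builder.stream("orders");
        orders.selectKey((orderId, orderJson) -> extractUserId(orderJson))
              .join(users, (orderJson, userJson) -> orderJson + " enrichedWith " + userJson)
              .to("orders-enriched");

        new KafkaStreams(builder.build(), props).start();
    }

    // Placeholder: a real app would parse the order payload properly.
    private static String extractUserId(String orderJson) {
        return orderJson.replaceAll(".*\"userId\":\"([^\"]+)\".*", "$1");
    }
}
```

The selectKey call is what triggers the movement just described: the order record is shipped through Kafka to the worker that already holds that user in its local, RocksDB-backed table, and the join happens there.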
00:41:25
Speaker
Yeah, it's I think a good mental model, which probably comes from the name is like you've got different streams or flowing gradually together to form a larger river, right? Yeah. Yeah. Okay. So what makes all this hard, Neil? That's a nice wide question. Oh, man, what makes this hard? There's, well,
00:41:53
Speaker
a lot of times it's trying to bring along the people that understand the business problem you're trying to solve, and get them to understand the technology well enough to trust you that you need to do something a certain way.
00:42:07
Speaker
And it's also getting the developers to really understand the business domain. I mean, and that's no different to Kafka than it is to any technology in the past. I mean, I've suffered through with ORMs, I've suffered through that model extensively in 10, 15 years ago of getting object model and database models to make sense.
00:42:30
Speaker
That's always the gap. I just think there's a lot to Kafka, that that gap is wider. It takes more to get up to speed in Kafka than necessarily... I mean, it's just distributed systems are hard and understanding that.
00:42:48
Speaker
I think it's also a paradigm shift of thinking, like from object-oriented programming to functional, from SVN to Git. There is a paradigm shift from SVN to Git. Oh, I mean, since I mentioned that, my son is using SVN. When I went from SVN to Git, I felt like, I'm too old for this. I'm never going to get it.
00:43:12
Speaker
Then all of a sudden I got it and I'm like, oh my gosh, this is liberating. I understand how Git works. Yes, there are a lot of nuances to it. I like using the Git example because I think others who switched from a previous tool to Git have that same shift. It's like, I'm never going to get it. Then when you get it, you're like, this is the best thing ever. This is the way you'd build distributed
00:43:37
Speaker
types of code sharing, you you decentralize it and all that stuff. I feel Kafka with event thinking is similar. There is the I'm never going to get it type approach. I can't think from command thinking to event thinking. I'm always in the restful mindset that I need to get a response back in order to do my work that you feel like
00:44:02
Speaker
This is like, oh my gosh, I need to go back and find a different passion. But once you get it, you're like, oh my gosh, this is great, this is liberating. So the challenge is trying to get business people to get that feel of their getting it and the benefit that they're going to get from it. So as an architect, as a developer, as someone advocating for Kafka,
00:44:28
Speaker
It is trying to show them the business benefit of using this technology. And that is hard. And I think you go with, well, others are doing it, others are successful in doing it, so trust me that you will be successful, and let me help you get there. At least you have the ability to reference that success. There's, I mean,
00:44:56
Speaker
what, 80 of the Fortune 100 companies are using Kafka, the numbers are high. So it is a successful tool that people use. So it's trying to get that shift. And once they see it, once they get that paradigm shift, I think then things become a lot easier. But that switch was hard for me, and I see it hard for others. If it's not hard for you, let me know. I would love to learn how you got that so quickly, because I want to be able to teach that better.
00:45:25
Speaker
That is an important thing, it really is. It's like, I think as architects and architecturally minded developers, we go around looking for ways of doing things that are fundamentally simpler. And we're prepared to suffer a lot more to get to that.
00:45:47
Speaker
So some of us are naturally attracted to beating our heads against Git or functional programming or Kafka in order to reach that promised land where things drop into place and suddenly it's so much simpler you wouldn't go back to the old way. And there's a challenge there, because you worry: are you going to get to where it's simpler, or did you just make a hard problem
00:46:11
Speaker
equally hard, but now on a new technology stack. Yeah. And as a consultant, you don't, or as a developer, you don't want that. My success is by people saying, okay, Neil, we don't need you anymore.
Successful Kafka Implementations and Business Alignment
00:46:24
Speaker
Um, we're happy versus, okay, Neil, you made it very hard for us to move on. We're not happy, but we don't need you anymore type. It's a, you want people to feel
00:46:37
Speaker
benefits from using Kafka to be happy that they made the transition. And that's the goal. But you do worry about it. Did I make a complicated problem just a distributed complicated problem? So now I have more things to worry about. Yeah. I think I'm going to pick an example of Angular version one. That was a new way of doing things, which just turned out to be a different, awful way of doing things.
00:47:05
Speaker
That is one where, so, I didn't do any Angular. But there was like a week where I wanted to, okay, I want to learn Angular enough to where I can talk about it, and that was one of those things where, A, I didn't have the time, and B, I wasn't getting it. And I'm kind of glad, because
00:47:26
Speaker
it does look like it was one of those things that would be a continued challenge. The front-end space is a whole different complicated one to talk about, I'm sure. I'll let someone else talk about that one. Yeah. But it's one of the most fun ones, because you get to see people happily using the thing you built.
00:47:48
Speaker
And that relates back to what you're saying. You want your clients to be happy. They went down that road and now they're using it for non-technical reasons; they're using it to make the business move forward. Do you have any favorite success stories that you're allowed to share? Well, I like talking about the Centene one, because I know I'm allowed to share it, because Brian had us do a talk on it. I hear from them that
00:48:12
Speaker
things keep working well, from what conversations I've had. So that's definitely a great success. That is a hard question, because you don't want to misspeak about what someone's doing with it. I'm trying to think if there's... I've definitely... To that point, I've been at clients where
00:48:40
Speaker
the step to get DevOps, CI/CD, a deployment process with Kubernetes, and I've mentioned this at Open Source North, which is a conference locally here, I mentioned in a talk there that you're going to need to invest in those. And ideally you invest in those before you go to Kafka. If you're an organization that's new to Kafka, new to microservices, new to Kubernetes, new to cloud,
00:49:11
Speaker
you're going to want to pull something off the list and try to get those other things in place first. And I've been at one where Kubernetes was relatively new. I mean, the organization was using it, but not to the level that you probably want them to. Whether they deploy Kafka on VMs, bare metal, or pods was also another conversation. It's just too much. So those that were
00:49:41
Speaker
successful, and I'll refer back to the Centene one because I know I can, is you have a champion from the business side that has a vision that's solving a business problem. Hmm. That's where you're going to get success. When the champion is a technology evangelist within the organization wanting to use their technology of choice, which is Kafka in this example, it may not always lead to the best visible success within the organization.
00:50:08
Speaker
So the best for me is having that business advocate, like Brian was at Centene, and me being the technology advocate and bringing those things together. You need both to be successful.
Kafka's Role in Data Accessibility and Real-Time Processing
00:50:22
Speaker
So if you have one where it's just the technology or the business isn't trying to find the counterpart to evangelize the technology, then you're probably going to flounder. Or I think you will.
00:50:37
Speaker
My mind is trying to think of what I can talk about on that, but yeah, that's the part where the challenges probably existed. So I would say one thing about that that is critical: the organizations that have people who really understand the data model domain are critical. You don't want just everybody putting bytes into Kafka. If you can't
00:51:02
Speaker
come up with an enterprise solution to make sure people can use that data, access that data, know how the data is available, how the data relates, you just have more data somewhere that no one knows how to get to. And that is critical. Yeah, but do you think sometimes there's an element of, if you build it, they will come? If there were a way to get business data in a structured, reliable, high-quality manner, then people would start using it.
00:51:33
Speaker
But you can't put the whole organization on the same page in the same day. Yeah, that is a fair point. You need a pilot or something there that shows the value and gets more people to embrace it and use it. That's probably a better way of putting it. Do you think, going back to the abstract then, do you think
00:51:57
Speaker
the most important thing for technology like Kafka? Is it making data available to different parts of the organization, which Kafka is good at? Is it being highly available or is it being real time? Or something else? Well, I'm trying to, the answer is yes to both, which is a bad, I think, I mean, it has to be real time.
00:52:26
Speaker
But I think that comes with the highly available too. But if I'm not able to get data to being where you can react to it within pretty much when it happened, then I'm not going to get this company to the level they are needing to be within their organization. They're going to lose to their competitors.
00:52:52
Speaker
if it's not real time. Everybody's expecting real time. When a credit card gets used and the alert doesn't come in until six hours later, because email or texting isn't reliable, that doesn't help me. I can't react to that. I'm frustrated. So if I'm building a system that isn't immediate, customers are frustrated, business owners are frustrated. So that real time is needed. I don't know how,
00:53:22
Speaker
If you don't have availability though, I think you kind of don't have really the real time. You need to make sure it's there. So I'm having a hard time separating that. Well, let me put it from another angle. Are there people you're talking to? Are they mostly saying, our system just isn't fast enough? Or our system keeps crashing? Or I know the data is over there. I can't get to it because of integration.
00:53:49
Speaker
I think it's that last one: my data is there and I can't get to it. I think that's probably the one where they come to us: I can't get to my data. And that's a very common pattern. The last few clients I've been at, it's not just that we can't get to it in real time, we can't even get to it at all. Yeah, the pulling data out of a legacy system
00:54:16
Speaker
is where, as an enterprise consultant, you're going to be working: how do I get data out of a legacy system so people can use it. You have a lot of mainframe data that people can't use because it's too costly, so they don't open it up. MIPS are too expensive on my AS/400 or my mainframe, so you can't access it.
00:54:37
Speaker
Well, I don't care if it's real time or not. I need to get it. Nope. You can't access it. So getting access to data is a front and center problem in many organizations. So maybe that's the probably the primary one is I want, I need to get to the data. And if I can also make it real time in the process, then that's an extra win. Yeah. Yeah. I think Kafka is one of those technologies where it can solve an immediate problem and have some accidental extra benefits along the way.
Future Demand for Kafka and Career Benefits
00:55:08
Speaker
So where do you think will be five years from now? Do you think Kafka will be more mainstream? Do you think the techniques of Kafka will be more mainstream, but not the technology? What's your prediction? Well, let me, let me roll back five years or seven years and then answer that. So when I, so 2017 timeframe is when I went all in Kafka. I'm like,
00:55:33
Speaker
I'm going to invest in learning this well because it's a business and personal benefit from doing so. And I co-founded a company solely in this space. So there's definitely value there. When I did this five years ago, I'm like, if Kafka is the leader for five years, if I can be this in five years, I get return on investment. In other words, I felt that it was worth the effort.
00:56:02
Speaker
I felt that in five years it would become more like other technologies, where more people know it, to where being an expert in it isn't as beneficial, because it's more mainstream, it's more common. I didn't think it was going to be the next technology that faded, because it was too hard to bring in
00:56:29
Speaker
to then replace. You're not going to replace it with a similar technology. Someone's going to have to reinvent the reason to move away from it. But I thought it was going to be more mainstream. It is so hard to find Kafka developers. It is a hard aspect from a consulting to find people that know it and want to be in it.
00:56:57
Speaker
So from that standpoint, going by the last five years, I think the next five years are very strong for Kafka, because if you know Kafka, you're in high demand; people will seek you out. I get more LinkedIn inquiries, really obscure ones, on Kafka-related work, but it's there. And I don't see that changing. I don't see businesses moving away from it.
00:57:26
Speaker
Like I said, because I don't know what the next thing is that would replace it, and the community aspect of Kafka is so rich, so vibrant, that the people improving it are top-notch people making that product better, and it's open source that people continue to leverage. So,
00:57:49
Speaker
five years from now, I guess maybe my thoughts from five years ago will now be more like a ten-year thing, where it's no longer people seeking you out. I'm guessing more will move to cloud services, to where they need fewer Kafka operational experts, but the need for people to build event systems, real-time system thinking, ways to extract data and make it available, that will remain.
00:58:18
Speaker
But the operational side will probably diminish because things are improving it to make it easier to run and more people are leveraging experts and organizations to run it for them. Yeah. Yeah, I could see that. So whether you're dealing with Kafka or functional programming or one of those other ways of changing your thinking, you think the future is bright for those new minds?
Conclusion and Future Engagements
00:58:45
Speaker
Do I think the future is bright for
00:58:48
Speaker
in regards to the space, or...? People who can make the mental shift to build applications in that way. Do you think it's worth it as a career investment, not just as mental roughage? Well, I'm definitely biased, and you could probably say that I have my tunnel vision on, but I think so. I think there is still... the mental thinking of it
00:59:18
Speaker
is independent of the technology. And that has made so many things available today that weren't available 10, 15 years ago. Rideshares wouldn't happen, credit card alerts at the levels. A lot of the user experience that people get on their phone that's real time, distributed systems are behind all of that to make that happen. And I think it's vibrant and that I don't see that changing. You keep thinking, is the space going to become saturated?
00:59:48
Speaker
I, like I said, I thought it would be by now and it's not even close in my, in my mind. I think there are a few headline businesses doing this really well, but the majority of businesses are not even close to that way of thinking yet. Yeah. Yeah. A lot are in there. Yeah. I think, I think you hit the nail on the head there. There's a lot in it. There's a lot, like I said, there's like 80% of the fortune 100 companies are using it. Hope my math is right.
01:00:12
Speaker
But are they using it well? Are they the ones that are showcasing it? There's still a lot to learn there. There's a lot for me to learn on it. I mean, as any developer, if you look at code you wrote five years ago, you should have improved to where you're not happy with that code. At least that's the goal. That's not to say the code was wrong; it's to say that I've become better. I think that's important.
01:00:42
Speaker
Yeah, we're constantly improving, and hopefully so is the industry along the way. Yeah. Yeah. A positive note to end on. And I think I'll let you get back to improving the world in your corner of it. Neil Buzing, thank you very much for joining us. Thank you. And that's all from Neil for now.
01:01:02
Speaker
Looking at the calendar, it's currently September 2023. Neil and I are both going to be at a conference together in a few weeks' time; that's Current, over in San Jose. So I'm looking forward to seeing him in person, and if by some happy coincidence you're going to be there too, do come by and say hi.
01:01:21
Speaker
If you're not going to be there, but you still want to say hi, the internet has your back. My contact details are in the show notes, as always, and now would be a great time to leave a comment or a like or hit subscribe and make sure you catch our next episodes. I think that's all for now. Until next week, I've been your host, Chris Jenkins. This has been Developer Voices with Neil Buzing. Thanks for listening.