
Why did Redpanda rewrite Apache Kafka? (with Christina Lin)

Developer Voices

Would you ever take on a rewrite of one of the largest and most popular Apache projects? And if so, what would you keep the same and what would you change?

This week we’re talking to Christina Lin, who’s part of Redpanda, a company that’s rewriting parts of the Apache Kafka ecosystem in C++, with the aim of getting performance gains that aren’t feasible in Java. It seems like a huge mountain to climb, and a fascinating journey to be on, so let’s ask why and how they’ve taken on this challenge…

Christina on Twitter: https://twitter.com/Christina_wm
Kris on Twitter: https://twitter.com/krisajenkins
Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/
Redpanda: https://redpanda.com/
Redpanda University: https://university.redpanda.com/
Seastar framework: https://seastar.io/
Apache Flink: https://flink.apache.org/

#redpanda #kafka #apachekafka #streaming #python

Transcript

Introduction to Apache Kafka and Redpanda

00:00:00
Speaker
Regular listeners will know I'm a bit of a fan of Apache Kafka. I think it's a great system for storing data, for processing it, and just for shipping it between different parts of an organization at scale. It's great if moving data from A to B is your job, and so often it is. And in software terms, Kafka's a very mature project.
00:00:22
Speaker
It's been around for over a decade. It's got a large stable user base. It's got a large stable committer base who are actively moving it forwards. And it's a backbone piece of infrastructure for a lot of big companies.
00:00:36
Speaker
So in one sense, Kafka's arrived. And I got really intrigued when I heard about a company that took a look at Apache Kafka and said, let's keep the protocols the same, but we'll do a complete rewrite of the implementation in C++. That's a bold move for any company. Why? What's their motivation?
00:00:57
Speaker
What's so important that you want to take on a project of that size and maturity? And what are you actually rewriting? Are you rewriting the entire stack or just parts of it? Are you being completely compatible or are you deciding to make tasteful breaking changes? And most of all, what do you think you can do differently this time around?

Christina Lin's Journey and Redpanda Motivation

00:01:20
Speaker
Now the whole industry is older and wiser.
00:01:23
Speaker
Lots of questions to ask. So today we're talking to Christina Lin of Redpanda. Redpanda are that upstart company taking on an Apache staple. And I've seen in the wild, I've seen people using a mixture of Kafka and Redpanda in their tech teams. So clearly Redpanda have added something to the software world. Let's see what they've got to add to the conversation. I'm your host, Kris Jenkins. This is Developer Voices. And today's voice is Christina Lin.
00:02:07
Speaker
Christina Lin, thanks for joining us. Hi Kris, how are you? I'm very well. I'm glad to see you here. How are you doing? It's my pleasure to be here. I'm doing great. Great. I've got many questions for you because we kind of share a background. You are currently head of developer relations for Redpanda. Yes.
00:02:30
Speaker
I was heavily into developer relations for a Kafka-related company, so two sides of the same wall, right? I love that. I love that. I guess we can have a really good conversation then. Yeah. Yeah. So let's start off with the big issue, which is, okay, Redpanda, Kafka, similar space, but why do you even want to be in this space?
00:02:53
Speaker
Yeah, I can talk a little bit about it from my perspective, as my career evolved. So I was doing a lot of Java programming back in the day. And then I started a position in an insurance company. And at the time, we were doing a lot of SOA work. And with SOA, it's a lot of system integrations and services integrations.
00:03:17
Speaker
And that is when I got into... We started using a lot of IBM solutions, but it became very clumsy. And that's when I started doing a lot of Apache Camel. So that's how I got into the Camel space. So I was doing a lot of data integration.
00:03:33
Speaker
At the time I was also working on JBoss, and that was why I got into Red Hat, doing a lot of JBoss work. Plus I got to do a lot of my very lovely Camel project as well, with the Camel crew and all that. So it was great. A lot of system integrations, and I kind of saw the need for
00:03:52
Speaker
having data integrated, and we were using messaging queues at the back for SOA integrations and all that. I was heavily using messaging queues, from IBM MQ to ActiveMQ and WebMQ and all that kind of stuff. And that fits in naturally. When Kafka came out, we were like, awesome, this is a lot faster,
00:04:15
Speaker
you know, streaming services for us to quickly get data in and out. It was able to get a lot of I/O throughput. So we loved that. And Camel was introducing Kafka as part of its ecosystem.

Comparing Kafka and Redpanda

00:04:30
Speaker
So that's why I got into Kafka and kind of using that as the backbone for system integration to pass through to build that, you know,
00:04:39
Speaker
microservices, backbones, and stuff like that. And then when I was looking for other solutions, I was looking into Quarkus. It was at the time where we were using Quarkus, and I saw this really tiny container, because Quarkus was introducing this really tiny container, so people don't have to spin up the whole backend services; the developer can quickly run things. And I saw this Redpanda solution, and I was like,
00:05:06
Speaker
What is this thing? Because at the time Kafka still needed, you know, ZooKeepers and, you know, brokers and all that. I was like, why don't I need all that kind of stuff to actually start developing? So that's why I started looking into Redpanda. And that's why I got into this space of, you know, Kafka and Redpanda. I hope that answers your question, but that's how I learned about the whole data thing. And I really like the data aspect of things. So I'm slowly getting back into data engineering, and learning how data engineers are using
00:05:36
Speaker
those streaming platforms, and I think there's a lot of potential for future development as well, so that's kind of how I got into that. That's the shape of a story I've heard a lot: it starts with desperately wanting to ship data between different systems, and then grows, right? And often, as much as we talk about real-time data, it's just the sheer effort of connecting data from A to B which is the killer use case.
00:06:05
Speaker
From what I know about Redpanda, like you talk about smaller container sizes, that's one of their big pitches, right? Compared to Apache Kafka, Redpanda is chasing size, performance, and ease of deployment.
00:06:24
Speaker
Yeah, I think when Redpanda first got out, it was very well known for its performance, the size of its container, and how little resource it needs.

Technical Foundation of Redpanda

00:06:34
Speaker
So I think that's why people started to adopt Redpanda: just because of the simplicity, and the size it needs to do similar things. So that's kind of how I see it and how people first got into it.
00:06:50
Speaker
But it's a curious road they've taken to get there. To look at something like Kafka and say, I'm assuming this is how it originated. You look at something like Kafka and you say, I would like this to be smaller and faster. I'll rewrite the whole thing in C++.
00:07:07
Speaker
Right. I think this question should go directly to our CEO, our founder, because he actually wrote the entire project. But when I asked him about this question, because I think he was very well into the low latency streaming RPC at the time when he was working for
00:07:24
Speaker
and he was doing a lot of low latency streaming development at the time. So for him, I think when he was starting to develop this application, he was facing a lot of unpredictabilities and a lot of the things when they have higher traffic throughputs, things crash.
00:07:42
Speaker
Or when you get a machine that's able to handle the load, it is super expensive. So I think as a developer, he was trying to figure out a way to become more efficient at processing the data and getting data streamed. So that's why I think he started to look at Apache Kafka, because it was solving that problem. And then he was able to find a way to develop things. And I think he didn't start with C++. I actually asked him, why did you choose C++? And he was like, no, actually, I started with Rust.
00:08:11
Speaker
Right. And then I think at the time when he was developing things, Rust wasn't mature enough. I guess it is now, but at the time it wasn't. So he had to go back to his favorite, C++. He was already developing a lot of C++ programs. And there was one framework in C++
00:08:34
Speaker
that actually solves a lot of problems for us, right? It's the Seastar framework. Yeah, the Seastar framework is a very unique one because it allows a lot of asynchronous programming inside your computer. Because if you think about your computer, your computer consists of a CPU, and the CPU is split into many cores, right?
00:08:58
Speaker
And a lot of the programs were developed in a way where a lot of things were handed off:
00:09:11
Speaker
they use an operating system to make things a lot more flexible. So it loses a lot of the control over how the CPU works and how the memory works. And Kafka was developed in Java. I think Java is a great language. I've been a Java developer for two decades now. I love Java, and I think it's a great language. But I still think there's a benefit of different tools
00:09:38
Speaker
for different purposes. I think in terms of controlling the hardware, C++ is a better language for that because of how you can utilize cores. Instead of relying on all the memory management for JVMs,
00:09:57
Speaker
relying on your operating system to do context switching for your CPUs, the Seastar framework allows us to control everything. So we can allocate the amount of memory per core.
00:10:13
Speaker
So this core will control this part of the memory. And that fits great with the Kafka nature. Because remember how Kafka works? It has a lot of partitions. And then you're writing logs into each partition. So basically, we can assign multiple partitions to a single core. And this core will just take care of all the things in those partitions. And this avoids context switching.
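The partition-to-core ownership she describes can be sketched roughly like this. This is a toy illustration in Python, not Redpanda's actual shard-assignment code; the modulo mapping and the names are assumptions made for the example.

```python
# Toy sketch of thread-per-core partition ownership: every partition is
# owned by exactly one core, so its log is only ever touched by that
# core's queue, and no cross-core locking or context switching is needed.
# This is an illustration, not Redpanda's real shard-assignment algorithm.

NUM_CORES = 8

def owning_core(partition_id: int, num_cores: int = NUM_CORES) -> int:
    """Deterministically map a partition to the single core that owns it."""
    return partition_id % num_cores

# One queue per core; appends for a given partition always land on the
# same queue, so that core can process them without synchronization.
core_queues = {core: [] for core in range(NUM_CORES)}

def append(partition_id: int, record: bytes) -> None:
    core_queues[owning_core(partition_id)].append((partition_id, record))

append(3, b"hello")
append(3, b"world")   # same partition, therefore always the same core
```

Because the mapping is deterministic, all traffic for a given partition is serialized on one core by construction, which is the property that avoids the context switching she mentions.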
00:10:36
Speaker
I think a lot of people have this misunderstanding of what is causing the latency of writing data into my system, right? A lot of people think the problem is the disk, but it is not anymore, because we're not in the very old disk days where we have one single little pen head we're trying to write data with. Yeah, we're no longer spinning a magnetic disk. Exactly. We're not spinning anymore.
00:11:00
Speaker
We're working with SSDs with NVMe architectures, meaning that I can write multiple times into the disk in parallel, and have all this data allocated in our disk in very different places, because of how the file system,
00:11:18
Speaker
XFS, sorry about my pronunciation, works: it allows you to grab your data in parallel from your disk and then put it back into memory. So in terms of that, your disk is no longer the problem; your CPU time is the problem. And being able to utilize the CPU allows us to quickly grab the data and provide it to the users, and vice versa, doing the same thing to quickly put it back into storage.
00:11:41
Speaker
And that's what makes Redpanda super fast: Seastar. And I think Alex saw that, and that's why he implemented everything with the Seastar

Protocol Compatibility and Developer Adoption

00:11:49
Speaker
framework. And that's the secret behind why Redpanda is super fast in terms of performance and low latencies, low tail latencies, and all that kind of stuff. What kind of year are we talking about for this? I think it was three, four years ago, maybe. Really? That recently? Yeah. Okay.
00:12:11
Speaker
Okay, I know some people would be disappointed that you didn't stick with Rust three or four years ago, but... I mean, I think it was just at the time, maybe a little bit earlier. I don't know exactly when Alex was trying Rust. Maybe it was a little bit earlier, but yeah, this is from what I know. Okay, fair enough. Then there's the question of, okay, if you've decided you're going to rewrite Kafka,
00:12:36
Speaker
do you know why it was decided to stick to the exact protocol? Because, as I understand it, you can drop in Redpanda and largely have your writers and readers, your producers and consumers, transparently work with either.
00:12:52
Speaker
Yeah, I think it's because of the number of users. Because you have ring zero, which is the broker side, and then you've got the client side, right? So I think when Alex wanted to do it, he wanted to also support that ecosystem for the Kafka users. So that's why we chose to adopt the Kafka APIs instead of inventing our own.
00:13:15
Speaker
We can provide that faster ring-zero experience, but also provide it for the existing Kafka users. So that's something that we try to solve. I always use the analogy of cars, right? A car is a very good thing where you can hop in it, drive it, and then you can get places. But how you build a car is different. So over time, we have really nice diesel cars, petrol cars, and now we have electric cars.
00:13:40
Speaker
I think it's just that how you build the cars is a little bit different. But the way you drive it, we want the user to have the same experience. Internally, how it's done is up to the car manufacturer, right? So it's similar. Yeah, no matter how they implement it, you've got pretty much the same user interface on the car, right? Exactly, yes. Except I can never find where the heated mirror control is. It's always in the wrong place.
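In practice, the drop-in compatibility described here usually amounts to changing only the bootstrap address in an existing client configuration. Here's a hedged sketch using confluent-kafka-style config dicts; the broker addresses are placeholder assumptions, and real deployments may also need TLS or SASL settings adjusted.

```python
# Because Redpanda speaks the Kafka wire protocol, a Kafka client config
# can usually be pointed at a Redpanda cluster unchanged except for the
# bootstrap address. These dicts use confluent-kafka-style keys; the
# addresses are hypothetical placeholders.

kafka_config = {
    "bootstrap.servers": "kafka-broker:9092",  # placeholder address
    "acks": "all",
    "client.id": "orders-service",
}

# Same application code, pointed at Redpanda instead of Kafka:
redpanda_config = {**kafka_config, "bootstrap.servers": "redpanda-broker:9092"}

# Everything except the bootstrap address is identical.
changed_keys = {k for k in kafka_config if kafka_config[k] != redpanda_config[k]}
```

The producer and consumer code built on top of such a config would not need to change at all, which is the "same user interface on the car" point from the analogy above.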
00:14:04
Speaker
I can see that you want to keep the same protocol to reach the same user space. Right, exactly, yes. Where do you draw the lines around that?
00:14:24
Speaker
How much of Kafka do you want to rewrite in C++? Is it just the broker? Are you targeting Kafka Connect next, for instance? What's the scope of the project?
00:14:37
Speaker
Currently, I think what we're going towards is more on, again, the ring-zero experience, right? So, not the connector itself, but the actual broker. Because Redpanda is a very small company compared to the large community out there for Kafka, and even Confluent is a
00:14:57
Speaker
multi-million dollar company where they have a lot of money to throw around. But Redpanda is very small, so we want to make sure that we invest in the right place. So we want to make sure that we give people the best experience where we can control it. So we focused on the ring-zero experience.
00:15:17
Speaker
We don't expect to expand a lot outside of the connectors. We expect to work with the connectors. We work with the community. For example, franz-go, which is a Go library for connecting to Kafka, was developed by one of our engineers. He donated it to the community and showed people how this is done. We don't want to be the
00:15:43
Speaker
enemy of the community. We're trying to be friends, because we all work in the same space. We're trying to make this space better, right? Giving people more choices. That is what I think we should be, and how the community should work all together. So to answer your question, I think for us, it's more about focusing on the broker itself. On top of that, we added the
00:16:05
Speaker
Redpanda Console, which gives us a very nice GUI for developers to see what's going on. I know there are a lot of projects out there that can do the same thing, but for us it just makes the transition smoother for people to see what's going on, and maybe they use it on top of Kafka as well, and all that kind of stuff. That's where we're at right now. I think, diplomatically, I would say there's plenty of room for more development in the Kafka user interface.
00:16:35
Speaker
Yeah, you know, different opinions spark different ideas. And I think that's good energy for the community. If you have everybody saying the same thing, you're not going to have innovation. Yeah, it reminds me, I was at another company talking to some people who worked there last year, and they used Kafka in production. But all their developers locally were using Redpanda.
00:17:06
Speaker
That's the first time I heard about Redpanda, and I want to ask you, why do you think they would have that divide?
00:17:12
Speaker
I think we're seeing the same thing. From the community side, I see the same thing. I see a lot of developers using Redpanda internally for development, and they have Kafka because their operations staff decided on using Kafka. I think it's just the ease of using Redpanda and of starting it. I don't think adopting Kafka as a beginner is the easiest thing. First, you have to understand
00:17:39
Speaker
Java and a lot of the people I talk to right now are Python developers, Go developers, Rust developers.

Advantages of Redpanda's Simplicity

00:17:47
Speaker
For them, Java is intimidating. And then you have to download the package, and at the time, you still had to start the ZooKeeper and then start the broker. It takes them a lot of
00:18:01
Speaker
effort. It's just harder to get around. So for them, Redpanda is one single binary: they start it up and they can start working. I think for them, it's easier. And then the footprint is a lot smaller compared to what Kafka takes up. So I think that's one reason. I think I mentioned Quarkus; the first time I heard about Redpanda was with Quarkus, when they were doing Dev Services. Dev Services is a way for developers to not care about,
00:18:29
Speaker
you know, setting up the environment they need to work on; it just works for them. And Dev Services has Redpanda in there. That is because, and I actually talked to the people that adopted this, it's smaller in footprint and easier to get started. So I think that's why. And I was working on a project with Testcontainers. I don't know if you've heard about Testcontainers, but I think they're...
00:18:55
Speaker
pretty well used in the Spring Boot community, right? Because they were using that for testing and local development as well. I've seen the download rates for Kafka and Redpanda, and I think Redpanda is pretty good in terms of usage. People can just download it and use it as their internal development platform, so they don't have to get all that stuff started. So I think that contributes to it.
00:19:20
Speaker
I mean, I've got to push you a bit on that, because that landscape must be changing a bit, where Kafka is moving away from having a separate consensus protocol in ZooKeeper to it being internal. Meanwhile, you've got Redpanda. I mean, I'm guessing if you want to do something really interesting with it, you end up wanting to bring in Kafka Connect or maybe Kafka Streams. Do you think that single-binary argument is still valid, or is it eroding?
00:19:51
Speaker
I think it's still valid, right? I mean, the developers I talk to don't really use Connect that much. They are using Python to develop and connect to Kafka. So I don't see that happening. And Kafka Streams, Kafka Streams is just an API you put on top of the broker, whatever broker you're using. So I don't think that's going to be a deciding issue for
00:20:21
Speaker
using Kafka at all, right? And Kafka Connect, since I am a Camel developer, a very long-time Camel developer, I write my own Camel components. So for me, that is independent. Whatever I put underneath the hood doesn't really matter; I'm writing the integration on top. So for me, I don't think that's a big issue. And especially now, I'm starting to hear less and less about Kafka Streams. I think more people are going to Flink.
00:20:49
Speaker
Yeah, there's been a lot of push. For the past two years, it's all about Flink. Every time I do a Flink course, people just come in and they want to hear about Flink. I think it's the push of Flink that added benefit to us, because you're freed from embedding a particular solution for those streaming services. I don't think that's the
00:21:15
Speaker
big problem for us. So from that point of view, presumably you're looking at Flink as a good thing competitively. Yeah. It frees you up from having to compete with, like, higher-level infrastructure stuff. Exactly. And I think a lot of people were adopting Flink because of the SQL nature. And then KSQL was introduced. It was a good idea, but there were problems with KSQL, right? So I think, you know, Flink kind of solves that extra-layers problem.
00:21:45
Speaker
Yeah, I'm going to try and keep my knowledge in this space and my biases out of it, but I wanted KSQL to succeed more than it seems to have done. Let's just say that. I liked it a lot. I would have liked to have seen it continue to a glorious future that I don't think is going to happen now. Exactly.
00:22:05
Speaker
But this raises the question of how you're seeing people use Redpanda. I mean, a lot of Python developers, and you can tell us about Camel. When you talk to users, what are they doing with all this?
00:22:21
Speaker
Currently, our major customer base is more performance-oriented. That's the beginning. If you look at most of our customer base right now, I'm seeing a lot of users that were having problems with Kafka, basically with the performance and the number of machines they need to start up, and a lot of management problems.
00:22:43
Speaker
And I think it was the past two or three years building all that up. So they were switching to Redpanda. That's most of the use cases I see for now. But now I'm starting to see a lot more since we added the BYOC
00:23:00
Speaker
offering in the cloud. So I think we converted a lot of the people that wanted to have a quick cluster running in their own cloud account. We have that. And so I'm starting to see a lot more less-experienced Kafka users. Before, it was all very experienced Kafka users. So we got really hard questions to solve, like, how do I get, you know,
00:23:20
Speaker
certain performance numbers, like latency numbers, for less hardware usage and all that. So those were a lot of the problems we had to solve. But now we're seeing a lot more people in the middle of their journey, where they were trying to get into Kafka but they don't know how to do it, and we help them do all that kind of stuff. Okay. So you're seeing more people just casually moving over, or starting their journey still with Kafka?
00:23:48
Speaker
I think most of them, when they come to us, know a little bit about Kafka. Well, I'd say one or two exceptions, but they're very few, right? Compared to the majority of the people I see, they still know Kafka, but they want to experiment with other things. So I think they're just looking for a second opinion and different options for these types of features.

WebAssembly and Data Transformation

00:24:08
Speaker
My loyalty here is to the ideas and the architectures rather than specific implementations, so I've got no skin in the game on that one. When you said you have a Go client, this is another thing I wanted to look into. You said people are intimidated by Java, but this is C++. I can't believe the general programmer is less or more intimidated by C++ than Java.
00:24:37
Speaker
You've got Go as a main client. How are people interacting with Red Panda language-wise and how are you supporting that? What development are you doing in that world?
00:24:49
Speaker
For us, I think that franz-go project was donated as an open source project. So anybody who wants to use it, even connecting to Kafka, we're happy for them to do so. So that's something that we do. But our main focus is still in the broker space, where we're developing a lot of auto-rebalancing and leadership rebalancing and all that kind of stuff.
00:25:16
Speaker
We are also working on something called Wasm. I don't know if you heard about our Wasm project. WebAssembly, yeah. Our WebAssembly project is something that we're working on. Because we've seen this thing where people are building very simple data pipelines where they're only doing very simple transformation stuff or validations or very simple
00:25:39
Speaker
data conversion, masking, and it needs a lot of data ping-pong, because you need to get it out from a socket and then put it back in. So this is something that people do. And we were thinking, what if we can have this in the broker and let the broker do all the processing? So before it reads out from the memory,
00:25:59
Speaker
it does this thing and then puts it back in memory. So there's no huge round trip over the network and all that. That would make things a lot faster. So can we have this built into the broker without having external things going on? And the idea of bringing in WebAssembly was because WebAssembly is a very flexible engine. We can use it to compile
00:26:24
Speaker
different types of languages, like Rust, Go, Java, and Python. So developers can freely choose whatever language they want to use for these very simple transformation pipelines and put that in the broker. So I think, externally, the Flink services will be outside of your broker, doing a lot of traditional complex event processing
00:26:51
Speaker
and a lot of time-window-based processing. Internally, with these very simple, stateless transformations, everything can be done at the broker level, so you don't have to do a lot of data transfer over the network. And we see a lot of networking costs incurred for people using a lot of partitions; you can see replication all around. So if we can eliminate that, that would help a lot.
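The kind of simple, stateless broker-side transform she describes might look like the following. This is plain Python for illustration only; in Redpanda's actual data-transform feature the function would be compiled to WebAssembly and deployed into the broker, and the record shape used here is an invented example.

```python
# A toy stateless transform: mask an email field, record by record.
# No state is carried between records, so every broker can run its own
# copy of the function with no coordination, which is what makes this
# kind of work a candidate for running inside the broker.
import json

def mask_email(raw: bytes) -> bytes:
    record = json.loads(raw)
    user, _, domain = record["email"].partition("@")
    record["email"] = user[:1] + "***@" + domain
    return json.dumps(record).encode()

masked = mask_email(b'{"id": 1, "email": "alice@example.com"}')
```

Running this next to the log avoids the "data ping-pong" she mentions: the record never leaves the broker just to be masked and written back.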
00:27:16
Speaker
I could see that for, like, this is just splitting a comma-separated string into a list. You would want to do that in a more lightweight way than Flink, right? Yes, exactly. Yeah. I can see that makes sense. Are you in any way enforcing a boundary for what you consider simple or complex there?
00:27:36
Speaker
Yes. So we think that if it's something where you need to hold state, that is a complex one. And for things where you don't have to keep state, it's simple stuff, right? The reason is, when we deploy the data pipeline into the broker, there are several brokers in your cluster. So Redpanda needs to copy all your pipelines across the brokers in order to get everything done inside its own
00:28:05
Speaker
little machine, right? So keeping the state of where the pipeline is, that's going to be a very huge workload. And you have to know all the statuses, like all the other workers' statuses. So we don't want to do that. We want to make sure everything is simple and easy. So if it's stateless, it's simple. But again, the reason we're doing this is because there are a lot of people not utilizing their entire hardware.
00:28:32
Speaker
But if you already have a very busy broker that's taking up 100% of your CPU time, it is probably not a good idea. So you probably need a couple of different

User Autonomy and Security in the Cloud

00:28:40
Speaker
nodes for that. So it all depends on the situation, but I think it's going to eliminate a lot of network costs and latencies and all that. Yeah, I can see, provided it's contained, that's going to... You always want to separate storage and compute, but once you've separated them, you want them to be really close.
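Her stateless-versus-stateful boundary can be made concrete with a small contrast. This is a sketch with invented function names: the first transform depends only on the current record, so copies of it on different brokers never need to coordinate; the second carries a counter across records, exactly the kind of shared state she says belongs outside the broker, in something like Flink.

```python
# Stateless: output depends only on the current record. Safe to replicate
# to every broker with no coordination.
def to_upper(value: bytes) -> bytes:
    return value.upper()

# Stateful: output depends on everything seen so far. Replicating this
# across brokers would mean synchronizing `count` between them, which is
# the "huge workload" that keeps this kind of processing out of the broker.
count = 0

def numbered(value: bytes) -> bytes:
    global count
    count += 1
    return str(count).encode() + b":" + value
```

The rule of thumb from the conversation: if two independent copies of the function could disagree (as `numbered` would), it is complex and stays external; otherwise it is simple enough to run broker-side.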
00:29:02
Speaker
Exactly, right? They'd be sitting right next to each other, but still separate. Exactly. You want the fastest hop time for that. OK. I could happily dive into the guts of how that works, implementing the Wasm runtime engine. It's still a work in progress for us right now. We're still developing it. Currently, we've got the Go engine running for Wasm, but we're still working on the Python and Java parts.
00:29:31
Speaker
Mainly Python, because the majority of our users are Python users, but then we still have Java users we want to support. I think that is something we're working on, simply because the way those languages handle memory is a little bit more free-form. So to make the WebAssembly engine work internally with the C++, with our
00:29:54
Speaker
thread-per-core architecture, we need time to actually get that thing working more efficiently. We could just plug it in, but that is going to cause performance issues. We don't want to do that, so we want to make sure that we can manage all the memory usage and how all the cores are used inside your computer. So that's something that we're working on.
00:30:17
Speaker
You've got that classic sandboxing problem: when you let users execute arbitrary code on your machine, you've got arbitrary problems, right? Yes, that's right. So that's something that we're working on. And we're also working on other things, you know, not just the
00:30:34
Speaker
broker part, but also the cloud services part. We're trying to launch our serverless solutions, which is something for developers so they can spin up a topic if they want to, to actually get more exposure to developers, because most of them were just running locally, but we wanted to have that presence in the cloud as well,
00:31:01
Speaker
other than BYOC. I think BYOC does better for us, but I think we want to include it in that footprint as well. That connects to something else I saw about Redpanda. Because if you're doing bring-your-own-cloud, I want to know, if I bring my own cloud, what you offer on that. And also there's something about you using native object storage in Redpanda, which I think is related, that I wanted to ask you about.
00:31:25
Speaker
Okay. Yeah, that'll be good. Because I actually talked about this at Current this year, I mean,

Enhancing Developer Experience

00:31:33
Speaker
at Current, BYOC. And actually some of the people that were managing Confluent Cloud came to me and said, you know, this is an awesome idea. And they were the backend engineers. And they were like, we should do that. I was like, no, don't.
00:31:47
Speaker
But yeah, I think the way that we did it is, we don't want it to be direct management. We don't want a user from Redpanda to go directly into the customer's cloud, because it's not safe.
00:32:01
Speaker
And we don't want it to be very manual, because that's going to cost us a lot. So we want it to be automated and safe. So the way we did it is we have an agent, which is a very small VM that gets deployed into the customer's cloud. So this enclosure is the customer's cloud. We don't have access to it. The customer needs to install the agent in their cloud,
00:32:24
Speaker
and we control everything from outside. But we don't go directly into the cloud. We have the agent pull requests from our control plane and then do things in the cloud. So everything is isolated. And with this particular data plane,
00:32:41
Speaker
even when things go wrong with the control plane or whatever we want to do with Redpanda, your things still stay intact, because you actually have your own little ecosystem, your own cluster. You're working with the cluster, you're working with everything. And the control plane is just doing all the updates and all the patching, that kind of stuff, for you. So basically, we just issue a bunch of commands, and the agent pulls them and does that in your own cloud and fixes things for you there.
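The pull-based agent pattern described here can be sketched roughly as follows. This is an illustrative model only — the class names, command strings, and API are made up, not Redpanda's actual implementation: the control plane only queues commands, and the agent, running inside the customer's cloud, polls for them and applies them locally, so the vendor never reaches into the customer's environment directly.

```python
from collections import deque

class ControlPlane:
    """Vendor side: queues commands, never touches the customer's cloud."""
    def __init__(self):
        self._pending = deque()

    def issue(self, command):
        self._pending.append(command)

    def next_command(self):
        # The agent pulls; the control plane never pushes.
        return self._pending.popleft() if self._pending else None

class Agent:
    """Small VM inside the customer's cloud; pulls and applies commands."""
    def __init__(self, control_plane):
        self.control_plane = control_plane
        self.applied = []

    def poll_once(self):
        command = self.control_plane.next_command()
        if command is not None:
            # In reality this would patch or upgrade the local cluster.
            self.applied.append(command)
        return command

plane = ControlPlane()
plane.issue("upgrade-broker v23.2")   # hypothetical commands
plane.issue("rotate-certs")

agent = Agent(plane)
while agent.poll_once():
    pass

print(agent.applied)
```

If the link to the control plane breaks, the agent simply stops seeing new commands; the local cluster keeps running untouched, which is the failure mode discussed next.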
00:33:08
Speaker
Okay, so if something goes wrong with that link, you can't change anything, but nothing stops running. Yeah, everything keeps running. Yeah, I can totally see that argument. And I think what people like about it most is working inside their VPC, because VPC traffic in and out is a cost, right? So I think what people like about it is: oh, I can put that in my VPC. Yeah, that makes sense.
00:33:36
Speaker
Again, there seems to be a kind of cultural focus on the developer experience side. Is that something you're deliberately focusing on as kind of a competitive advantage, or is it just when you're smaller, that's easier to do?
00:33:56
Speaker
I think we put a lot of effort into user experience. There's a user experience group internally inside our company, where we think about user experience a lot. When we push out a new feature, we want to think about: how easy is that,
00:34:13
Speaker
from a user's perspective? How can I make it easy for developers? We constantly make changes. And from what you're saying, if the company is smaller, it's easier — yes, of course, it's always easier, because you don't have to go through 300 meetings to get one feature changed. So definitely, yes, that contributes to it. But I think it's also the way that we design our product.
00:34:35
Speaker
So for instance, the data rebalancing can be done automatically. Redpanda will automatically detect what's going on inside each partition. If it gets too busy within one node, it will shift things around automatically for you, without you knowing. Tell me more about that. I didn't know about that.
00:34:54
Speaker
So yeah, of course. We do automatic data rebalancing, because the way that we do things is very different. Kafka claims that they do KRaft, but KRaft only applies to the controller side of the story. Where it used to do things in ZooKeeper, they moved it to KRaft. But underneath the hood, KRaft is doing whatever ZooKeeper was doing. And then you've still got your partitions, your ISRs, and all that doing all the replication.
00:35:24
Speaker
But Redpanda, underneath the hood, we don't have one actual controller of things. All the partitions form Raft rings, so you can see a lot of rings inside Redpanda. There's no one single bottleneck where, if it breaks, everything breaks. All the partitions elect their own leaders and just figure things out. That's how independently each one of the partitions works.
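The "every partition is its own Raft group" idea can be illustrated with a toy model. This is a simplification, not Redpanda's implementation — the election here is just `min()` over surviving replicas as a stand-in for a real Raft election — but it shows why losing one node only disturbs the partitions that node was leading:

```python
class Partition:
    """Toy model: each partition is an independent consensus group."""
    def __init__(self, name, replicas):
        self.name = name
        self.replicas = set(replicas)
        self.leader = min(self.replicas)  # stand-in for a Raft election

    def fail_node(self, node):
        self.replicas.discard(node)
        if self.leader == node:
            # Only this partition re-elects; others are untouched.
            self.leader = min(self.replicas)

partitions = [
    Partition("orders-0", ["node-a", "node-b", "node-c"]),
    Partition("orders-1", ["node-a", "node-b", "node-c"]),
]
partitions[0].leader = "node-a"
partitions[1].leader = "node-b"

for p in partitions:
    p.fail_node("node-a")

print([p.leader for p in partitions])
```

When `node-a` dies, `orders-0` holds its own election among the survivors, while `orders-1` keeps its leader — there is no central controller whose failure takes everything down.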
00:35:52
Speaker
So in those terms, we get a better failover rate and all that kind of stuff. Other than that, I think we also have much better communication between each node. We prioritize each partition: hey, if this node has too many partitions and gets too busy, we'll automatically figure out which one is the less busy node, and we'll rebalance everything.
00:36:21
Speaker
Right. Dynamically. And is that transparent to the user? I don't see that on the consumer/producer side. You don't see that. Everything is done internally within the broker itself, and we'll just tell the consumer: hey, the leader has changed, and this is where you're going to
00:36:38
Speaker
get it, and all that kind of stuff. It's very similar, I think. But Kafka is a little unique in those terms, because the Kafka client itself is very smart. The Kafka client itself does a lot of determining where do I go, how sticky am I, and kind of figures it out. The broker itself is dumber compared to the messaging queues we used to have. It just does all the
00:37:01
Speaker
I know it's doing a lot, but it's just doing the replication and all that. A lot of it becomes the job of the client. The client needs to know a lot in order to speak to that. We also have to obey that protocol from the client so we can fit into
00:37:16
Speaker
all the different versions of it. Of course, we have a limit, right? Some of the less efficient consumer algorithms we'll probably just not do anymore. We'll just say we only support up to these versions of the client, and that's good for the users as well, because it's more efficient. So we'll do that. We work with that client code in order to get better balancing for consumers, leaders, and data partitioning.
00:37:39
Speaker
Okay. Surely, once you switch the partition leader at some point, that's got to trigger a consumer group rebalance that the client does see, though. Yes, yes. The cause and the management of it is automatic — the programmer doesn't have to change anything — but the client will do some work in cooperation with that rebalance.
00:38:06
Speaker
Yes, yes. And then we kind of honor that, right? Like the cooperative sticky way of doing things: we prefer to stick to where it was before, so there's less movement, and all that stays the same. That's why we have priorities. We'll shift the one that has fewer consumers and kind of see which one is... We prioritize them — I can send you a link to how we prioritize it, but there's a lot of
00:38:33
Speaker
logic you need to figure out: you know, the CPU, the bandwidth, how many consumers, how fast things are moving, all of that. It all becomes a determining factor, and then we can balance everything. Do you do things like watch how long a problem lasts before you do something about it?
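A scoring function in the spirit of what's described might look like the sketch below. The fields and weights are entirely made up for illustration — the real prioritization logic she refers to lives in Redpanda's balancer, not here — but the shape is the same: combine per-node load signals into one busyness score, then move partitions toward the least busy node.

```python
def busyness(node):
    """Hypothetical weighted load score; weights are illustrative only."""
    return (0.5 * node["cpu"]
            + 0.3 * node["bandwidth"]
            + 0.2 * node["consumers"])

nodes = [
    {"name": "node-a", "cpu": 0.9, "bandwidth": 0.8, "consumers": 0.7},
    {"name": "node-b", "cpu": 0.3, "bandwidth": 0.4, "consumers": 0.2},
    {"name": "node-c", "cpu": 0.6, "bandwidth": 0.5, "consumers": 0.4},
]

# The least busy node is the candidate to receive a shifted partition.
target = min(nodes, key=busyness)
print(target["name"])
```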
00:38:54
Speaker
Yes, there's a timeout thing as well. It's interesting. That makes me think: the way consumer group rebalancing used to work was stop the world, rebalance everyone, start again. And it wasn't too long ago that they changed that to: only if you're being reassigned do you have to stop your work, give it up, and come back.
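The difference between the two rebalance styles can be shown with a toy calculation (this is not the actual Kafka group protocol, just the set arithmetic behind it): eager rebalancing revokes everything from every member, while cooperative rebalancing only revokes the partitions that are actually moving to another member.

```python
def eager_revoked(old, new):
    """Stop the world: every member gives up all its partitions."""
    return {member: set(parts) for member, parts in old.items()}

def cooperative_revoked(old, new):
    """Incremental: each member only gives up partitions it is losing."""
    return {member: set(old[member]) - set(new.get(member, set()))
            for member in old}

# consumer-3 joins, and only partition 2 moves to it.
old = {"consumer-1": {0, 1, 2}, "consumer-2": {3, 4, 5}}
new = {"consumer-1": {0, 1}, "consumer-2": {3, 4, 5}, "consumer-3": {2}}

print(eager_revoked(old, new))        # everyone stops working
print(cooperative_revoked(old, new))  # only partition 2 is revoked
```

Under the cooperative scheme, consumer-2 keeps processing through the whole rebalance, which is the improvement being described.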
00:39:17
Speaker
Which I'm assuming you do, but it raises the question: which version of the Kafka protocol are you actually following, and how far back do you go?
00:39:25
Speaker
Yeah, currently, I can't really tell — I don't have an exact number for how far back we go — but I think we actually try to match the most current one. Actually, I've heard from one of our engineers that he started talking to the client community about ideas for how it can be more efficient from the client side in order to have that communication. So he actually does work with
00:39:54
Speaker
the client groups to actually get that thing working as well. So yeah. So you're tracking the latest version of the Kafka protocol, are you saying? Yes, we do. You must have a fairly large team just dedicated to doing that — that's enough work in itself, right? I don't know exactly how many people are there, but I know there aren't too many. Okay. I mean, how large is the company? Just give me an idea from that.
00:40:24
Speaker
Um, I think we're about 200-ish. Okay. Yeah. In Silicon Valley, I don't know how many exactly, because we're still growing and we're still hiring and all that. So I don't think we have a lot of people

Serverless Solutions and Advanced Features

00:40:41
Speaker
right now. Okay. So if you're growing, where are you going with this? You've mentioned Wasm.
00:40:47
Speaker
So yes, Wasm, and then serverless, of course, in order to support all the customers. We also have managed Kafka, of course — that's already there. Managed Redpanda services, that's already there. BYOC, and now we're doing serverless and Wasm. And we're also working on this: we already have tiered storage, which I think Kafka doesn't have, but Confluent already has, which is kind of offloading
00:41:15
Speaker
the non-recent data to the object store, right? So you don't have to keep the hot stuff on SSD, and the rest you push to S3 or whatever. Yeah, exactly. Yes. So we do that. And then we also added something called remote replica, which means you can rehydrate an
00:41:36
Speaker
entirely different cluster. So you have your operational cluster here. I think a lot of the engineers love that implementation: they can still have the operational side of the house working with the current cluster, but when they're trying to, you know,
00:41:53
Speaker
backfill all the jobs that they need to run — say they have to go back three days, four days, to run a long process over a large amount of streaming data — what they can do is have a separate cluster, rehydrated with historical data, and they can use that one to run their processes and their jobs. So it's called remote replica, which is kind of reusing that S3 object store we have to store historical data as well.
00:42:23
Speaker
Oh, right. So it's not quite the same as something like cluster linking. You join one cluster to another, it's more like reading the same object storage for a different cluster.
00:42:34
Speaker
Exactly. Yeah. So that's interesting. I think data engineers use that a lot more. For the microservices users, a lot less, because I think for them this is just a traffic control center where they can relay their messages. But back-end data engineers need a lot of historical data that wasn't stored somewhere, that wasn't stored in databases, or that is in a data lake — and it's actually easier to kind of see the
00:43:02
Speaker
flow of how the data was coming in, the logic of the data, and the historical timestamps of it — they like to see it that way, so they'll process it from there. I'm curious about that: why would you not just create a new consumer going further back on the existing cluster?
00:43:20
Speaker
Because that's going to sacrifice the performance of your currently running brokers. You can also do that, but then you're going to rehydrate a lot of the historical data into your current broker, so you're sacrificing part of your broker's performance.
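Why a backfill read hurts the live broker can be seen in a simplified two-tier model. This is an assumption-laden sketch, not Redpanda's internals: a small LRU "hot" cache in front of a slow object store, where pulling old segments back in evicts the recent segments that live consumers need.

```python
from collections import OrderedDict

class TieredReader:
    """Toy tiered-storage read path: small hot cache over an object store."""
    def __init__(self, cache_size, object_store):
        self.cache = OrderedDict()        # LRU cache of "hot" segments on SSD
        self.cache_size = cache_size
        self.object_store = object_store  # large but slow remote tier

    def read(self, segment):
        if segment in self.cache:
            self.cache.move_to_end(segment)  # cache hit: fast path
            return self.cache[segment]
        data = self.object_store[segment]    # cache miss: slow remote fetch
        self.cache[segment] = data
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)   # evict the coldest hot segment
        return data

store = {f"seg-{i}": f"data-{i}" for i in range(10)}
reader = TieredReader(cache_size=2, object_store=store)
reader.read("seg-9")   # recent data, now hot
reader.read("seg-8")
reader.read("seg-0")   # a backfill read from days ago...
print(list(reader.cache))  # ...evicts seg-9: the hot set is polluted
```

Running the backfill on a separate, rehydrated cluster keeps this cache pollution away from the operational brokers, which is the argument being made here.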

Use Cases and Team Adoption

00:43:34
Speaker
Once you're in the tiered storage world, you're going to mess up that hot cache section.
00:43:40
Speaker
Yeah, exactly. And it keeps things separate. I see why that would be an issue. What I see in companies right now is a split between the operational side of the house and the analytics side of the house. I came from the operational side, the transactional side, where everybody's doing microservices, everything's containerized, and we've got Kafka
00:44:03
Speaker
underneath the hood and all that. But at the same time, I see the other side of the house, analytics, where they're just picking up the streaming side of the story. They're starting to learn Kafka. They used to use a lot of Spark, a lot of MapReduce, that kind of thing. And they're starting to get real-time data, because now, for machine learning, they need to produce a lot of data sets. And the data sets are — the way the data is
00:44:33
Speaker
kind of random, I don't know, it depends on how they implement it, but sometimes they have to go way back for a lot of data, and they need that. And they also want real-time machine learning. I think a lot of them are adopting real-time machine learning, so they're trying to get that hooked up. And I think on this side there's a huge gap compared to the operational people, who are so confident — they know how to use the Kafka thing already — but here, I think they're still learning.
00:45:02
Speaker
Yeah, you kind of want to isolate yourself when you're massively experimenting, right? Right, exactly. Yeah, I can see that. Okay, and how about you? As head of DevRel for Redpanda, where are you taking things into the future personally? So, you know, when you say DevRel at a startup company, I think most startup companies don't know what DevRel is.
00:45:29
Speaker
So I think I'm doing a lot of technical marketing instead of doing a lot of the real stuff, because there's just this huge gap of things that needs to be filled
00:45:44
Speaker
before I can go off and start speaking. Because once I do that, nobody's creating content. And you actually need people creating technical content that speaks to developers, to the people who know the technical stuff, instead of really fluffy marketing messages. So I'm doing a lot more of that. And I want to grow the educational side of the story, because I feel like there's not enough educational content for Redpanda — for instance, teaching people how
00:46:11
Speaker
Redpanda works, and what it means to maintain a Redpanda cluster. I want to make that easier. And I think there aren't enough courses. There are a lot of courses for Java developers already for using Kafka, but not enough for other languages. So I want to build on top of that — build better educational content for my users as well.
00:46:40
Speaker
Yeah, I can totally see that and agree with it, but that spreads you very thin. Yes — we do have two people on our team, but there are still a lot of things to do. Yeah, there's always more education to do. Okay, final question — and this is really two final questions. If I want to go and use it, there are two things I need to know. The first is: what's the license for Redpanda?
00:47:06
Speaker
It's actually — you can use it if you don't use it in production. What is it? I forgot the name. BSL?
00:47:21
Speaker
BDL, BSL — I forget which; I always forget the terms. But I think it is free to use if you use it for development. And if you're using it in production, that's when you need to reach out to us. And there are community features you can use freely in production; there are just a few enterprise paid features for which you need to reach out to us to get started. So for instance, the auto-balancing part that I mentioned,
00:47:51
Speaker
and then the tiered storage and the remote replica — those are the parts where you have to work with a licensed Redpanda cluster.

Getting Started with Red Panda

00:48:00
Speaker
Okay. And if I want to get started, what do I do next? Well, what you need to do is go to our website, and there's a Getting Started with Redpanda — that's the easiest path. And we also have a university that teaches you how to get started with Redpanda. Oh, very nice. You get a plaque.
00:48:21
Speaker
Yes, you get a plaque for the certification. Well, Christina Lin, I will leave you to go off and write more courses at that point. Thank you very much for talking to us. Thank you.
00:48:33
Speaker
Christina, thank you very much. I'm going to go and check out Red Panda. I'm going to kick the tires and see how similar it is in practice. And one thing I'm definitely going to check out is the GUI. I hear whispers on the grapevine that that's how certain people from the Kafka world have got hooked in. So let's see what it's like.
00:48:51
Speaker
If you want to check it out, as always, I'll put links in the show notes. If you've enjoyed this episode, as always, please do take the time to like it, share it, rate it, and click subscribe if you haven't already to catch next week's episode. But until next week's episode, I've been your host, Kris Jenkins. This has been Developer Voices with Christina Lin. Thanks for listening.