
DIY Consensus: Crafting Your Own Distributed Code (with Benjamin Bengfort)

Developer Voices

How do distributed systems work? If you’ve got a database spread over three servers, how do they elect a leader? How does that change when we spread those machines out across data centers, situated around the globe? Do we even need to understand how it works, or can we relegate those problems to an off the shelf tool like Zookeeper?

Joining me this week is Distributed Systems Doctor—Benjamin Bengfort—for a deep dive into consensus algorithms. We start off by discussing how much of “the clustering problem” is your problem, and how much can be handled by a library. We go through many of the constraints and tradeoffs that you need to understand either way. And we eventually reach Benjamin’s surprising message - maybe the time is ripe to roll your own. Should we be writing our own bespoke Raft implementations? And if so, how hard would that be? What guidance can he offer us? 

Somewhere in the recording of this episode, I decided I want to sit down and try to implement a leader election protocol. Maybe you will too. And if not, you’ll at least have a better appreciation for what it takes. Distributed systems used to be rocket science, but they’re becoming deployment as usual. This episode should help us all to keep up!

--

KubeCon talk on the etcd bug: https://kccncna2022.sched.com/event/182N9/lessons-learned-from-etcd-the-data-inconsistency-issues-marek-siarkowicz-google-benjamin-wang-vmware
The Raft paper by Diego Ongaro and John Ousterhout: https://raft.github.io/raft.pdf
The EPaxos Algorithm: https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf
LevelDB: https://github.com/google/leveldb

Benjamin on Twitter: https://twitter.com/bbengfort
Benjamin on LinkedIn: https://www.linkedin.com/in/bbengfort
Benjamin on GitHub: https://github.com/bbengfort
Rotational Labs: https://rotational.io (check out the blog!)
Kris on Twitter: https://twitter.com/krisajenkins
Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

Transcript

Introduction to Distributed Systems with Benjamin Bengfort

00:00:00
Speaker
On Developer Voices today, we're talking about distributed systems. Clusters of computers working together. When your software spans more than one machine, or sometimes even more than one country, how do you get it to behave in a safe, sane, rational way? How much of that problem should you push out to the database? How much of it is in the hands of the application developer?
00:00:24
Speaker
To discuss it all, I'm joined by Benjamin Bengfort, who has a really interesting career that we barely had time to touch on. He's worked on self-driving vehicles back in the early days. He's worked on detecting bombs via their 3G signals for naval intelligence. But in his 30s, he went back to school to get a doctorate in globally distributed systems. And that's what we really talk about in this podcast.
00:00:50
Speaker
I

Exploring Custom Distributed Consensus Mechanisms

00:00:51
Speaker
started out by asking him, tongue slightly in cheek, surely we just use ZooKeeper and that's the end of the problem. But by the end of our conversation, he has me completely convinced the most fun thing I could do next is write my own distributed consensus mechanism. It seemed a bit like encryption, you know, one of those problems where maybe you should never roll your own.
00:01:14
Speaker
But Benjamin has persuaded me the time is ripe to dig into distributed computing, bypass the off-the-shelf stuff, and build some software that's genuinely tailored to the problem that you're dealing with. I'm inspired. I'm inspired to do some hacking, and I hope some of you will be too.
00:01:30
Speaker
So, let's do it. Let's go from here in the UK to there in the US, spanning the globe to talk about software that can span the globe. I'm your host, Kris Jenkins. This is Developer Voices, and today's voice is Benjamin Bengfort.
00:01:58
Speaker
I'm joined today by Benjamin Bengfort. Benjamin, how are you today? I'm doing well. Thank you. Glad to be here. I'm glad you're here because you're going to take us to school, which is going to be fun. You're our Developer Voices resident distributed systems expert. Oh, that's such high praise. It comes with a badge and a little blazer lapel pin.
00:02:22
Speaker
that kind of thing. So I'm going to start with the deliberately challenging question, if I may. I think we can probably all agree that there are times when you need to grow beyond writing to one machine, right? So distributed systems are kind of inevitable, past a certain size. But why don't we just... isn't the answer to distributed systems to install ZooKeeper and forget about it, because it's now someone else's problem?
00:02:52
Speaker
And

Zookeeper and Database Reliance in Distributed Systems

00:02:53
Speaker
that actually is the way that many, many, many distributed systems are built. I think that that's a fairly common architecture. ZooKeeper or etcd, plus some distributed storage system like Ceph underlying it. And then what you end up with is a single node system that can be placed by ZooKeeper, right?
00:03:18
Speaker
And so you've turned your distributed system into maybe a sharded or sort of multi-part singular system and now you can reason about the consistency on that shard. But the problem is that you've lost some of the gains that you've made by distributing the system in the first place.
00:03:39
Speaker
Right? So if the why of distributed systems is we want things like fault tolerance, right? So we want to survive failures. We want to have high availability, which
00:03:51
Speaker
is a really loaded term. But for me, the high availability is you might have people in North America, you might have people in Europe, you might have people in South America. You want the servers that they're accessing close to them, right? So that they can get access to their data. It's kind of a different definition of high availability, right? But you want that access close by. Yeah, the definition I always want to go with high availability is if something explodes at 3 AM, do I get woken at 3 AM or can I roll in at 9 AM and fix it?
00:04:21
Speaker
Yeah, and actually, that's like the engineer's perspective or the academic perspective. But when you're building applications, you're thinking about your user. So maybe that's the problem, right? The disconnect with just using ZooKeeper is that you have people who are thinking about the systems. And then the people who are using those systems are thinking about the users. And there's a disconnect. Right? Yeah. Yeah. So maybe we do need to delve a bit beyond just solving it off the shelf. Absolutely. User-focused distributed systems.
00:04:52
Speaker
I like that. Yeah. That could be the title of the next book you write.

The Evolution of Consistency in Distributed Systems

00:04:58
Speaker
Okay, so I know you have this theory about the history of distributed systems as being like a pendulum between weak and strong consistency. I think that's how you put it.
00:05:10
Speaker
Yeah, I mean, I think that's true for a lot of technologies, right, especially when you have trade offs that you have to deal with in technology. And I think the hope is that the pendulum will swing from one extreme to the other and then swing back but not as far and then swing back but not as far. And then hopefully, by doing this sort of walk back and forth, we find some, you know, trade off middle ground that's optimal in the generic case.
00:05:39
Speaker
Consistency is no different. And I think studying history is incredibly important, especially with the messaging that happens around these types of systems where they're sort of said, like, these are new, brand new ideas. And then you have someone else who comes in and says, no, these are not brand new ideas. But yeah, what's really happening is, you start off with distributed systems, you want to scale.
00:06:04
Speaker
So you make weak consistency guarantees in order to scale. And I'll start at Bigtable maybe, sort of early 2000s with this talk, but this also happened in the relational database community. So as far back as the '60s and '70s, you're talking about consistency and isolation. You have a whole sequence of events that went on in the relational database realm and then started again in maybe the
00:06:33
Speaker
NoSQL realm, I don't know if I want to call it NoSQL, but in sort of the new version of distributed systems that happened after the dotcom bubble popped. And there was all this dark fiber and available serverware that, you know, Google and sort of the big tech guys took real big advantage of. Yeah, when we reached the point where having lots of machines was just
00:06:57
Speaker
The cost of it was easy, and the software logistics of it were hard. Exactly. And the hardware changes also are a big part of the story. Things are getting faster, cheaper. You can have bigger clusters, virtualization. There's a whole other side story to this. Yeah. There was definitely a time in our history when people didn't think about distributed systems, because being able to afford two computers was beyond the realms of imagination.
00:07:26
Speaker
Yeah, and that's how the relational database community ended up, right? They ended up in like a vertical scaling model, right? Yeah, because, you know, horizontal scaling was supercomputing technology in sort of the late '80s, early '90s.
00:07:41
Speaker
Yeah. But once Google has Bigtable, you know, sort of a weaker consistency model and it works for them, right? It works for their advertising portal, but they say, hey, like, let's generalize this and give it to other people to use. Other people start to use it for their applications, but they can't reason about the consistency semantics in Bigtable.
00:07:59
Speaker
So now you go, you swing back the other direction to stronger consistency models. And, you know, at the time, I think it was, you know, data warehousing, things like Vertica, Hadoop with a very strong consistency semantic with HDFS.
00:08:17
Speaker
But now our systems are too slow, right? So you swing back, right? And now we're in dynamo land and thinking about dynamo and S3 and more eventually consistent models.
00:08:32
Speaker
But again, the people who are building applications on top of these things, they don't know or understand distributed systems because that's not their job. Their job is to build a web application. And so the consistency semantics are creating bugs or problems that are very costly for them. And so you swing back into things like Spanner. And so it keeps swinging back and forth, this consistency pendulum.
00:08:56
Speaker
And right now, where I'd say that we're at is we're in the NewSQL phase. I think that's what it's called, NewSQL, where everyone is trying to do distributed SQL transactions. So give application developers a consistency semantic that they understand, SQL and isolation levels, but do it in a distributed fashion.
00:09:23
Speaker
This is the rise of things like, yeah, all these different databases that aren't relational databases, but want you to access them with SQL. Yeah, and like Postgres drivers. Yeah. Don't look under the covers, because honestly, it's a lot more complicated than you want to think about. Exactly. And the consistency problems do pop up. You know, Aurora is notorious.
00:09:47
Speaker
Sometimes that's the worst thing when it behaves mostly like a consistent system and then there are edge cases where suddenly you need a degree to understand what's going wrong. But maybe we should step back a bit and say, because you've said weak consistency and strong consistency, maybe we should define those to know where we are and then move forward from that.

Consistency Spectrum: Cassandra Example

00:10:10
Speaker
Yeah. So first, what I want to say is I personally believe that consistency is a continuous spectrum.
00:10:17
Speaker
between weak and strong, and that's an academic opinion of mine. There's an academic debate about whether that's true or not.
00:10:27
Speaker
I point to two things to talk about that. First, Cassandra will actually, on a per-query basis, let you choose how many replicas are involved in the read or write query. One of the ways that you might think of consistency as a spectrum is that if you have a 30-node Cassandra cluster, which is ridiculous, but if you did and you chose one replica for your read or your write, that's the weakest possible consistency that you can get.
00:10:54
Speaker
Whereas if you choose 30 replicas in that cluster, you now have the strongest possible consistency that you can get. And then any number of read-write replica sets in between, you get this sort of varying consistency over time.
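For anyone who wants to see what that per-query dial looks like in practice, here is a minimal Go sketch using the gocql driver; the cluster addresses, keyspace, and table are made up for illustration, and QUORUM (a majority of replicas) is the usual middle point on the spectrum he describes.

```go
package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	// Connect to a (hypothetical) three-node Cassandra cluster.
	cluster := gocql.NewCluster("10.0.0.1", "10.0.0.2", "10.0.0.3")
	cluster.Keyspace = "example"
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Weak end of the spectrum: one replica acknowledging the write is enough.
	err = session.Query(`INSERT INTO users (id, name) VALUES (?, ?)`, 1, "ada").
		Consistency(gocql.One).Exec()
	if err != nil {
		log.Fatal(err)
	}

	// Strong end of the spectrum: every replica must answer the read,
	// trading latency and availability for consistency.
	var name string
	err = session.Query(`SELECT name FROM users WHERE id = ?`, 1).
		Consistency(gocql.All).Scan(&name)
	if err != nil {
		log.Fatal(err)
	}
	log.Println(name)
}
```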
00:11:10
Speaker
So is this what we're saying, that in a distributed system, you write some data, inevitably the data ends up first on one machine? Inevitably, yeah. And then it's spread across the cluster, and by the time you come to read it, are you going to get the same answer regardless of who you talk to or not?
00:11:28
Speaker
I mean, I think that's a good definition of consistency. So even before we get into like, what is weak and what is strong consistency, maybe we should talk about consistency. Folks who are used to databases probably recognize the C in ACID as consistency. But in distributed systems, it's actually C and I, right? It's isolation and consistency. So it's not just data consistency. It's also
00:11:54
Speaker
what is the experience of one user querying the database from an external perspective. So broadly, what you might say is given a system of more than one replica and just treat it as a black box. If you give reads and writes to the system,
00:12:15
Speaker
are you always going to get the same answer from two different concurrent clients, right? That's what consistency is. And so in a weak consistency, like if you have two replicas, Replica A, Replica B, and Client 1 connects to Replica A and writes to X, and Client 2 connects to Replica B and reads from X,
00:12:36
Speaker
And then Client 1 reads from Replica A, and they're going to get two different answers for what X is, unless Replica A and B coordinate somehow to ensure that that write is the same thing that they see. Yeah. And that does kind of translate to isolation. You know, we talked about read committed, you know, snapshot isolation, that kind of stuff. You read your own writes, that kind of stuff.
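To make that two-replica scenario concrete, here is a toy Go sketch of it; it is purely illustrative (real systems replicate asynchronously over a network, not by copying a map), but it shows the two clients getting two different answers for X.

```go
package main

import "fmt"

// replica is a toy stand-in for a database node: just a map of keys to values.
type replica struct {
	data map[string]string
}

func main() {
	a := &replica{data: map[string]string{"x": "old"}}
	b := &replica{data: map[string]string{"x": "old"}}

	// Client 1 writes to Replica A. Replication to B hasn't happened yet.
	a.data["x"] = "new"

	// Client 2 reads from Replica B and sees the stale value,
	// while Client 1 reads its own write from Replica A.
	fmt.Println("Client 2 sees:", b.data["x"]) // "old", a stale read
	fmt.Println("Client 1 sees:", a.data["x"]) // "new"

	// Only once (asynchronous) replication catches up do the replicas agree.
	b.data["x"] = a.data["x"]
	fmt.Println("Client 2 now sees:", b.data["x"]) // "new"
}
```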
00:12:59
Speaker
Exactly. That's more about transactions, but it's also about the data consistency model. Have we violated any invariants? If X is a number, should X always be greater than zero in the classic accounting ATM transaction case that a lot of people like to talk about in school?
00:13:19
Speaker
Or, if it's a unique value, what happens if one person writes to X on A and one person writes a different X to B, and it's supposed to be unique? One of them is supposed to get an error back saying this wasn't unique. But if A and B don't know about X, then they're both going to succeed, and you violated the C in ACID in that case. You violated the uniqueness constraint.
00:13:42
Speaker
Plenty of systems, even like modern systems, say the way to solve this is to have concurrent readers but a single writer.
00:13:51
Speaker
That is one way to do it, certainly. This is the high availability Postgres model, where you have one read-write Postgres replica, and then you have a lot of standby replicas. It's now the terminology, and you can have standby replicas that can take over if the first one fails. That's mostly fault tolerance. Or you can have standby read replicas,
00:14:18
Speaker
so that you can have that sort of high availability, you can read from those replicas. But you do introduce an inconsistency in the form of stale reads in that case, right? So if replica B is a read-only replica and it's behind the read-write replica, you can read something that's not current, right? You can read something that's untrue.
00:14:41
Speaker
Yeah. So you end up with the main server in San Francisco. You're reading off a replica in Australia, because that's where you are, and it's delayed. But then when you go to post a new transaction, that has to go to San Francisco, because that's where the writer is. It is. And you could end up reading your order history and the order you've just put in hasn't made it around the round trip by the time you do that. Exactly.
00:15:09
Speaker
And it introduces the transaction problem again, because if you want to read then write, which is a very common operation, you can't read from one replica and then write to the other. You have to do your read then write on the primary. Yeah. And then you push that into developer space to make sure they're handling those semantics.
00:15:30
Speaker
And so now your Australian user has a worse experience than your San Francisco user. So what was the point of distributing the system in the first place? It's faster, but it's also more wrong. That's true. Actually, I say that facetiously, but is that ever the right trade off? Faster but wrong?
00:15:51
Speaker
Yeah, absolutely. I do believe that's a good trade-off. I think that a lot of people want to say, strong consistency is the thing that we want. We haven't even talked about weak versus strong. But there was a reason that Bigtable was successful. There's a reason that Dynamo and other eventually consistent systems are successful.
00:16:12
Speaker
And that's when collisions are rare, right? When it's rare that two clients are going to be working on Object X at exactly the same time.
00:16:24
Speaker
And when collisions are rare, a weaker consistency system does improve the performance, availability and fault tolerance of the overall system. And you just have to somehow deal with those collisions downstream. And if you can do that at the application level, then this weaker consistency system behaves better overall.
00:16:50
Speaker
But what that means is that the application developer has to know what the semantics are of that weaker consistency model so that the application can deal with it. And that's where the gap is. That always seems tricky because it always seems like when we buy into a distributed system with those semantics, we start off trying to follow the rules. And six months into developing the project, we start to behave like everything works the way it does in development.
00:17:19
Speaker
Right. Because everything is always completely consistent in development, right? Because the network is so small and fast. Yes. Or you're just developing on a single node, which is even worse. Yeah. I think one of the issues is that in the case that I just talked about where collisions are rare, they're even rarer in development because you're not pushing the system as hard as you would be in production. And then even in production, they're hard to notice.
00:17:48
Speaker
If we talked about this client A reading X and client B reading X, client A and B have to talk to each other in order to figure out that something went wrong. They can't just sit there and be like, oh, this is a consistency problem.
00:18:03
Speaker
You know what I mean? It's really hard to observe consistency problems, generally speaking. Yeah. It normally ends up that you just get weird behavior that the users are complaining about, and then you realize. And that's a horribly far down the pipeline time to find out about these problems. It is.

Debugging Consistency in Distributed Systems: etcd Case Study

00:18:20
Speaker
And distributed debugging of consistency problems is horrendous.
00:18:26
Speaker
etcd recently had a massive bug that was a consistency-related bug. And the way it manifested itself was it kept overwriting decisions that it had already made. And, you know, basically in these very few, very rare cases, etcd would fill up its storage volume and then, like, blow up.
00:18:47
Speaker
And yeah, I mean, there's a great KubeCon talk about this from last year, KubeCon 2022 in Detroit. So if you're interested in that bug in etcd, I do recommend that video. I'll link to that in the show notes. Yeah, absolutely. I'll have something linked for you.
00:19:05
Speaker
Anyway, this problem is happening and everyone is trying to debug it and no one can figure out what's going on. Is it on the application side? It's clearly an etcd problem. etcd is running Raft. Everyone's looking at Raft. Did you do something wrong with Raft? Is it the strong consistency system? Is it something wrong there? Is there a bug?
00:19:28
Speaker
with the invariants? Is Raft not safe? But it actually turned out to be the consistency of the LevelDB transaction that's under Raft when it's writing to disk, right? They were actually opening two concurrent transactions to write to LevelDB and not closing one of the transactions. And so one transaction would beat the other transaction and that's what was causing this failure.
00:19:53
Speaker
So, you know, back to the sort of original point: the weird thing that we saw was etcd filling up its storage volume. But how do you diagnose that, right? Like, how do you find it? It's so rare, like, it only happened in these few cases, like, tons of people are running etcd successfully in production without this, you know,
00:20:15
Speaker
very edge case, you know, very low probability occurrence happening. And it's very difficult to fix. And if you watch the video, the story of how they found it and fixed it is months' worth of effort and heartache. Yeah. So it's important to get it right when you start.
00:20:33
Speaker
as ever. But you can't just say that without giving me some guidance and hope. That's true. That's true. Because the thing that makes this really hard is I think our brains are hardwired for linear time.
00:20:49
Speaker
Well, that is extremely philosophical. And you're right, we think in instances of time. And when you asked me to describe weak versus strong consistency, before I even defined those things, I started in with consistency is a spectrum.
00:21:06
Speaker
But we as developers, we still think about these positions on the consistency spectrum, right? There's weak consistency, there's eventual consistency, there's causal consistency, sequential consistency, and then linearizability. And I agree that we think in linear time, but we don't think in linear time, like, I'm not thinking in your linear time. I'm only thinking in my linear time. And so, but the two of us are
00:21:36
Speaker
talking concurrently. And so as a developer, how do you develop a system where there's multiple actors in the system that have to linearize some sequence of events? And obviously they can coordinate just like you and I can coordinate, but there are two separate
00:21:59
Speaker
linear streams of time. So you're saying our first step is to imagine ourselves in a crowd? Yeah, absolutely. Yeah. So how about dancers? I think dancers is a good, a good analogy. That's actually how I prefer to develop. There's actually two, I think, models of distributed systems development. So the one that I
00:22:29
Speaker
subscribe to, I suppose is very close to the actor model. Okay. But what you do is you develop a single node, right? Which implies that you're going to have a homogeneous system of all of these same nodes, right? And you think about the behavior of that node. And what you want to do is you want to come up with the simplest set of behaviors
00:22:54
Speaker
on the replica so that when you add them all together, you get more complex behavior, right? So from simple local behavior, you get global emergent behavior. Yeah. So in the dancer analogy is you think about the choreography for one dancer, and sure, it looks great when there's one dancer, but if you put 20 dancers and they're all doing the same choreography, it looks amazing.
00:23:23
Speaker
So that kind of model I follow is an actor model. This is, I'm reminded of something Louis Pilfold said to me recently. It's like, it's easier to write a reliable web server for one request and then make a million of those than to make a web server that can reliably serve a million requests. Yeah, exactly. That's exactly it.
00:23:45
Speaker
But that's not the only model. So then there's the specialization model or the specialist model, which is kind of the other model of development. And in the specialist model, you say, okay, we're going to think about these different roles and responsibilities. And here is the responsibility for this type of replica. Here's responsibility for this type of replica. And you have specialization.
00:24:10
Speaker
Um, and that might be more akin to like a cheerleading squad, right? Where you have people who are doing the lifts and you have people doing the flips and you have people who are getting lifted, right? Because we don't have cheerleaders over this side of the pond.
00:24:30
Speaker
or a dance that has maybe more complex components to it, like modern dance versus something else. And if you actually look, we can take this back to the distributed systems, Paxos is the specialization model of development. So Paxos, you have replicas, acceptors, leaders,
00:24:53
Speaker
Commanders, scouts, right? You have all of these specialized roles in Paxos. Whereas Raft is the actor model, right? Raft, you have one replica, here's what it does. It can become the leader, but it can also then go down to becoming a follower.
00:25:08
Speaker
Right? So Raft versus Paxos, you can see those two different styles of thinking about developing distributed systems. Right. Because, I'll check my knowledge here: Raft and Paxos, on the surface, they behave exactly the same. Paxos came along and it did all this leader election stuff. Raft came along and said, we can do exactly that, but you'll understand the algorithm better.
00:25:35
Speaker
I mean, I think that was certainly Diego's goal. Understandability. And Diego Ongaro and John Ousterhout, the authors of Raft. I think I might have gotten Diego's last name wrong. Diego. Yeah, anyway. Hopefully you can cut that out. I thought you had a subtitle caption on the YouTube version that corrected it for you. Sorry about that little brain fart.
00:26:05
Speaker
Yeah, I mean, certainly the goal of Raft was understandability, right? And even the name of the paper was a more understandable consensus algorithm. And, you know, I remember the initial way that this went about was they taught Raft and they taught Paxos. And then they gave a survey to students that they were teaching to see what they understood to show that Raft was more understandable than Paxos.
00:26:34
Speaker
But they're the same in the sense, if you think about multi-Paxos, where you have a leader that can cover multiple slots, or rather, sorry, multiple ballots versus- Sorry, what does that mean? Well, so Paxos is all about safety and theory, right?
00:26:54
Speaker
You know, Lamport defined Paxos thinking about safety, and thinking about these multiple processes that we're running. So replicas have slots, and the slots hold commands. And what the goal is, is to have replicas have the same list of commands in all of the same slots.
00:27:16
Speaker
And the way they achieve this is by asking a synod (all of these terms are from Paxos). They ask a synod to decide what command should go in one slot. And the synod does that with balloting.
00:27:33
Speaker
So slots and ballots are kind of different, although whenever people think about Paxos, they always unify them. So they are a little different. Whereas Raft is about a single, totally ordered log of commands, which obviously is a simplification of the Paxos model. And then Paxos has all of these invariants for the different roles.
00:28:01
Speaker
for leader, acceptor, committer, proposer, scout, commander, all these different roles have all these different invariants that they have to follow. And Raft replaces many of the invariants with timing and timeouts. What does that mean? Take me through that.
00:28:21
Speaker
Yeah, so Lamport has this idea of sequential consistency, right? And if you are a distributed systems academic, you don't believe in time.
00:28:34
Speaker
There's no such thing as time once you've got tenure. I suppose that's the natural response. Yeah, so instead of time, what you have is things happening before something else, right? So, you know, this happens before this, this happens before this. And if you have two events that have no relation to each other, it just means that you can't establish a happens-before ordering. So in
00:29:02
Speaker
Distributed systems, you don't have time. You have

Raft vs. Paxos: Simplifying Distributed Algorithms

00:29:06
Speaker
the ordering of messages that are received by the system, right? So, you know, you don't put a timestamp on it because my clock could be different than your clock. So instead, like, I got this message, then I got this message.
00:29:19
Speaker
which implies a total serializability of messages, which is one very big bug that a lot of people who are just writing distributed systems make, because they forget that most web servers parallelize requests. So you have to have this sort of sequential ordering of messages in order for all of this to work.
00:29:44
Speaker
And so, Raft says, okay, you know, that's fine. But, you know, broadly speaking, at a higher level of time granularity, you can
00:29:56
Speaker
As long as we're at like the millisecond level or the second level, that level of granularity, we can have timeouts and things in order to control these roles and responsibilities. So there's heartbeat timeouts and candidacy timeouts that control what roles each node in the system has.
00:30:16
Speaker
as opposed to Paxos, where you send a bunch of messages out saying, hey, I would like to be the leader for this slot. And then if you get enough responses saying, sure, you can be the leader for this ballot, then you proceed. But if someone else says, no, I'm actually at a higher ballot, then you flip-flop back and forth. And it's all about the ordering of messages, so that all the nodes get the messages in the same order, rather than just actual clock time.
00:30:46
Speaker
Right. Already this seems very hard to implement in practice. Yes, and there's a whole lot of academic literature that complains about how hard Paxos is to implement. Did all the complaints come before Raft was written or after?
00:31:09
Speaker
That's hard to say. A majority of the complaints came before Raft was written. Pre-Raft, I think: use ZooKeeper, use Chubby, use etcd to build your system. That was the right thing to do, right? Because someone, or many someones, spent a lot of time dealing with the Paxos implementation, right? And you didn't have to worry about it. But post-Raft,
00:31:31
Speaker
Right? I think it is a little simpler to implement these consensus algorithms. So EPaxos and Raft, with a little bit of time and care, those things are able to be implemented in an effective and safe way. Really? So you're saying it wouldn't be mad to roll your own?
00:31:51
Speaker
I don't think it would be mad to roll your own. And I think that we're seeing the effects of that, right? Cockroach rolled their own, etcd rolled its own Raft. Kafka recently. Kafka recently, and Redpanda. You know, I mean, it's not uncommon, right? You know, HashiCorp has one in Consul.
00:32:13
Speaker
So there are tons of systems out there where you can look at the implementation, they're open source, and you can slightly tweak them to your use case. Even inside Raft, we were talking about read versus write. Well, Raft elects a leader, all of the writes can go through that leader, but are you allowed to read from a follower? Yes or no?
00:32:34
Speaker
And actually, if you can read from the follower, then you've got what we talked about originally, which is the read secondaries, which might give you stale reads, but you still have strong, strongly consistent writes. So even when you're implementing Raft, you still can make choices. Are we aggregating commands? Are we sharding Raft? There's still a lot of things that you can do even inside of that implementation. So now post Raft, post Epaxos, we're in a world where I think people can
00:33:03
Speaker
and should explore writing their own consensus algorithms. Wow, I didn't think anyone would ever say that.
00:33:12
Speaker
I'm willing to do that and to say that. We haven't even talked about the eventually consistent algorithms, but also writing gossip protocols and thinking about anti-entropy type eventually consistent replication is also another thing people should be considering doing because what you can do then is you can embed your application semantics into the storage system that you're writing.
00:33:37
Speaker
Ooh, that sounds exciting and weird, avoiding all those lovely abstract-out-your-distribution-layer things. Tell me about that. One of my favorite examples of this is rqlite, which is Raft plus SQLite. So it's a Raft-replicated SQLite server.
00:33:59
Speaker
Okay. The way that you do transactions in rqlite, and I'm not an expert in rqlite, but from what I've looked at of this implementation, what it looks like you do is you take your transaction, everything between begin and end, a bunch of commands, and you bundle that all together and that's one Raft command. You Raft it around and then it gets executed on the SQLite server, and then you find out as a user whether the transaction succeeded or whether it was rolled back.
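That bundle-the-whole-transaction idea is easy to picture in code. Here is a hedged Go sketch of the shape of it; this is not rqlite's actual API, just an illustration of a transaction travelling through the log as a single command.

```go
package txsketch

import "encoding/json"

// TxCommand bundles everything between BEGIN and END into one unit that
// is replicated through Raft as a single log entry.
type TxCommand struct {
	Statements []string // the SQL statements inside the transaction
}

// proposeTx serializes the transaction and hands it to the consensus layer.
// Only after the command commits on a quorum does each replica execute the
// whole BEGIN ... END block against its local SQLite database, so every
// replica sees the transaction succeed or roll back identically.
func proposeTx(raftPropose func([]byte) error, stmts []string) error {
	cmd := TxCommand{Statements: stmts}
	b, err := json.Marshal(cmd)
	if err != nil {
		return err
	}
	return raftPropose(b)
}
```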
00:34:28
Speaker
Right. Because SQLite is naturally single-threaded, right? Is that still the case? I don't actually know if that's true or not anymore. Because I know that SQLite has a lot of locking inside of it. And you can actually open a SQLite database now for multiple processes. Oh, right. OK. Maybe I'm out of date there. But you're now saying it's possible to do it from multiple servers. Right. So yeah, rqlite basically makes SQLite a server.
00:34:57
Speaker
Yeah.
00:34:59
Speaker
Also, just while we're here, SQLite is an incredible technology. I am such a big fan of SQLite. It is really just an amazing piece of database technology, and I highly recommend using it even inside of your Kubernetes clusters because it's easy to attach a volume and drop a SQLite server there, and then you don't have to deal with all the HA Postgres stuff, and you still have a relational database under the hood. It's an incredibly effective tool.
00:35:29
Speaker
Yeah, OK. One thing I remember finding out to my surprise is that there's a SQLite database in the iOS core libraries. Yeah, a lot of applications make use of it as well. And I think a lot of browsers use something very similar. I think it might be called IndexedDB.
00:35:51
Speaker
Yeah, their internal HTML5 database is SQLite under the hood most times, right? Or a fork of SQLite or something like that. I don't know for sure, but yeah, I think that it's gotten around. SQLite, MariaDB, LevelDB, RocksDB, there's all these embedded database engines out there that are just available for distributed systems researchers to use or distributed systems application developers to use,
00:36:20
Speaker
you know, to create higher level distributed systems from. Cool. Yeah. Haven't we, though... it seems like we've got very firmly into this idea of electing a single writer. And however you do that, and however you change who's the leader now, we're still doing the single-writer model.
00:36:42
Speaker
Yes. And that's certainly one of the ways that things go down, especially with Raft, and, you know, the leader is a bottleneck, right? And, you know, I'm primarily interested in
00:36:57
Speaker
large geographically replicated data systems. So I like the idea of someone in San Francisco and Australia and Europe. I want to see how those database systems work, but in practice, even just having something on one side of the continent, the other side of the continent can introduce a lot of problems, even with a fiber trunk between them.
00:37:20
Speaker
So the first method to deal with the leader-based system is to shard the system. So you have multiple leaders operating on different tablets of data or different shards of data and you're running multiple raft
00:37:40
Speaker
And this is how a lot of them work. It's how Cockroach works. That's how Spanner works. You'll see basically huge numbers of Raft quorums inside of a lot of bigger distributed systems. And they're just working on different pieces.
00:37:56
Speaker
And one of the reasons that works is, let's say that you have a three-node system, well, we know that the leader is doing most of the work, right? The followers aren't doing most of the work. So what we do is we break our object space down into three parts, and then hopefully what will happen is that you will have
00:38:15
Speaker
you know, part one on one replica, part two on one replica, part three on one replica, and they're all sort of leaders for each of their parts. And so you're maximizing, you know, CPU and, you know, disk writes across all the partitions.
00:38:31
Speaker
Yeah, but how do you generally tend to shard that? Is it like, to simplify, you've got all the accounting stuff in San Francisco and all the user data in Europe, and all the product catalog in Australia, or are you sharding it by all the keys from A to M end up in San Francisco?
00:38:55
Speaker
Or are you sharding it from the provenance of the first write, which for GDPR compliance or CCPA or the other privacy laws that we have, basically the key space is provenance-related. So whoever the first writer was, it has to stay; the leader stays in that region.
00:39:13
Speaker
Oh, that's particularly spicy. Yeah, exactly. But again, this is just another point, right? Like your application should determine, right, how this is happening. And, you know, some database systems might give you control over that. But, you know, being, you know, it's, you're going to get to the use case eventually where your sharding mechanism maybe doesn't exactly match.
00:39:40
Speaker
you know, what the database is providing you and then there's an opportunity to start thinking about rolling your own solution and ensuring that your sharding technique works. I never thought we'd hit the day when you were saying generally companies should be thinking about rolling their own distributed systems as they grow.
00:40:00
Speaker
Like I said, it's probably a controversial opinion, but I think that we are in that phase right now. I don't think that phase will last forever, but I think that's what might help us with the innovation that we need to get to the next stage.
00:40:18
Speaker
And frankly, in the early 2000s, that's where we were at then, right? Google was rolling their own. Twitter was rolling their own. Amazon was rolling their own. Square was rolling their own. Dropbox was rolling their own. So, you know, they have been phases in databases and distributed systems technologies where companies thought, okay, it's actually better for us to roll our own.
00:40:37
Speaker
And at the time, those companies that I just mentioned weren't big tech, like they are now. Back then, it was Microsoft and Oracle and Yahoo, and these were the little guys.
00:40:49
Speaker
And so now we're back in that same phase again, right? Where now with privacy laws, more globalization, more cloud locations than ever before, right? I mean, five minutes and I can have something running in every single continent, right? Except Antarctica. Except for Antarctica. That's true. I can't get there quite yet.
00:41:12
Speaker
So 6 out of 7 is not bad, right? And if you look at the planned rollout of these cloud locations, South Africa, Northern Africa, Brazil, Peru are getting new data center locations. So it's easier than ever before to just put something up in Japan, put something up in Europe, put something up in India.
00:41:34
Speaker
So we're in that phase. All the problems that come with it. And all the problems that come with it. Yeah, exactly. And also we're facing the problem. So yeah, I think you should roll your own. We're in that phase where innovation is necessary and the more people that are working on it. And if you feel enabled or empowered to do it, that's where the innovation will come from.
00:41:52
Speaker
Right, yeah. So you're more saying that's the next natural swing of this pendulum. Exactly. Okay, I think then you have to give me your Benjamin Bengfort guide to writing my first Raft algorithm. Oh my goodness. Yeah. Well, I'd go with Raft for that. Yeah, writing Raft is easy at first.
00:42:21
Speaker
and hard after. So the step number one is you want to make sure that there's a single thread, right, or a single
00:42:34
Speaker
place where all messages are handled in Raft. So whether you're using gRPC or you're using an HTTP client, your server in the thing that you're embedding, all of your messages are going to be coming in concurrently. And you want all of those messages, whether they're from a client or from another replica, to go through what I like to call one big pipe. So step one is make sure that you totally order your messages within your system.
00:43:03
Speaker
If you're using Go, channels are great for that. If you're using something that has mutexes or something like that, you want to make sure that your central thing is handling one message at a time, in as close to the order that they're coming in as possible. Some software transactional memory thing. Exactly. Step one is make sure you've got the one big pipe thing locked down.
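A minimal Go sketch of that one big pipe, with made-up types rather than anything from a particular library: the gRPC or HTTP handlers never touch Raft state themselves, they only send events into one channel that a single goroutine drains in order.

```go
package raftsketch

// Event is anything the node must react to: a client request, a message
// from another replica, or a timer firing.
type Event struct {
	Kind    string      // e.g. "client", "appendEntries", "requestVote", "heartbeatTick"
	Payload interface{} // the message or request itself
}

// Node owns all Raft state; events is the one big pipe.
type Node struct {
	events chan Event
}

// Submit is what network handlers call; they never mutate state directly.
func (n *Node) Submit(ev Event) {
	n.events <- ev
}

// Run is the single goroutine that handles events strictly one at a time,
// in as close to arrival order as the channel gives us.
func (n *Node) Run() {
	for ev := range n.events {
		n.handle(ev) // all state transitions happen in here
	}
}

func (n *Node) handle(ev Event) {
	// dispatch on ev.Kind: append entries, grant or deny votes, tick timers...
}
```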
00:43:32
Speaker
Step two is implementing your timers and remembering that your timers have to go through the one big pipe, right? So you might start to think of these things more like events, right? So a message received from a client or from a replica is an event and a timer going off is an event.
00:43:54
Speaker
And these events have to be handled in order with one big pipe, right? So one central process. Okay, so if I'm using like a channels or coroutines thing, I might have one machine that's just responsible for sending heartbeats and that goes into the coroutine pipeline. Exactly. It's something within my single raft atom that's time ticking. Exactly. Okay, okay, okay.
00:44:21
Speaker
I'm with you so far. And really, if you do that, you've solved like 80% of the bugs that the undergrads and graduate students who are being taught this usually have, right? Because they usually end up having lock contention issues.
00:44:40
Speaker
or other types of concurrency issues like concurrency issues between the heartbeat and the networking component, right? So you just want one thing that's rafting, right? One thing that's managing the state and that goes through.
00:44:54
Speaker
The second thing that's important to think about is you want your Raft replicas to fail hard, right? You want them to die totally. Which is another thing that I think a lot of people have problems with, right? So you don't want to get into like a partial shutdown state.
00:45:15
Speaker
You don't want to get into partially writing to disk or anything like that. You want things to die and die completely. And so thinking about what are you using for your storage under the hood, I personally recommend LevelDB as the storage layer under the hood to start off with because it's so easy to get started.
00:45:37
Speaker
Or you could do something like BadgerDB that has transactions and buckets. But you want to make sure that you're using something so that you understand the consistency model from that sort of central process all the way on down. And that if something goes wrong, or something shuts down, it all just completely goes away and dies completely. Right. And your partial write gets rolled back or
00:46:00
Speaker
And your partial write gets rolled back, exactly. Or one mistake that people make is they have a heartbeat messenger. And so every time the heartbeat timer goes off, they send a message. But that's dependent on that replica being the leader. So if you lose your leadership status, that heartbeat timer needs to stop.
00:46:30
Speaker
but it could be that it's sent a heartbeat message because it's an independent process. This is another reason that everything happens in that one order of events. The heartbeat timer goes off, that middle thing is going to send heartbeat messages. It knows that it's in the leader state, it sends its heartbeat messages out. Then the next thing comes in and it's like a vote, and the vote term is higher than my term. Well, now I've got to go down to a follower.
00:46:55
Speaker
Right, because, you know, the term has changed, right? And I've got to get all the messages from the log that I missed. Right, then the next thing comes in and maybe it's an append entries from the new leader, right? Everything has to happen like one at a time, whether it's a timer message or whether it's a network message. Right. Okay, and then within that
00:47:17
Speaker
I'm building like, can I download like the raft state machine spec and just implement that? Yeah. So if you go to the paper, right, in the paper, there's like a one page description of the algorithm, right? It's in blue and red and has like black tags and it's just like a box.
00:47:39
Speaker
And it says like, this is what a follower does, this is what a leader does, here are the invariants, here's when you should be sending these messages here on these timeouts. It's just one page, you can pull up that page and as long as you get the things right that I just said, more or less, you will end up with a working raft system. You make it sound so easy.
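The sequence he just walked through (a heartbeat tick that only fires if you are still leader, then a higher-term vote forcing a step-down, then append entries from the new leader) is essentially the body of that event handler. A standalone, simplified Go sketch, with illustrative names only:

```go
package raftsketch

// Role of a replica at any given moment; a node moves between these freely.
type Role int

const (
	Follower Role = iota
	Candidate
	Leader
)

// Msg is one event off the one big pipe: a timer tick or a network message.
type Msg struct {
	Kind string // "heartbeatTick", "requestVote", "appendEntries"
	Term uint64 // term carried by vote/append messages (zero for ticks)
}

type Replica struct {
	role Role
	term uint64
}

// step handles exactly one event at a time, so a heartbeat tick can never
// race with a vote that has already demoted the leader.
func (r *Replica) step(m Msg) {
	switch m.Kind {
	case "heartbeatTick":
		// Only a current leader sends heartbeats; if an earlier event in the
		// pipe stepped us down, this tick is simply dropped.
		if r.role == Leader {
			r.sendHeartbeats()
		}
	case "requestVote", "appendEntries":
		// A higher term means we are stale: adopt it and step down to follower
		// before deciding how to vote or appending the new leader's entries.
		if m.Term > r.term {
			r.term = m.Term
			r.role = Follower
		}
		// ... then reply to the vote, or append entries and catch up the log ...
	}
}

func (r *Replica) sendHeartbeats() { /* AppendEntries to each peer */ }
```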
00:47:59
Speaker
I don't want to imply that it's easy, but with a little effort and a little consideration, I think that it's buildable, it's doable. It's a fantastic exercise, even if you're not going to put it into production on your own, for just understanding the semantics of a distributed system. There was a time I would have said anyone attempting to do that was mad. You're now saying it's a sane thing to do.
00:48:23
Speaker
It's sane now. I mean, the devil's always in the details, right? So I've tried to head off most of the issues that people have had by minimizing the number of mutexes that are around state and minimizing the amount of concurrency issues by using that one big pipe thing. That is where a lot of the edge cases come in.
00:48:43
Speaker
What you're not going to solve by doing this exercise is things like, you know, the FLP problem, or as I prefer to call it, the leader thrashing problem, where, you know, if you set up your timing wrong in Raft, the leaders will just thrash, right? So one person will say, I'm the candidate, and then the other person will say, I'm the candidate, and then they'll say, I'm the candidate, and they'll just keep trying to be candidates and just keep trying to climb the term numbers up and up.
00:49:09
Speaker
So you need to make sure you do things like have stochastic intervals for candidacy timeouts, so everyone's not becoming a leader at the same time. You need to configure your system so that you have enough time between timeouts, right? So that you have this sort of reasonable outage when you don't have a leader, but also so that the new leader can elect itself without thrashing.
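Those stochastic candidacy intervals usually come down to a randomized election timeout. A small Go sketch; the 150 to 300 millisecond window is the example range suggested in the Raft paper, and the heartbeat interval has to sit comfortably below the lower bound.

```go
package raftsketch

import (
	"math/rand"
	"time"
)

// electionTimeout picks a fresh random duration each time the timer is reset,
// so followers don't all become candidates at the same instant and thrash
// through terms electing nobody.
func electionTimeout() time.Duration {
	const min, max = 150 * time.Millisecond, 300 * time.Millisecond
	return min + time.Duration(rand.Int63n(int64(max-min)))
}

// resetElectionTimer is called whenever a heartbeat arrives from the leader;
// if the timer ever actually fires, the node becomes a candidate.
func resetElectionTimer(t *time.Timer) {
	if !t.Stop() {
		select {
		case <-t.C: // drain the channel if the timer already fired
		default:
		}
	}
	t.Reset(electionTimeout())
}
```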
00:49:34
Speaker
Right. Yeah. You know, there are problems that you'll encounter when you're running this in production, you know, that aren't on that one-pager. But you know, getting to that good quality state where you can get a leader elected, you can kill the leader and a new leader gets elected, and you can get totally ordered commands through the Raft system. I'd say that's a doable exercise. Cool. And what happens when you've got that, when you've got, like,
00:50:04
Speaker
you've got it working and you've got your leader up.

Implementing Data Writes with Raft

00:50:08
Speaker
How does it actually work when you want to actually start writing some data? Do I then just send my data packets to Raft and it has a special message that says, now I'm just writing this data? That's actually a great question. Now what do we do with it? You're building some application. If you just want
00:50:34
Speaker
like nominate a leader and, like, tell me where to write, probably you should just use etcd for that, or ZooKeeper if you're writing in Java. But now that you have Raft up and running, you can start thinking about the application semantics of whatever you're doing.
00:50:52
Speaker
The simplest thing to start with is a key value store. What you have inside the key value store is you're just totally ordering all of your puts. If you want linearizability, which is the strongest form of consistency, you can also totally order all of your gets as well. Every client says, I want to put this value to this key,
00:51:20
Speaker
and then raft turns that into a command, replicates it around, it's in this log, and then it executes that command onto their local database. And then now if you have concurrent puts that are coming in, they are totally ordered, so you end up with that sequential consistency, this happened before this, this put happened before this, even if they come in in different nodes.
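As a concrete illustration of puts becoming log entries, here is a hedged Go sketch of the replica side; the command format and store are invented for the example, but the key point is that Apply is called once per committed entry, in log order, on every replica.

```go
package kvsketch

import (
	"encoding/json"
	"sync"
)

// PutCommand is what a client's "put key=value" becomes before it is handed
// to Raft: one opaque command that every replica sees at the same log index.
type PutCommand struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// Store is the local state machine. Because every replica applies the same
// commands in the same order, they all converge on the same map.
type Store struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewStore() *Store { return &Store{data: make(map[string]string)} }

// Apply is invoked once per committed Raft entry, in index order.
func (s *Store) Apply(entry []byte) error {
	var cmd PutCommand
	if err := json.Unmarshal(entry, &cmd); err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[cmd.Key] = cmd.Value
	return nil
}

// Get serves reads locally (possibly stale); route reads through the log too
// if you want them linearizable, as discussed above.
func (s *Store) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	return v, ok
}
```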
00:51:42
Speaker
And then you can start to think to yourself, okay, well, you know, we have this object space that belongs in one shard and this object space that belongs in another shard. So maybe now we have two raft replicas that are working on opposite shards so that we can increase throughput.
00:51:58
Speaker
And now you have sort of a distributed key value store that might be using key hashing or something like that to distribute the different shards. So now you start to think about your application beyond that sort of key value store, and with the key value store, you start thinking, okay, well, I want caching, I want eviction. So now on top of my Raft cluster, I'm gonna have another process that will start to do eviction
00:52:24
Speaker
in a consistent manner. So eviction becomes a consensus process inside the system. So you can start to see how anytime you want something that has a strong consistency semantic in your application, you already have this underlying consensus thing running on the nodes and you're not asking some external system to do it for you. You can have these internal processes that are working.
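The key hashing mentioned a moment ago for routing keys to shards can be as simple as the sketch below; FNV is just one reasonable hash choice, and each shard here would be its own independent Raft group with its own leader.

```go
package kvsketch

import "hash/fnv"

// shardFor maps a key onto one of numShards Raft groups, so writes to
// different shards don't all contend for a single leader's throughput.
func shardFor(key string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // writing to an fnv hash never returns an error
	return int(h.Sum32()) % numShards
}
```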
00:52:50
Speaker
So you can literally describe the custom menu of commands you want to be rafted. Exactly. Like with rqlite, right? Like any insert or update is getting rafted. Any select just goes to whatever replica's
00:53:09
Speaker
underlying thing that it has. You can absolutely, or you could create a lock server. Now, if you have a lock server, maybe you just use etcd for the lock server, but you can create a lock server if you want to do more complex key value transactions. Maybe I don't want to just do a put on x, I also want to do a put on x, y, and z, and I want to make sure that I lock all of those keys before I put them together.
00:53:34
Speaker
Yeah. Or you can implement your own check and swap, right? I want to put X provided X is currently this value. Yeah. Or maybe I want to start thinking about a file system, right? And I want to create a distributed file system. Or I want to create a distributed compute system where the executors are working on specific pieces of data and that we have fault tolerance on the executors.
00:54:03
Speaker
and you start thinking about, well, I don't need something as massive as yarn or mesos or something like that. So now you have this little consensus algorithm that's working to manage your distributed compute, or maybe now I want to build a graph database. You can start to see how you start to build up how your consistency model, what you're going to allow to have a weaker consistency, what you might be allowed to be replicated differently versus what types of operations go through consensus, and they're already on your data storage system.
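The check-and-swap he mentions just above is a nice example of embedding application semantics in the log: the condition travels with the command, and every replica evaluates it at the same position in the total order, so no external lock service is needed. A hedged sketch, reusing the invented Store from the earlier key-value example:

```go
package kvsketch

// CompareAndSwapCommand says "set Key to New, but only if it currently equals
// Expected". Replicated through Raft, every replica applies it at the same
// log index and therefore makes the same decision.
type CompareAndSwapCommand struct {
	Key      string
	Expected string
	New      string
}

// ApplyCAS runs when the command commits; the returned bool tells the waiting
// client whether the swap actually happened.
func (s *Store) ApplyCAS(cmd CompareAndSwapCommand) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.data[cmd.Key] != cmd.Expected {
		return false
	}
	s.data[cmd.Key] = cmd.New
	return true
}
```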
00:54:32
Speaker
Yeah, yeah. It reminds me of a little hack I've been playing with is like a networked game engine, where you have this bunch of commands, which is I'm joining the game, I'm leaving the game and so forth. And then you've got within that session thing, a packet of commands that actually relate to the game that's actually been played, like fire the missiles, for instance.
00:54:56
Speaker
It's a bit like that, isn't it? Once you've got this base layer of session consensus management, then you can stick whatever you like in the middle of that data type. Absolutely. I don't know if anyone's ever said this in history, but by gosh, you've made it sound like fun. Well, if there's one goal I could have, it should be fun. I do want it to sound like fun. This is where I think Paxos struggled.
00:55:23
Speaker
Paxos focused on safety, right? Raft focused on understandability. You know, EPaxos... you know, I also recommend looking at EPaxos too. So we're talking about Raft, but EPaxos is another one. There's basically two optimizations to base Paxos. Optimization one is called leader-oriented Paxos, which is Multi-Paxos, Raft, and a whole host of others. And the other optimization is the optimistic fast path.
00:55:54
Speaker
and this is what Fast Paxos and EPaxos do, is they try to get... basically, maybe I should back up for a second. For any consensus algorithm, there have to be two phases minimum. There has to be a propose phase and an accept phase. Minimum. You probably also have a commit phase or an execute phase. But there has to be a propose phase and an accept phase in order for it to be safe, in order for that coordination to be safe.
00:56:23
Speaker
What leader-oriented optimizations do is they just do the propose phase all at once. Basically, the leader says, okay, I have proposed for all of the following slots. They're mine. It just takes away the propose phase and everything after that is accept phase. That's the leader-oriented optimization. The Fast Paxos optimization is we go straight to accept and we skip propose,
00:56:53
Speaker
And if we detect a conflict, then we have to do propose-accept. And yeah, so what happens is you're optimistic in that there's no conflict. You can get these accepts through and you won't detect an issue. But if there is a conflict, then you have to go through the slow path, which is propose-accept, but it actually has turned into three phases, right? Because you have that initial accept, then a propose, then an accept.
00:57:20
Speaker
So those are the two basic optimizations. Raft is the leader optimization. Epaxos is the fast path optimization. Right. Yeah. I see the optimism in there and that must work well when collisions are rare. Exactly. All the fun of writing rollback code.
00:57:39
Speaker
You do. And actually the most complicated thing about epaxos is that dependency thing, which gets into our point about applications, right? In order to detect a dependency, that's kind of an application level thing, right? Like, how do you figure out, like, how do you tell epaxos, like a generic epaxos, that this thing is dependent on that thing, right? X is dependent on Y, but not Z.
00:58:04
Speaker
And so that's the tricky thing. And so epaxos is probably better used in embedded into more specific applications where those dependencies are known. But epaxos massively increases the throughput because you don't have a leader bottleneck anymore.
00:58:22
Speaker
And so you'll get much higher throughput from epaxos at the cost of having to figure out this dependency thing. But you can still get those strong consistency guarantees across the distribution system. Yeah, it's just as strongly consistent as Raft is. And should we be ambitious enough to try and implement that one ourselves? That one is tougher than Raft. Unfortunately, there isn't that sort of nice one-pager thing that Raft has.
00:58:50
Speaker
Absolutely, for fun, I think you should do that. For production, there aren't as many reference implementations, right? So one thing we talked about with Raft is there's a huge number of reference implementations, right? HashiCorp, etcd, you know, Cockroach, like there's just a huge number of reference implementations that you can look at. Not so much with EPaxos, but I would like to see EPaxos start to grow.
00:59:15
Speaker
The criticism with EPaxos is that you can't shard it like Raft, right? So with Raft, you have this leader that's maxing out throughput. So if you have three nodes, you put three leaders, one on each of the nodes. Whereas with EPaxos, all of the nodes are maxing their throughput. So the idea is if Raft can do 5,000 messages per second, then you can get up to
00:59:39
Speaker
15,000 messages per second with three shards, whereas EPaxos can do 15,000 messages per second but it requires all three nodes. You're maxing out the CPU of all three. But you don't have to shard. Except that you have to do this dependency thing. Are there particular kinds of applications where this shines?
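A small Go sketch of the sharded-Raft arrangement being contrasted here: hash each key to one of three Raft groups so a leader can sit on each node and their throughput adds up. The group names, leaders, and the 5,000/15,000 figures are illustrative assumptions, not measurements from any real deployment.

```go
// Sketch of sharding by key across three Raft groups so each node can
// host one leader and write throughput adds up across shards.
package main

import (
	"fmt"
	"hash/fnv"
)

type raftGroup struct {
	name   string
	leader string
}

var groups = []raftGroup{
	{name: "shard-0", leader: "node-a"},
	{name: "shard-1", leader: "node-b"},
	{name: "shard-2", leader: "node-c"},
}

// groupFor routes a key to the Raft group (and so the leader) that owns it.
func groupFor(key string) raftGroup {
	h := fnv.New32a()
	h.Write([]byte(key))
	return groups[h.Sum32()%uint32(len(groups))]
}

func main() {
	for _, key := range []string{"user:1", "user:2", "order:99"} {
		g := groupFor(key)
		fmt.Printf("%s -> %s (leader %s)\n", key, g.name, g.leader)
	}
	// If each leader tops out around 5,000 writes/sec, three independent
	// shards give roughly 15,000 in aggregate; EPaxos gets there instead
	// by letting every node propose for every key, at the cost of the
	// dependency tracking described above.
}
```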
01:00:02
Speaker
Yes, anything where the latency between messages is high. EPaxos was developed specifically for geographic replication. Right. So in my research, what we thought is, we have vertical Paxos, where you have a configuration Paxos and then the object Paxos or the command Paxos.
01:00:26
Speaker
If you have an EPaxos that's managing your sub-Rafts, that's actually a good way to create a big system that goes across different regions and has privacy guarantees and provenance-aware things. I mean, that's sounding like a real mess. Is that something we can hire you to do for us? Are you busy at the moment?
01:00:53
Speaker
I am busy at the moment, and that's exactly what we're thinking about: how do we create these sort of large distributed systems at planet scale, and how can you replicate across them? And our mode of operation is mixing and matching, right? So having some layers that are doing EPaxos, some layers that are doing Raft, some layers that are doing eventual consistency with probabilistic anti-entropy,
01:01:18
Speaker
Yeah, so that's how we're building our distributed systems at Rotational Labs. Who are the clients for that? Are we talking just the major, large geographic players, or is this something that smaller companies might need?
01:01:36
Speaker
I think this is something that smaller companies might need right now, small and medium sized companies. Like I mentioned, it's very easy these days to put your application into a different region and therefore grow your market. You have machine translation for I18n issues. You have the cloud services to place your application in those countries.
01:02:03
Speaker
It's easy to do that, but then you just get into all sorts of complications once you start to manage those distributed systems. And what the cloud companies want to tell you is just use our service and it'll just work by default. My favorite example of this going terribly wrong was the Pokemon Go release by Niantic.
01:02:26
Speaker
I missed that one. Help me out. Yeah, so Niantic was commissioned to create the Pokemon Go mobile app, which was one of the first very successful augmented reality games. I remember we were all joking back in the day about people walking around with their phones looking for these Pokemon
01:02:46
Speaker
Yeah, when you'd see the little monsters on the street, three of them. Exactly, yeah. You'd use the camera and you'd find these Pokemon out in the world, and there were gyms, and you could throw your Pokeball at them and capture them, and you saw that spike because now everyone's walking around. And it was a very successful game, but the release was highly chaotic.
01:03:07
Speaker
It was highly chaotic because I think they released in 50 countries simultaneously. They first released in Japan, then the US, and then it was 50 countries. And people were having login issues, registration issues, they weren't able to capture Pokemon. It was a real big issue.
01:03:30
Speaker
When they diagnosed it, it was a distributed data system issue. The databases that they were using weren't replicating well, and all of these things were consistency symptoms. They ended up hiring Google to fix the problems, and so Google brought in their cloud resources, Spanner, and some other things to fix the issue for Niantic, and everything went smoothly after that, and it was a great tale.
01:04:00
Speaker
But when you're a small-to-medium company, paying for Spanner is horribly expensive. And bringing in Google is never cheap, right? Exactly. Exactly. And you do have to understand, I mean, Niantic had experts help them redevelop their application for these new global consistency models. And so
01:04:25
Speaker
if you want to go to market in a different region, I think you do have to consider this problem, because you are going to have issues if you're selling something and there's an inventory issue. Is it fair that whoever's closest to your server gets the best deal? You don't really want that to happen, so you want to make sure that
01:04:51
Speaker
you have a distributed system set up well. And we're certainly interested in making sure that people can do that.
01:04:59
Speaker
And there are an increasing number of companies trying to do that right across the globe. Because it's initially so easy and obviously desirable for the business, but the devil's in the details. The devil's in the details, and you don't necessarily know where your product is going to land or where what you're doing is going to get the most traction. So it is important, I think, for companies to branch out beyond just their local regions. Yeah.
01:05:24
Speaker
Yeah. Years ago, I sold a software thing and it turned out my best market was Austria. Far and away. Didn't see that coming. How do I target Austria? Yeah. I love stories like that. Well, from here in the UK to there in the US, our distributed system must come to an end.

Conclusion and Future Inspirations

01:05:48
Speaker
We could talk. We might have to have a sequel. Maybe we'll try and be in two different countries for that one.
01:05:54
Speaker
You can be in Australia and I'll be in Singapore or something. There you go. Thanks very much for joining us. It was a pleasure. Thank you for having me.
01:06:04
Speaker
Thank you, Benjamin. And as I said at the start, that really has inspired me to do some coding. And if you caught our episode a few weeks back on Erlang and Gleam, I'm thinking actors. I'm thinking sending messages to a single serializable point. I'm thinking predictable failure modes. This might be the Gleam project I've been looking for.
01:06:26
Speaker
I'll keep you posted. If I manage to get that together, I'll let you know. And if any of you out there break out an editor and put something together, please let me know. But you don't have to start a repository to get in touch. My contact details are in the show notes, as always, and the comment box, the like button, the subscribe button and such, they're always there for you. Send your feedback across the globe and my inboxes will linearise them for you automatically.
01:06:53
Speaker
And also take a look in the show notes if you want to check out Rotational Labs and see the problems that Benjamin's actively solving with all these techniques. And you can also look in the show notes for links to the original Raft paper and a few related papers that might tickle your fancy. And I think that brings us to the close of this consensus slot. I've been your host, Kris Jenkins. This has been Developer Voices with Benjamin Bengfort. Thanks for listening.