
#9 - Onyx with Mike and Lucas

defn
64 plays · 9 years ago
Summary: A fun, loooong episode discussing the awesome Onyx with Mike Drogalis and Lucas Bradstreet. Show notes at https://defn.audio/2016/09/06/episode-9-onyx-with-mike-and-lucas/ Credits: Guests: Thanks to Mike and Lucas for the great conversation and all of the lovely software. Music: Thanks to the very talented ptzery for permitting us to use his music on the opening and closing of the podcast. This track is Melon Hamburger. You can give his work some ❤️ and hear more on his SoundCloud https://soundcloud.com/ptzery
Transcript

Introductions and Personal Updates

00:00:30
Speaker
Hi everybody. Episode 9 of defn. Welcome. Hi, Vijay. Hello, Ray. Hi, so yeah, Ray McDermott over here in Belgium. Still hanging in there, waiting for the Brexit. Still not happening yet. One of these days. And how's Holland hanging, Vijay?
00:00:47
Speaker
Holland is spectacular, beautiful and rainy as usual. And I think there's a classic car rally today, coming all the way from Luxembourg. So pretty old cars and stuff, I think. That's amazing news. The only news I have is that I left my washing out overnight. So that's how exciting my life is. But that's why we're recording the podcast, to make it more exciting. Exactly.
00:01:13
Speaker
Yeah.

Listener Highlights and Event Announcements

00:01:14
Speaker
Okay, so let's get on to the show. First of all, we'd like to bring everybody's attention to the amazing fact that Rich Hickey listens to this podcast.
00:01:28
Speaker
Oh, the horror. Oh my god. Rich, if you are listening, thanks for listening, I think. And we hope to have you on this show pretty soon. But anyway, that's the big news for last week for defn. Yeah, we're on our best behavior tonight, so okay. So shall we do some things about the news and events then, Vijay?
00:01:52
Speaker
Yeah, a quick check on one of the upcoming events, I think, in Tampere. I don't know, man. Yeah, somewhere in Finland. There is ClojuTRE. Cluj-ter? I think it's ClojuTRE, because it ends in TRE. I think that's good. Yeah.
00:02:11
Speaker
It's, you're just putting your name in there. They put it in there, you know? I can't help it. But ClojuTRE. We should stop butchering it. Okay, so that is an event, and we'll of course post the link in the show notes, obviously, and David Nolen and other great guys are coming there. So if you're around, you should visit Finland.
00:02:36
Speaker
And of course, we'll be there. As long-time listeners of the show know, we'll be showing up at EuroClojure in October.

Introducing Michael and Lucas

00:02:45
Speaker
We still haven't decided what we're going to do there. Apart from attending, we'll probably record some amazing discussions there, but we'll keep you posted.
00:02:54
Speaker
But let's get on to the main discussion. And as you guys know, we are trying to get different guests every other week, so you don't get bored listening to me and Ray blabbering in our village idiotry. So this time, we were very lucky to have not one, but two guests. And one is all the way from Australia, and the other one is right now in the US.
00:03:17
Speaker
So those guys are working on one of the, probably, you know, every language needs like a killer application. So I think Onyx is, okay, I already announced what it is going to be. Exactly. Ah, damn it. Anyway, anticlimactic.
00:03:36
Speaker
We'll have to edit all that together now. But anyway, go on. Introduce them. The mystery guests. The mystery guests who build Onyx. It's Michael and Lucas. Welcome to the show. Hi, guys. Hi, Michael. Thanks, guys. Thanks for having us.
00:03:53
Speaker
Hello, Michael, and hello, Lucas. First of all, thanks for joining the show. And I know it's practically 2 AM or 2.30 already in your time zone, Lucas. Yeah, 2 AM in Singapore. Oh, wow. OK. Are you having a party tonight, Lucas? Is that what it is? I'm the one that sacrifices for our distributed team. This is the party. This is the party, OK.
00:04:22
Speaker
This is amazing because we have, well, I think we need somebody somewhere in the Pacific somewhere. Otherwise we are now practically covering half of the globe or maybe even more in terms of the time zone spans. So Lucas all the way at 2 a.m. and here it is almost 9, sorry, 8.30 p.m. And what's the time there in your time zone, Michael? 11.29 a.m. Wow. Okay. So real distributed computing and distributed podcast.

Understanding Onyx

00:04:50
Speaker
It's perfect, perfect for the subject matter, isn't it? Exactly. So first of all, can you give us a quick intro about yourself, you know, what you're working on and how you got into Clojure? Maybe we should start with Michael.
00:05:05
Speaker
Sure, so I've been doing Clojure for... this is 2016, so I guess about six, seven years now. I had been in college... Were you doing it before Rich? I mean, Clojure started in late 2005, so I suppose I picked it up about four years later. I wasn't quite part of the early crowd then, but not too far after. But yeah, I was in college, kind of a hardcore Java person, and I
00:05:34
Speaker
just sort of hit my limit. I realized that the complexity budget runs out really, really quickly, and I started searching around for something else, and functional programming, Clojure and Haskell, became particularly appealing.
00:05:49
Speaker
The rest just kind of happened. I left college, started looking for a job, started looking for a hard problem to work on, more particularly. And databases, distributed computing, that sort of topic kind of caught fire with me. And it was really a matter of putting together what I knew were good design principles, learning Clojure over the last couple of years, and then doing a lot of distributed computing, which is kind of how I got to here.
00:06:16
Speaker
Okay, fantastic. Before we switch to Lucas, what is your favorite editor, by the way? My favorite editor, I am an Emacs person. Yay! It's so sad. Awesome. It's so sad. I don't know if you're going to go with that. I apologize. All right, fine. I'm sorry. It's not a problem. I've offended one of the hosts. I just think it continues.
00:06:39
Speaker
I mean, this is the same response every time somebody says, what is your favorite editor? I'm sorry, it is Emacs. It's like apologizing to the entire world. It's like a family fight. You know, we said you can swear on the show, Michael. Don't start now, right? Lucas, please, please, please end your introduction with a better editor than that.
00:07:04
Speaker
Well, I mean, I have to say I'm pretty jealous of all the tooling that Emacs gets around Clojure. But yeah, I've been working in, I was working in industry and analytics systems for oil and gas companies. And yeah, I picked up Clojure a few years back after
00:07:25
Speaker
working in C# for way too long. I wanted to move over to a functional language, and I was searching around for something to work on next. I'd done a bit of Haskell in the past, which I really liked. And I looked at Scala, and that kind of didn't really look very appealing to me. And Clojure looked just weird enough to be interesting. So I tried it out, and I've been loving it ever since.
00:07:52
Speaker
Yeah, excellent. So you both mentioned that you were kind of into a bit of Haskell as well. So just maybe as an aside, what kind of drew you more towards Clojure, the kind of untyped or dynamically typed thing, rather than Haskell?
00:08:09
Speaker
For me, it wasn't so much the type system. When I had a look at Clojure, it just seemed to have a lot more of the... well, the libraries were a bit more settled, because you could use any Java library you wanted to, whereas Haskell at the time was a little bit more immature on the library front.
00:08:36
Speaker
Yeah, I mean, I really love Haskell still, but Clojure just really stuck with me. Yeah, it was also... I was going to say it was a pragmatic choice for me as well. Same thoughts. Okay, so it was more about what you could do with it rather than whether the language was particularly beautiful or not. It was about, well, how can we practically do something with these things?
00:09:02
Speaker
Exactly. And nowadays I think it has an even better story, because you get the benefit of ClojureScript as well, which is really a secret weapon. But do you have a secret hankering to rewrite everything in Haskell? You know, from time to time. And then that gives me some sad faces.
00:09:30
Speaker
Anyway, screw Haskell, we're on the Clojure podcast today. Yeah, okay, great. Well, anyway, that's a nice backstory.

Onyx vs. Other Systems

00:09:39
Speaker
Actually, so it's all about getting stuff done, which is great for what you guys are working on now, isn't it? Maybe it's worth giving a quick introduction to Onyx, because
00:09:52
Speaker
You know, obviously, well, not obviously. You guys know a lot about it. That's very obvious. I know a little bit about it. Vijay knows a bit about it as well. He's been doing some work with it recently. But obviously it would be nice if you guys could give us a kind of high level view of what onyx is and what problems you're trying to solve with it.
00:10:11
Speaker
Sure. So Onyx is a distributed, high-performance computation framework aimed at elevating Clojure's strengths into the programming model. Essentially, it is a hybrid platform for doing stream and batch processing, which is kind of just
00:10:32
Speaker
a fancy way of saying that it's able to take a lot of data, either in a continuous form or as a statically partitioned form, and apply some transformations or aggregations to your data and very quickly compute answers to relatively ad hoc questions. So when you break it down, Onyx is really a new programming model for how you can do distributed computation that emphasizes data structures and plain functions. And it's also a runtime to support that new programming model.
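As a hedged sketch of the "data structures and plain functions" model described here, an Onyx job is declared as plain Clojure data. The task names and the keys on the segment map below are illustrative, not taken from any real deployment:

```clojure
;; Sketch of Onyx's data-driven model: the job description is plain data.
;; Task names (:read-lines, :upper-case, :write-lines) are illustrative.
(def workflow
  ;; Each pair is an edge in the processing graph: data flows
  ;; from :read-lines through :upper-case into :write-lines.
  [[:read-lines :upper-case]
   [:upper-case :write-lines]])

;; The function attached to the middle task is an ordinary Clojure
;; function over a segment, which is just a plain map.
(defn upper-case-line [segment]
  (update segment :line clojure.string/upper-case))
```

Because the workflow is a value, it can be built, inspected, and transformed with ordinary Clojure code before it is ever submitted.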
00:11:00
Speaker
So it's big on data. Right. It's markedly different from almost anything else out there that I've seen, in that the model with which you communicate to Onyx how your program is going to run is purely through data structures. And this is a vast departure from how almost all other platforms operate. They normally give you some sort of a code-centric API, which has a lot of advantages. But I really think that when you look at its design,
00:11:28
Speaker
the development-level API really ought to be built on top of a data layer first, because it gives you much more flexibility. So is the typical use case more like ETL, like Extract Transform Load sort of use cases, or do you think there are others, real-time analytics, these kinds of things? Do they also scale well with Onyx?
00:11:51
Speaker
I would say that real-time analytics is its strength, but it can absolutely be used for ETL-type tasks as well. We don't try and dictate what it really should be used for. But yeah, real-time computation is definitely a strength of it, given that its execution model is very streaming-based: batching is built on top of the streaming.
00:12:20
Speaker
That leads to different strengths and weaknesses compared to other systems, but I think it does both quite well.

Architectural Evolution of Onyx

00:12:31
Speaker
Look, it's funny, because the way I look at that, and also try to explain it at work, is that you can always slow down streams, but you can't speed up batches. You know, so you can kind of think of the streaming model as a unified kind of model for everything, and for batch, you can just take elements of that stream and batch them together. Right. I've heard it put as, um, you know, batch is a special case of streaming.
00:12:57
Speaker
Right, right. Which is kind of what we have at the moment in many industries: a total separation, isn't it? And that was kind of the foundation of this Lambda architecture and all this kind of stuff, where you had the batch on one hand and the real time on the other hand. So this is a kind of unifying perspective as well. This tool that you're dealing with, Onyx, is kind of unifying the batch and the stream together, yeah? Absolutely.
00:13:26
Speaker
Okay, so the other thing you mentioned is that it's a distributed compute platform. So does that mean it's not suitable for like single computers or, I mean, I guess when you're doing development, you're on a single computer.
00:13:41
Speaker
Sorry, I missed the question. Hangouts kind of garbled away. Okay. All right. So I was saying, you mentioned that Onyx was a distributed computing platform, but how does it work on one machine? Can you do it on one machine? Does it optimize for one machine as well, if you've got a big machine with a lot of CPUs, a lot of memory? Or is it only for the distributed computing world?
00:14:04
Speaker
Yeah, it certainly works with a single machine, and we get that question a lot from designers who anticipate having much higher volumes in the future, but
00:14:14
Speaker
don't quite have that volume yet. So it'll work fine on a single machine. We don't specifically optimize it to work for a one-node use case, but as you add cores and you tune it for the hardware that you're running on, Onyx will continue to take advantage of an increasing number of cores, because everything that is meant to be fanned out is multi-threaded.
00:14:34
Speaker
So as you adjust the performance tuning, Onyx can continue to take advantage of additional capacity that you add. So we do see that a lot because people end up wanting to build things that are more dynamic from the outset, and Onyx is really appealing because of its data structure-centric API.

Fault Tolerance in Onyx

00:14:51
Speaker
And hopefully that decision pays off. It's a risky one as to whether you want to go down the road of immediately building something in a distributed context that you think may grow, because you certainly do add complexity when you design like that.
00:15:04
Speaker
It's an option, and if you think it's gonna be a good play, then I think it's a good way to go.
00:15:10
Speaker
I think that what we see is that people want to do some things, like you said, that are fairly small. And as things grow, then they want to have more horizontal scaling. But sometimes they want to start off with a kind of vertical scale, just one big machine, just to do one particular task, then throw it away. Because, especially if you're on Amazon, disposing of like 20 machines,
00:15:38
Speaker
or provisioning 20 machines, is sometimes more tricky than just provisioning one. Right. And I think that's indicative of a deeper problem, where you have people who are attracted to something like Onyx almost solely because of its flexibility. And that sort of seems to say that there's something about the programming model that a lot of these systems offer that is
00:15:59
Speaker
a paradigm mismatch: you know, why can't I switch between having one machine and having n machines? And why does it have to be such a drastic gap? People kind of want to have that layer of in-between, the ability to shift up without having to incur a significant deployment cost or a complexity cost. And I think there's something more there that hopefully Onyx is starting to tackle.
00:16:26
Speaker
That's nice, yeah, because I think the key thing is here that you don't have to change the programming model between those two environments. I think that's the thing I see at the moment is that in traditional batch ETL, even streaming, message kind of processing stuff,
00:16:45
Speaker
is fairly, what should we say, fairly small scale. And the move to things like Spark or Flink or Storm or things like that seems like a huge jump all of a sudden. So maybe, how would you compare it to those things, to those standard big data systems?
00:17:08
Speaker
So I would say it's closest to Flink and Storm as opposed to Spark in that it's built to be streaming first.

Getting Started with Onyx

00:17:24
Speaker
I mean, our model for programming on top of Onyx is significantly different to theirs, but overall the execution model is quite similar to Storm currently, in terms of the way that messaging is performed, the way that fault tolerance is implemented, and so on.
00:17:45
Speaker
So what are the major... so can you give us some insight into the design of Onyx? What are the major components? Because if you look at Flink and Spark, you see there are some technologies they use that are fairly similar between them, for example ZooKeeper, or Akka for concurrency, and obviously they're really, really tied to HDFS, because that's where, you know, they're
00:18:09
Speaker
going to get the data from, and also I think they support YARN and this Hadoop ecosystem, so to speak, because that is the big data world. Can you give us some idea about what kind of libraries there are, or is it turtles all the way down with Clojure everywhere? What kind of components are there in Onyx, design-wise?
00:18:34
Speaker
I guess I'll take that one. We use ZooKeeper, like the others. We found it works really well; it's reliable.
00:18:46
Speaker
So we use ZooKeeper to basically act as a distributed data store for the job data. So anytime you submit a job to your cluster, so you want to start a new job, you'll make an API call which will write that data to ZooKeeper and then write a message to a log that all of the Onyx nodes are tracking.
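As a hedged sketch, the API call described here looks roughly like the following. The peer-config values are placeholders, and the exact config keys have varied across Onyx versions, so treat this as illustrative rather than a working setup:

```clojure
(require '[onyx.api])

;; Placeholder peer configuration; the ZooKeeper address and tenancy
;; id are illustrative, not a real cluster.
(def peer-config
  {:zookeeper/address "127.0.0.1:2188"
   :onyx/tenancy-id "my-cluster"})

;; submit-job writes the job data to ZooKeeper and appends an entry
;; to the log that every Onyx node is tracking.
(onyx.api/submit-job
 peer-config
 {:workflow [[:in :process] [:process :out]]
  :catalog catalog            ;; catalog defined elsewhere
  :lifecycles []
  :task-scheduler :onyx.task-scheduler/balanced})
```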

Deployment and Monitoring of Onyx

00:19:14
Speaker
which it uses to essentially decide how many peers are on the network, what they are currently running, and so on and so on. Okay, so you said there is something like a job, so maybe it would be a good idea to explain what a job is in Onyx terminology. Mike, do you want to take that one? Yeah, sure. So an Onyx job is kind of akin to the most coarse
00:19:39
Speaker
unit of work that you can compute in Onyx: a collection of stages of work to be completed. You would imagine that you have data flowing in through particular slots of a multi-staged computation. The job discretely breaks the stages down into first-class things called tasks. The gap between tasks is presumed to be a network, so the latency of
00:20:10
Speaker
data coming into a task and exiting a task is going to be higher, because it has to transfer over the network. And that's really why Onyx was built: to be able to fan out the various stages of a computation and have concurrency take over to increase the overall throughput.
00:20:26
Speaker
So does it build something like, for example, in Spark, when I create a program, it is going to generate this DAG, the directed acyclic graph or whatever they want to call it, and it has stages and each stage has tasks. So in Onyx, I see that you have this, as you were explaining, it is declarative. So I can put in the whole program, or, I don't know what it's called, workflow.
00:20:51
Speaker
Yeah, so what's fascinating about this is that the workflow is the DAG. You actually declare what the tree is. So there's no impedance mismatch between what your program is and what we compile it to under the hood. We actually take the data structures you provide as is, and those become the program.
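The "workflow is the DAG" point can be sketched directly: branching is just more edges in the same vector of pairs. All task names here are hypothetical:

```clojure
;; The workflow data *is* the DAG. Here :parse fans out into two
;; branches, so this vector of edges declares the tree directly:
;;
;;            :in
;;             |
;;           :parse
;;           /    \
;;     :enrich    :count
;;        |          |
;;   :out-kafka  :out-stats
(def workflow
  [[:in :parse]
   [:parse :enrich]
   [:parse :count]
   [:enrich :out-kafka]
   [:count :out-stats]])
```

There is no compilation step that rearranges this shape behind your back; what you declare is what runs.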
00:21:10
Speaker
The really interesting thing that's come up is that usually these platforms have some kind of an optimizer that you tuck under the covers and you have your program. And under the hood, you're going to end up stitching some of these various phases together. You're going to perform fusion. At all points of the compilation cycle, you're going to start fusing things together that would be better performed at a single unit of time rather than spanning it out. We're actually able to move the optimizer around the stack
00:21:36
Speaker
We've been able to elevate the optimizer up to the application level, where, when you submit your workflow to Onyx, indicating what the stages are, because it's plain Clojure, you can go ahead and just stitch those things together and have a much better understanding of how it's going to behave at runtime.
00:21:52
Speaker
Yeah, that's like, well, I was playing with Onyx and obviously during my day job, I'm working with Spark a lot. And as you were explaining, every time I write a Spark program, I always need to go to the Spark UI to see the DAG and to see the stages and to see how Spark is optimizing, especially when I want to squeeze out the performance from the cluster. I need to submit a job and then see how Spark sees that one.
00:22:16
Speaker
But in the case of Onyx, you know, it's really declarative: I can clearly see this is step number one, step number two, step number three. So that's very, very interesting, in my opinion. And by the way, maybe this is a stupid question: so if I have a workflow, so obviously, if I understand correctly, you start with the workflow, and then to each work step you can attach any Clojure functions and also plugins, right?
00:22:45
Speaker
Exactly. Yeah. And in Spark, you have the RDD as the underlying data structure. Of course, they're moving to Datasets and DataFrames, et cetera. But Onyx stays true to Clojure: it's just maps that are floating around.
00:22:59
Speaker
Yep. I never did quite understand why there was a different abstraction for how you pass data around inside the cluster than how it exists outside the cluster. I understand that RDDs have more implications beyond just what the structuring of data looks like; there are actual optimizations and performance implications for how RDDs are set up. But really, they're maps. I mean, they are keys and values, with names and their actual underlying values.

Integrations and Testing of Onyx

00:23:25
Speaker
So we keep actual
00:23:28
Speaker
persistent array maps being passed between the tasks all the time. And that ends up wiping away tons of complexity. Yeah, yeah. And obviously there is a plugin architecture in Onyx, so you can write your own plugins and plug into any of these workflows, right? Certainly. Or at any stage.
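The segments-are-plain-maps idea, and the way functions and plugins attach to workflow steps, can be sketched with a catalog. The keys follow Onyx's :onyx/* convention, but the plugin names, namespaces, and batch sizes below are illustrative assumptions, not a verified configuration:

```clojure
;; A segment is just a Clojure map:
{:user-id 42 :event :click :ts 1473147600000}

;; The catalog attaches behavior to each workflow node:
;; input and output nodes get plugins, middle nodes get plain fns.
(def catalog
  [{:onyx/name :in
    :onyx/plugin :onyx.plugin.kafka/read-messages  ;; illustrative plugin
    :onyx/type :input
    :onyx/medium :kafka
    :onyx/batch-size 100}
   {:onyx/name :process
    :onyx/fn :my.app/transform    ;; an ordinary Clojure function
    :onyx/type :function
    :onyx/batch-size 100}
   {:onyx/name :out
    :onyx/plugin :onyx.plugin.core-async/output    ;; illustrative plugin
    :onyx/type :output
    :onyx/medium :core.async
    :onyx/batch-size 100}])
```

Because the catalog, like the workflow, is just data, swapping a plugin or a function is an edit to a map, not a code change to the engine.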
00:23:48
Speaker
Yeah, Lucas can actually talk a bit more about that, since he's been more of the architect of the plugins recently. Right. So in Onyx, there are two main forms of plugins. You have input plugins, which essentially read from some kind of medium, and those would be
00:24:07
Speaker
the input nodes in your DAG and they cover all of the fault tolerance aspects so they need to deal with stuff like checkpointing and ensuring that the data is processed fully.
00:24:26
Speaker
And then you have output plugins which are used to write to a medium. So say you want to write to a Kafka topic, that would be through an output plugin. And then we have a sort of third form of plugins which uses our life cycles.
00:24:47
Speaker
lifecycles feature, which is kind of a way to hook in and change the behavior of your Onyx tasks. So using the lifecycles, you can do neat things like, you know, add metrics without,
00:25:04
Speaker
you know, us having to add hooks into Onyx; or, let's say, do something like inject Redis connections, and you might want to output some data at various stages of the task lifecycle. So those can kind of give you a bit more flexibility as well. Yeah.
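A hedged sketch of the lifecycles hook Lucas describes, injecting a Redis connection before a task starts. The hook-name keys follow Onyx's lifecycle convention, but connect-to-redis and all the :redis/* keys are hypothetical illustrations:

```clojure
;; Lifecycle calls are a plain map of hook-name -> function.
;; connect-to-redis is a hypothetical helper, not a real library fn.
(defn inject-redis [event lifecycle]
  ;; Whatever this returns is merged into the task's event map,
  ;; so the task function can pull the connection back out.
  {:redis/conn (connect-to-redis (:redis/uri lifecycle))})

(def redis-calls
  {:lifecycle/before-task-start inject-redis})

(def lifecycles
  [{:lifecycle/task :process               ;; which task to hook into
    :lifecycle/calls :my.app/redis-calls   ;; resolves to the map above
    :redis/uri "redis://localhost:6379"}])
```

The same mechanism serves metrics: hook a timing function in before and after each batch without touching Onyx itself.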
00:25:30
Speaker
And so in the workflow, because this is a declarative thing, can the workflow be updated dynamically as in, okay, I wrote a program, I defined a workflow, now I want to push changes to the workflow.
00:25:45
Speaker
Not currently. Yeah, we've explicitly not allowed that, because it's always been my opinion that if you... I've seen other platforms start to implement this, and I think you can get yourself into a world of trouble. I agree. Being immutable.

Onyx in Industry and Future Plans

00:26:01
Speaker
You'll have data that's in flight, and that's really the problematic scenario. What do you do with the data that's currently in flight? And at what point in time do you say that the workflow has transitioned from state A to state B? And you actually can't,
00:26:14
Speaker
answer that question in a definitive way because it's concurrent. You'll have different pieces of data at different points of the stages. Those are questions I don't want to answer and we don't have any intentions to do that. I think it's interesting to consider.
00:26:28
Speaker
Well, you're right though, Michael, you used the perfect word there: it's immutable. You know, what you want is to be able to reason about change. And that's the thing which I think is always a problem when you're upgrading systems or changing a version of something. You know, if you can just mutate it on the fly, then you've got yourself... you know, you might think you've won something, but actually you've lost more. I was just going to go... sorry, go on, Michael. Yeah.
00:26:54
Speaker
You see DevOps running in the complete opposite direction of that. We have immutable deployments now. There's something to be said about not letting things be too flexible in the wrong ways. It's not that there aren't solutions for this with Onyx. It would be our recommendation that if you wanted to actually change a job that's live, you would kill the job and then resubmit another job that takes over from where all of the checkpoints are at for your streaming job. There's no problem.
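The kill-and-resubmit upgrade path described here can be sketched roughly as follows. It assumes submit-job returns a map containing the job id, which is how recent Onyx versions behave; peer-config, job-data, and new-job-data are placeholders:

```clojure
;; Hedged sketch: job-id comes back from the original submission,
;; and new-job-data holds the updated workflow/catalog.
(let [{:keys [job-id]} (onyx.api/submit-job peer-config job-data)]
  ;; ... later, to "change" the running job: kill it ...
  (onyx.api/kill-job peer-config job-id)
  ;; ... and submit the replacement, which picks up from the
  ;; checkpointed state instead of mutating the job in flight.
  (onyx.api/submit-job peer-config new-job-data))
```

Nothing is ever edited in place; the old job value is retired and a new one takes over, mirroring the immutable-deployment point above.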
00:27:23
Speaker
Yeah. I was just going to go back just to rewind a little bit about the APIs and the data and stuff, something you mentioned there, Michael, which is why do other people do it differently? And it's kind of like interesting to me about the semantics of the data, because that's one of the things with Spark and all those other guys to me is that
00:27:41
Speaker
When you read their documentation, they kind of tell you, oh, well, you think you're doing this, but actually you've got to consider that. And there are a whole bunch of their operations, their distributed operations. They have to kind of tell you under the covers. Well, we're doing some wacky things here.
00:27:58
Speaker
So their APIs kind of have data semantics underneath them, which they're trying to kind of hide, I think, in their APIs, with these map and reduce things that are really distributed map and reduce, but they don't make it very clear to you. Is that something... I get the impression that Onyx is a bit more straightforward about that?
00:28:25
Speaker
Yeah, I think some of that is complexity that's inherent to the problem that you just can't get away from because of the physics of distributed computing. You're always going to have the ability to have your network partitioned and have things go wrong. You'll have the scenario where you're going to have data through reduced operation be asymmetrically sent to one node against the other and have problems.
00:28:46
Speaker
And I think that this is why you start to see... I don't necessarily think this is different with Onyx, which is why you see more people with performance problems with Onyx than you do with a database like Datomic. I wouldn't say the problems that Onyx solves are harder; it's just that the amount of work you need to do to make your program operate in a high-performance setting is more difficult, just because of how computers work and how they work when they're disjoint.
00:29:11
Speaker
So some of this complexity just probably won't go away, because it's not a beginner's tool. It's not a beginner's domain. So I hope Onyx does address some of that. But in another sense, it's probably just not going to get easier, because of the domain, I would say. Because the CAP theorem basically brings you problems, in other words. More or less, yeah. The thing I always point to in this case is: just use good metrics. Monitor your systems.
00:29:41
Speaker
That'll help you more than anything else could. Okay, so that's the kind of thing that I guess people should be doing by default anywhere, with any system, you know, whether it's distributed or not. But I guess it's more important with a distributed system, because there are a lot more moving parts. Absolutely.
00:30:04
Speaker
So if I understand correctly, I'm trying to form some sort of a mental model of how Onyx is working. By the way, why Onyx? I mean, how did you pick this name, by the way?
00:30:14
Speaker
I don't remember, to be honest. It's been about three years now. Now you can come up with a magical story. We're trying to come up with a name for our upcoming product, and we've been at it for a good three months now. Naming is probably the hardest thing that you can do, because all the good names are taken. You want a short name. You want a name that has a domain available. It's harder than actually building the program. That's true. Maybe you can say, hey, I was sleeping, and then one day Rich Hickey came into my dream, and he told me.
00:30:44
Speaker
He told me distributed computing is hard. I can't do anything. And they're like, OK, I should do something. Shout out to Rich. We're dreaming about you. Anyway, so to get back to the discussion, in Spark, I'm sorry if I'm bringing up Spark because that is something that I'm really familiar with, or at least a bit more familiar with, not really.
00:31:04
Speaker
In Spark, you see that the execution model is essentially: there is a driver and there are a lot of executors, and the driver is essentially driving the program. And the driver is also responsible for retries, for example, if a node goes down or doesn't finish on time.
00:31:19
Speaker
So how is this model in Onyx? Is it truly masterless, or is it like a supervisor plus, yeah, supervisees, I think? How does it work in Onyx? Mike, do you want to take this question on? Yeah.
00:31:42
Speaker
So when Onyx was designed from the outset, I did design it with a leader-and-followers architecture, similar to Spark, where you don't necessarily have a driver, but you do have a centralized coordinator pulling the strings on all the puppets and making things work. And I suppose about a year after I launched it, I realized that
00:32:06
Speaker
There was some room to do something else interesting here. And I started to have thoughts about, well, what would this look like if it were actually masterless in some sense? Not quite peer to peer, because that's really a fully decentralized system. But what would masterless look like if you took the centralized coordinator out of the system? So no one was really using Onyx at the time. So I had plenty of room to play around, change the API, change everything. And I had been reading a lot of papers on databases the last couple of years. I had a lot of good things in my head about what else could you do.
00:32:36
Speaker
And one of the themes that stuck with me was the notion of a centralized log. Logs are an incredibly flexible architectural kind of thing that you can use.
00:32:51
Speaker
It kind of dawned on me that you could have a system where you don't have a centralized coordinator, because you have a durable centralized log, and you have all of your worker processes, which we call peers, replicating each entry from the log, and taking the brain that used to be in the centralized coordinator and moving it into their own process. So really, we just took what was once, you know, a centralized process, and we moved that exact same process into the worker. We replicated it, and that ended up
00:33:21
Speaker
doing really wonderful things. It didn't solve all the problems that we had. It made new problems. But you trade one set of problems off for a different set of problems. But the trade-offs are interesting because now you have a linearized sequence of actions that occur throughout your cluster in a durable manner. Again, this is unique to Onyx. And I don't know of any other platform that has this. But yeah, it has allowed us to do some really interesting things where you don't have a driver program anymore, which is pretty cool.
00:33:51
Speaker
Okay, that's pretty nice. But from the semantics... sorry, Lucas. I was just gonna say, Clojure's test.check has been fantastic when testing and building this masterless, log-based architecture, because getting all of the scheduling and peer-join, peer-leave issues right was quite difficult.
00:34:17
Speaker
Okay, so you've got a lot of race conditions going on there basically, or potential for race conditions. Yeah, potential for interleaving log histories that could end up in, you know, race conditions, cases where a peer's view of the system is wrong, that kind of stuff. And test.check was just fantastic for that.
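The generative-testing idea mentioned here (Clojure's test.check in Onyx's case) can be illustrated with a brute-force Python sketch: generate random sequences of peer join/leave entries, fold them through a toy scheduler, and assert an invariant after every step. The scheduler and invariant below are hypothetical stand-ins, not Onyx's real scheduling logic.

```python
# Sketch of property-style testing of a log of join/leave entries:
# random histories are generated and an invariant is checked after every
# step, which is how interleaving-dependent scheduling bugs get caught.
import random

def apply_entry(state, entry):
    """Fold one join/leave entry into toy scheduler state.
    state = {"peers": set, "allocations": {job: set of peers}}"""
    op, peer = entry
    peers = set(state["peers"])
    allocs = {j: set(ps) for j, ps in state["allocations"].items()}
    if op == "join":
        peers.add(peer)
        # naive scheduler: a new peer helps the least-loaded job
        job = min(allocs, key=lambda j: len(allocs[j]))
        allocs[job].add(peer)
    else:
        peers.discard(peer)
        for ps in allocs.values():
            ps.discard(peer)  # forgetting this line is the kind of bug found
    return {"peers": peers, "allocations": allocs}

def invariant_ok(state):
    # no job may be allocated a peer that has left the cluster
    return all(ps <= state["peers"] for ps in state["allocations"].values())

rng = random.Random(0)
for _ in range(200):
    state = {"peers": set(), "allocations": {"job-a": set(), "job-b": set()}}
    for _ in range(30):
        entry = (rng.choice(["join", "leave"]), rng.choice(["p1", "p2", "p3"]))
        state = apply_entry(state, entry)
        assert invariant_ok(state)
```

test.check adds shrinking on top of this idea, minimizing a failing history to the smallest interleaving that still breaks the invariant.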
00:34:47
Speaker
Maybe I need to ask you that question a different way then, Lucas or Michael. It's like, okay, so if you have this log in the middle, that's great for kind of like work done, which is like the Kafka approach where you log everything that you've done in Kafka or in a database log or whatever. But are you trying to say that you want to use the log as a kind of way of spreading work out? And then if that's the case, how do each of the peers decide that they're the ones to do this work?
00:35:18
Speaker
Right, so it's mostly used to decide on the peers' view of the system: which peers are currently in the system, which peers should be scheduled on different jobs, this sort of thing, rather than actually using it to communicate the work. Ah, okay, so it's about scheduling rather than breaking the work up. Yeah, it's about getting rid of the coordinator. Right, right, right, okay.
00:35:49
Speaker
So what kind of semantics does Onyx have? Is it, for example, exactly-once guarantees, or is it going to be something like, okay, it can retry multiple times?
00:36:05
Speaker
Yeah, Onyx offers at-least-once guarantees and, quote-unquote, exactly-once guarantees. The other one being at-most-once, which we just kind of haven't bothered to do, because I suppose I've never had a use case for it. It's not particularly hard to implement, but no one's ever asked for it and I've never needed it.
00:36:24
Speaker
It just never got done. But at-least-once means that if there's a failure, Onyx will continue to try until it processes your data, no matter what. And exactly-once applies to aggregations, like a rolling summation, where you take integers as they come by and add them up; if you have a retry, it's not going to add them again. It's going to come up with the appropriate sum regardless of how many times it's retried because of a failure.
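The distinction between the two guarantees can be made concrete with a small sketch. Under at-least-once delivery a failed segment is retried until it succeeds, so a downstream fold may see the same segment more than once; "exactly-once" aggregation then means making the fold idempotent, for example by remembering which segment ids it has already applied. This is an illustrative Python sketch, not Onyx's implementation.

```python
# Sketch: an idempotent rolling sum. Retries (at-least-once delivery) may
# redeliver a segment, but each segment id is folded in at most once, so
# the aggregate is correct regardless of how many retries happened.

class RollingSum:
    def __init__(self):
        self.total = 0
        self.seen = set()  # segment ids already folded in

    def apply(self, segment_id, value):
        if segment_id in self.seen:   # a retry redelivered this segment
            return self.total          # ignore the duplicate
        self.seen.add(segment_id)
        self.total += value
        return self.total

agg = RollingSum()
agg.apply(1, 10)
agg.apply(2, 5)
agg.apply(1, 10)   # redelivery caused by an at-least-once retry
assert agg.total == 15  # each value counted exactly once
```

Real systems checkpoint the `seen` bookkeeping (or use epoch-based deduplication) rather than keeping it unbounded in memory, but the idempotence idea is the same.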
00:36:49
Speaker
Okay, by the way, are there any plans or design ideas around this? One of the reasons why we picked Spark at the place where I'm working... I'm working for a bank, so we are looking at machine learning related things a lot, like iterative algorithms. Well, I'm working on fraud detection, so fraud detection algorithms. Do you guys have any plans or design ideas on how these things could be implemented on top of Onyx?
00:37:19
Speaker
Right, so, hmm, absolutely. It's actually a little bit tricky to do.
00:37:25
Speaker
right in the current messaging model and the current way that Onyx works, which is closest to the way that Storm works. We're currently working on implementing asynchronous barrier snapshotting, which is the scheme that Flink uses. After that's done, things like iterative computation
00:37:53
Speaker
will be significantly easier. So we'd like to see a lot of that kind of work after we get the ABS work done. OK. Yeah, basically, iterative computing is the core of all these algorithms, right? I mean, distributing the work and then calculating it and combining it again. But that would be very interesting to have in the Clojure world, I would say, having proper machine learning algorithms. Absolutely.
00:38:24
Speaker
Okay, let's go to building an Onyx app. So just consider me like a noob. Obviously, I'm a noob with Onyx. How would I build an Onyx app? Is it basically embeddable? Do I just start with some template and a plugin or whatever, or do I just keep using Onyx as a library?
00:38:46
Speaker
I think this is one of the challenges of Onyx, even though it's actually a strength in the end, is that it's so malleable and flexible that it's kind of like, well, where do I start? We have an application template. We have a workshop that you can follow. We have tons and tons of documentation.
00:39:04
Speaker
Off of our website, we have First Steps, where you would start. I'd say the Onyx application template is probably the best place. If you're going to sit down and write a serious application that's supposed to do something, you can go ahead and spin up a new one and that should put you in a good state.
00:39:20
Speaker
The difficulty being that if you're going to expose Clojure to such a great extent, you're decidedly moving away from a framework. A framework is really great for getting people started quickly, and then forcing them into doing things a particular way. As you move away from that, you get more flexibility and more strength to tap into the core language, but you also make it trickier for the beginner.
00:39:41
Speaker
So I guess it's kind of our Achilles heel and one which I'm totally fine having because, again, we didn't design this for beginners. This is really a tool for people who are doing serious production work. But some combination of starting with the application template and then having a thorough review of the documentation and learning about what it does for us before diving in too deep would be helpful.
00:40:04
Speaker
Yeah, I think when I started looking into the docs, and obviously the learn-onyx project on GitHub, that was very helpful, because you have this nice test-driven workflow where I look into the tests and basically learn the concepts one by one. So that was very useful for me. For people who are interested in learning Onyx, I think that would be a very nice place to start, I would say.
00:40:31
Speaker
We've done that workshop live a couple of times and it's always been a really fun thing to do. So if you know Onyx particularly well and you want to teach other people, I would recommend maybe doing a meetup and just having people walk through it and ask you questions, because it's so self-guided that it ends up being easy for people to teach themselves while occasionally asking questions.
00:40:49
Speaker
Yeah, actually, in October, I'm going to do an Onyx workshop at the Amsterdam Clojure meetup. Cool. So my plan is to use that one. And obviously, I'm going to bother you guys with a lot of questions while working on this one on Slack, though. Perfect. Yeah. I might come across for that, Vijay. Sounds good.
00:41:07
Speaker
Yeah, sure, sure. Well, the idea was to do it in September, but I'm going to the Flink Forward event, that's in Berlin, so I couldn't do it this month. I'm going to do it the next month, in October. So it'll be awesome if you can join. You know, you can get a field report.
00:41:27
Speaker
So, just going back to your thing about building the Onyx apps, though, Mike. Are you saying that essentially you're writing a Clojure program? You know, you use the patterns of Onyx, the mapping and stuff like that, but most of the functions that people write will be Clojure functions that essentially operate on the data?
00:41:56
Speaker
Almost entirely, yeah. Because that's a huge win, isn't it? I mean, if people are basically bringing their own workflow, then that's a tremendous gift to the developers, I think, because they can bring in their own libraries, their own functions. They don't have to rely upon... Well, they can rely on the ecosystem of Clojure, but they don't have to rely on some special ecosystem.
00:42:21
Speaker
We hear that a lot, because people will start to get interested in Onyx, see that you primarily use plain Clojure functions and data, and think, I have 90% of the work done, because I wrote a single-JVM app, or I wrote something in a different kind of framework that allowed me to still use Clojure. So all I need to do is figure out the plumbing and the wiring to stick it all together, because none of that has to change. And I think that's a big win for engineering costs. Excellent, yeah.
00:42:49
Speaker
By the way, on the whole distributed state management with a log... maybe this is completely unrelated, but I'm just going to throw it in. Did you guys take a look at Avout at some point? That was supposed to be a distributed atom implementation.
00:43:04
Speaker
Yeah, I think I have seen that. I think Avout has primarily been abandoned, unfortunately, because it can't support the semantics of some operations. Zookeeper doesn't support continuous watch triggers, where, when a value changes on a znode, it continuously updates any subscribers. That's what Clojure's watches do, but Zookeeper doesn't allow that, so you can't really match the semantics.
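The semantic gap being described is that ZooKeeper watches are one-shot: a watch fires once and must be re-registered, so any writes that land between the notification and the re-registration are never individually delivered, unlike a Clojure atom watch, which runs on every change. This toy Python simulation (not real ZooKeeper client code) shows the lost update.

```python
# Simulation of one-shot (ZooKeeper-style) watch semantics: a watch fires
# once and disarms itself, so a write that happens before the client
# re-arms the watch is never individually observed.

class OneShotNode:
    def __init__(self, value):
        self.value = value
        self.watcher = None  # at most one armed, one-shot watcher

    def set_watch(self, fn):
        self.watcher = fn

    def write(self, value):
        self.value = value
        if self.watcher:
            fn, self.watcher = self.watcher, None  # fires once, then disarms
            fn(value)

node = OneShotNode(0)
delivered = []

node.set_watch(delivered.append)
node.write(1)               # delivered, watch now disarmed
node.write(2)               # lost: happened before the client re-armed
node.set_watch(delivered.append)
node.write(3)               # delivered again

assert delivered == [1, 3]  # update 2 was never individually observed
```

With a continuous watch (Clojure's `add-watch`), `delivered` would have been `[1, 2, 3]`, which is why the atom semantics can't be faithfully reproduced on top of ZooKeeper.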
00:43:31
Speaker
Okay, yeah, I remember looking at it, even several years ago. I think it was supposed to be like a distributed atom managed by Zookeeper, and then the state could be saved off somewhere; I think there were plugins for MongoDB at that time. But anyway, as you're pointing out, it's essentially an abandoned project right now. But aren't we talking here about immutable data everywhere? So the distributed atom bit, is that really a thing?
00:44:02
Speaker
Actually, our state management, Lucas wrote almost the entire thing. You can talk about it. It's intentionally designed to be relatively immutable.
00:44:12
Speaker
Yes, so the way we built the windowing and state management feature is that you're essentially building a replicated state machine that your data is being pushed through. And that's all journaled to BookKeeper, which is a distributed log. And this provides nice fault tolerance semantics. If you have a node that crashes,
00:44:41
Speaker
Another node will pick it up and replay the state machine log, which will get you back into the state that the windows and your aggregations were in, and then continue going. Excellent. OK.
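The recovery mechanism just described, journaling every state-machine transition to a durable log (BookKeeper in Onyx's case) so a replacement node can replay it and resume with identical window state, can be sketched as follows. The structures are illustrative, not Onyx's actual windowing API.

```python
# Sketch: write-ahead journaling of window-aggregation transitions. A node
# that crashes loses its in-memory state, but a replacement can replay the
# durable journal and arrive at exactly the same window contents.

journal = []  # stand-in for a durable, replicated log (e.g. BookKeeper)

def apply_op(windows, op):
    """One deterministic state-machine transition: add a value to a window."""
    window, value = op
    windows[window] = windows.get(window, 0) + value
    return windows

def journal_and_apply(windows, op):
    journal.append(op)          # journaled (durable) before being applied
    return apply_op(windows, op)

def recover():
    """A replacement node rebuilds window state by replaying the journal."""
    windows = {}
    for op in journal:
        windows = apply_op(windows, op)
    return windows

state = {}
state = journal_and_apply(state, ("w1", 5))
state = journal_and_apply(state, ("w1", 3))
state = journal_and_apply(state, ("w2", 7))

# Simulate a crash: the in-memory state is gone, the journal is not.
del state
assert recover() == {"w1": 8, "w2": 7}
```

Because `apply_op` is deterministic, any node that replays the same journal reaches the same state, which is the essence of a replicated state machine.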
00:45:00
Speaker
So now one of the challenges for us, at least, especially getting into distributed computing, is the deployment stuff. I think deployment has been one of the most painful parts of Storm, as far as I remember.
00:45:19
Speaker
And obviously the other stacks have their own ways of running on top of existing ecosystems like YARN or Mesos. So for a developer or a DevOps guy, what is the recommended way of deploying Onyx?
00:45:38
Speaker
So, the tricky part with deploying Onyx... I'd used a lot of the tooling that existed before I built it, and I was really frustrated, because the primary way of doing this was building against a framework, creating an uberjar, and then submitting it to some massive process. And while that actually does the orchestration pretty well for the transportation of your program to the various nodes in your cluster,
00:46:01
Speaker
it makes so many assumptions about the way things ought to work that you lose any benefit, I think, that you got in the first place.
00:46:15
Speaker
So Onyx has an à la carte deployment process, where you create an uberjar for your worker processes, you actually invoke a function to start up all of its thread listeners, and then you block, so it keeps itself online as a persistent daemon. Virtually all of the infrastructure related to deployment is kept outside of Onyx itself, which means that you can run it in a container, you can run it on Kubernetes, you can run it on Mesos.
00:46:40
Speaker
It just doesn't work that way out of the box, so you have to do a little extra footwork to set it up the way you want it. Now, my hypothesis is that you were going to have to do that work anyway; even if we shipped something that was kind of prepackaged, you would have to backtrack and then set things up the way you want anyway. So it may be one pain point for Onyx, but I think it's one of those things that pays off more down the road as you use it continuously.
00:47:03
Speaker
Okay. Yeah, I mean, I can tell you from my experience that the way we deploy Spark applications is basically driven by Ansible: we have some sort of entry node to the cluster, we jump to it, and from there we do the spark-submit, using YARN as the resource negotiator. But Onyx doesn't require any resource negotiation frameworks, right? None, unless you want to have one. You can go ahead and integrate it at that point.
00:47:30
Speaker
I think we could do a better job of adding some extra tutorials for Mesos and Kubernetes and co.
00:47:40
Speaker
Yeah, yeah, that'll be very useful, because whenever I start experimenting with these kinds of things, I just boot up a cluster with, I don't know, six or seven machines, and then I go through the entire life cycle, you know: build a program, deploy it, and see how the framework supports me in terms of deployment, and what kind of access I need to get from the DevOps teams, because these are all big problems, you know,
00:48:04
Speaker
especially if you want to get into this enterprise-y mode of deploying applications. There it's a super painful situation, because I ask, okay, can I get shell access? And everybody's like, no way. So how do I deploy this thing? Because I need to type spark-submit.sh everywhere. That is one of the banes of my existence right now.
00:48:31
Speaker
Anyway, let's get into the next interesting thing. What kind of tooling is available around Onyx? I know the Onyx Dashboard is one of the projects you guys are working on. Can you give us some insight into what tooling is available right now for Onyx and what your plans are? The Onyx Dashboard is a great one.
00:48:54
Speaker
It's primarily used at the moment to give you a view of what's going on in your cluster: what nodes are currently up, what's the overall state of the cluster, and to give you some visibility into what jobs are running, what catalog is contained in each job, what the DAG looks like, and so on.
00:49:18
Speaker
Yeah, and we generally recommend that when new users are getting up and running with Onyx that they have a look at what's going on in the dashboard because it gives them a lot better of an idea about what's actually going on internally. Yeah.
00:49:35
Speaker
So is this a separate application that needs to be run, or is it built into Onyx itself? It is a separate application. We are also going to be releasing a sort of daemon that you can start up with your Onyx peers, which will give you visibility into what their view of the system is.
00:50:02
Speaker
So you can see which nodes they think are online, and which job they're actually running. That could give you quite a bit of the information the dashboard currently gives you, without some of the overhead.
00:50:19
Speaker
Okay, so what kind of communication do we need between the Onyx Dashboard and the peers? Where is the information coming from? Because of the way the cluster is designed, using the log-oriented architecture,
00:50:36
Speaker
The dashboard only really needs to look at Zookeeper and read the job data and play back the log that Onyx uses. And then as a result, it will actually know what the current state of the cluster is.
00:50:53
Speaker
Right. Okay. So it's nice and distributed, and doesn't have to reach out to each of the nodes in turn and query them and have all this kind of crosstalk. Right, not unless they want to do some sort of health check on the nodes. What's fascinating is that it can almost function as a read-only peer in the cluster, where it's listening as if it were exactly a node in the cluster, except it's not doing anything.
00:51:19
Speaker
So it's just passively listening to all the activity in a durable context. So the design allows for really good extensibility and Onyx Dashboard is just one of the examples where we tap into the log in order to almost participate in a passive manner. Okay, so actually the log provides you with a kind of database to snapshot and to interrogate and to analyze the current flow of work in the system.
00:51:46
Speaker
Exactly. Absolutely. It's shown itself to be very handy as well when we're debugging issues with clusters because we can just take a dump of the log and then play it back step by step and look at each node's view of the cluster. Yeah, it sounds like a huge win actually from a design perspective as well because you don't take any performance hit for monitoring.
00:52:13
Speaker
Somewhat true, although we do actually need to have node level monitoring on what work each node is actually doing at any given time. The log doesn't actually give you any visibility on how much data is passing through the system, what's the throughput, latency, this sort of stuff.
00:52:40
Speaker
So how do you collect that bit of information then, on top of the worker log? We have another plugin, Onyx Metrics, which provides outputs for a bunch of different metrics systems. So if you're using Riemann or New Relic
00:52:59
Speaker
or Datadog. You just hook it up, have it connect to the right place, and it will emit metrics for all of your jobs. OK, excellent. Yeah. So you can use the same operational tooling that you're using now just through one of these plugins. That's right. So the performance-wise, sorry. I was just saying it was sweet. That's all.
00:53:28
Speaker
Sweet. Sweet. So we're talking about performance, right? Because nowadays there is this huge bleep-measuring contest between all of these frameworks: I'm faster than this thing, we can do real-time analytics, and all this kind of crap going on between these frameworks these days, right? I mean, Spark keeps saying, we are the fastest.
00:53:56
Speaker
Flink says, now we can do this stuff better, and that kind of thing. But when I see the Spark API and all that stuff... because that is a completely different model, as you were pointing out before: the RDD is the central idea behind Spark, and then of course DataFrames and Datasets.
00:54:11
Speaker
And the programs themselves are not real Scala programs, but Spark programs. There is a slight difference between writing a Scala program and a Spark program, because, well, RDDs are not flat-mappable, blah, blah, blah, and whatever. Now, how do you see the performance characteristics of Onyx? Because it's really just using plain Clojure, right?
00:54:33
Speaker
Right. I would probably start by saying a lot of these benchmarks, as you're saying, are borderline infuriating, because they're so context-dependent. As people who've designed these things, we understand how much complexity is involved and how specific
00:54:49
Speaker
these tests have to be. So for a full comparison across the board, unless it's for your particular application, you should really take it with a grain of salt, and instead understand how the design works under the hood. You'll get miles more productivity out of that than out of looking at a graph and saying, oh, this bar's higher.
00:55:06
Speaker
Yeah. So maybe it's about fundamentals, Michael, you know: what kinds of things are essential barriers to, or enablers of, performance. Maybe that's a clearer kind of discussion.
00:55:21
Speaker
Yeah, so while the API is primarily Clojure, and its implementation is Clojure all the way down, a lot of the designs across these various architectures and products are, well, not the same, but the ideas are similar, because you're trying to make progress on a particular
00:55:40
Speaker
set of data. You're trying to do it in a fault-tolerant way. You're trying to do it in a way that requires as little messaging as possible, as little checkpointing as possible. So really: how do you do the least amount of work and have the least amount of overhead while not disrupting your application's progress? And so we have a fairly good performance profile right now using Storm's record-at-a-time model, which is
00:56:04
Speaker
essentially a way to track a record's lineage throughout the system in a very performant manner. But we're switching to what Lucas mentioned, ABS, asynchronous barrier snapshotting, in an effort to reduce the number of acknowledgment messages and the number of checkpoints that you have to do. And so I'm hoping that our performance is going to get really, really good after we do this, because we're able to do it in a way that no one else has right now.
00:56:34
Speaker
ABS is primarily implemented through a pull-based architecture, and we've come up with a way to do it through a push-based architecture; usually push-based will be faster than pull-based. So this is another case where how things actually work, their mechanics, is more important than any particular benchmark.
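The asynchronous barrier snapshotting scheme mentioned here (the one Flink uses) can be sketched minimally: barriers flow in-band with the records, and an operator with multiple inputs takes a consistent snapshot of its state only once the barrier has arrived on every input, buffering any channel that raced ahead. This Python sketch is a simplification for illustration, not Onyx's or Flink's code.

```python
# Minimal sketch of asynchronous barrier snapshotting: a two-input summing
# operator snapshots its state when the barrier has been seen on *both*
# inputs, buffering records from an input that is already past its barrier.

BARRIER = "BARRIER"

def run(operator_inputs):
    total = 0                 # the operator's running state
    snapshots = []            # checkpoints taken at barrier alignment
    blocked = set()           # inputs whose barrier already arrived
    buffered = []             # records held back from blocked inputs
    streams = {name: list(msgs) for name, msgs in operator_inputs.items()}

    while any(streams.values()):
        for name, msgs in streams.items():
            if not msgs:
                continue
            msg = msgs.pop(0)
            if msg == BARRIER:
                blocked.add(name)
                if blocked == set(streams):       # aligned on all inputs
                    snapshots.append(total)       # consistent checkpoint
                    blocked.clear()
                    for v in buffered:            # release held records
                        total += v
                    buffered.clear()
            elif name in blocked:
                buffered.append(msg)   # this input is past its barrier
            else:
                total += msg
    return total, snapshots

total, snaps = run({"in-a": [1, 2, BARRIER, 4],
                    "in-b": [3, BARRIER, 5]})
assert total == 15
assert snaps == [6]   # 1 + 2 + 3 arrived before the aligned barrier
```

The payoff over record-at-a-time acking is that fault tolerance costs one lightweight snapshot per barrier epoch instead of an acknowledgment message per record.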
00:56:53
Speaker
But maybe if you run the benchmarks, Michael, you can stop the benchmarks being infuriating to you and make them infuriating to everybody else. Yeah, I mean, I suppose we could always publish something and say we're fastest, because I'll play the game. It's just a use case for you. Yeah.
00:57:11
Speaker
Well, it's just like our podcast being the world's number one R-rated vegetarian Clojure podcast. In that specific area, we are number one. Yeah. At the end of the day, you only need so much performance. If you run the application and it's good enough for you,
00:57:28
Speaker
then, I mean, who cares what the absolute optimal conditions are that this thing can run under. One thing we've seen quite a lot of is, after a decent amount of optimization work, we tend to find most users' performance problems are easily dominated by the work they're actually doing themselves. So: what are their Clojure functions doing in between the tasks?
00:57:53
Speaker
Sometimes there are cases where they need to optimise by collapsing down tasks to eliminate communication between peers, but that kind of thing is more about thinking about how your tool is working, not so much about what Onyx's internal performance is like.
00:58:12
Speaker
I guess, with most of the things people compare performance on... I mean, I think there are two aspects to using any tool, aren't there? One aspect is you want to make sure that the workflow, and the style of programming, and the kind of tooling is up to scratch, or meets your cultural expectations,
00:58:32
Speaker
but then you want to make sure that also from a management and cost perspective you're not buying into something which is inherently more expensive than something which might be slightly less comfortable but is going to cost you
00:58:48
Speaker
50% less on your Amazon bills or something. So I think that's what drives people. So I guess that's what drives a lot of these benchmarks is in the end people have to rent their machines so they want to make sure they're not going to be overpaying. And as long as you're in the same kind of
00:59:06
Speaker
ballpark in terms of rental costs for compute, then people will focus more on the more hygienic aspects of how you work with the APIs, how you work with the tooling and that kind of stuff.
00:59:24
Speaker
I think this is one of the drivers of why people are able to build businesses around these products despite them being open source, which is what we're doing now. Onyx is entirely open source, and yet we're able to construct a business. I think it's primarily due to the fact that you can shadow a lot of the problems of actually hosting and running these things.
00:59:45
Speaker
and come up with a cost model that's meter-based and usage-based rather than machine time, you can build a viable business model because people are willing to go ahead and focus more on the things that matter from an engineering perspective like API clarity.
01:00:01
Speaker
Yeah, I hope so, because that's the thing which I think in the end should drive progress, you know. Because if you look back at the technology wars... I dread to think, you know, that Oracle won the database war because of its performance. It would be horrible if we repeated that mistake, you know.
01:00:28
Speaker
Sorry, Vijay, I know you're a fan, Vijay, sorry. Me, no. Well, I got paid by Oracle at some point, so I have some history. Sorry, that's just a bit of trolling, bit of in-house trolling there. I have been to the land of Mordor and came back.
01:00:50
Speaker
Anyway, there are lots of other technologies in this space, right? So, Lucas, you were explaining the plugin architecture. How well does Onyx play with other technologies? What kind of tools are available right now? What kind of plugins are available?
01:01:07
Speaker
Right, so we do a pretty good job of supporting a lot of the main ones that people ask us for. Our first couple of plugins were Kafka and Datomic. They would probably be our most frequently used plugins still to this day.
01:01:27
Speaker
Kafka, because it's a really great fit for the streaming model that Onyx uses, and it's being used a lot more in industry, I would say. And Datomic, because obviously
01:01:45
Speaker
A lot of Clojure users use Datomic, and one thing that's actually been a really great fit with Onyx is using the input plugin that reads the Datomic transaction log, because that allows people to basically replay what was going on in Datomic or react to new events that are happening or being written to Datomic.
01:02:15
Speaker
As far as the other plugins, we have an SQL plugin, Redis plugin, S3 plugin, Amazon SQS plugin, and a few others I'm probably forgetting right now. But we continue to add them as we can and as we get time or as people contribute them.
01:02:36
Speaker
Those plugins, Lucas, are they based on a Clojure protocol? How do you implement them? If I wanted to do one for, I don't know... MongoDB. Pardon me? MongoDB, the snapshot of the databases. No, I'm not going to do it for MongoDB. No, but maybe I would do it for... Rethink. Oracle. I was going to say DynamoDB or something. You would do one for Oracle, Vijay. You could do that.
01:03:06
Speaker
There has been one DynamoDB one that is quite a bit older. In the last few versions, we switched over to a protocol-based implementation of the plugins. As long as you implement either the input protocol or the output protocol,
01:03:27
Speaker
you're usually pretty much good. Okay, so how much work is that? I mean, how many things do you have to implement to satisfy the protocol? Do you end up hooking into a lot of your log stuff under the covers? Or is it pretty much, you know, if you can essentially tap into
01:03:48
Speaker
the input... like, say I know what the API for DynamoDB is, for example, so I can find out how to make a connection to it and then how to read the data. Am I pretty much good at that point in terms of your plugin, or are there more specific things to do? So, for the input plugins,
01:04:08
Speaker
you do need to do a lot of the work yourself. Input plugins are actually significantly harder to write than the output plugins. They're not too hard, but they're harder. This is because you have to implement a lot of the stuff like checkpointing of where you're up to in the stream, and also dealing with the retry semantics.
01:04:35
Speaker
Basically, you're implementing the fault tolerance aspects yourself. This will be getting a lot easier in the next version of Onyx, though. I'll wait then, I'll wait till the next Onyx. We were just... You just want to do the easy work. Exactly. Do the heavy lifting, Lucas, come on.
01:04:55
Speaker
We were just talking about not Osborne-ing ourselves and preventing people from actually writing plugins. Output plugins won't be changing much though. Those are actually quite easy because you don't have to deal with the fault tolerance aspects. You just have to take a batch that's given to you and write it out somewhere.
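The asymmetry described here, input plugins carrying the checkpoint and recovery burden while output plugins just write a batch, can be sketched with a toy pair of classes. This is an illustrative Python shape, not Onyx's actual plugin protocols, and the in-memory source/sink are hypothetical stand-ins for something like DynamoDB or S3.

```python
# Sketch of the input/output plugin asymmetry: the input side must track a
# checkpoint so reads can resume after a failure; the output side only has
# to write out the batch it is handed.

class ListInput:
    """Reads from an in-memory list; the checkpoint is a stream offset."""
    def __init__(self, records):
        self.records = records
        self.offset = 0

    def read_batch(self, n):
        batch = self.records[self.offset:self.offset + n]
        self.offset += len(batch)
        return batch

    def checkpoint(self):
        return self.offset

    def recover(self, checkpoint):
        self.offset = checkpoint  # resume from the last durable position

class ListOutput:
    """Writes batches to an in-memory sink; no recovery logic needed."""
    def __init__(self):
        self.sink = []

    def write_batch(self, batch):
        self.sink.extend(batch)

src, out = ListInput([1, 2, 3, 4]), ListOutput()
out.write_batch(src.read_batch(2))
saved = src.checkpoint()
src.read_batch(2)          # read, but pretend the node crashed here
src.recover(saved)         # restart from the checkpoint
out.write_batch(src.read_batch(2))
assert out.sink == [1, 2, 3, 4]   # nothing lost despite the "crash"
```

A real input plugin also has to handle retry semantics for individual segments, which is exactly the part Lucas says the next version will absorb into the framework.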
01:05:16
Speaker
So the plugins are essentially, if I understand correctly, that you implement a protocol, and then there is a function that gets the segments, right? Yep. Okay. So a segment is essentially a map of information, a Clojure map. Yeah, they're essentially akin to a record in other systems.
01:05:38
Speaker
Yep. So maybe an interesting one would be, not Mongo, but maybe RethinkDB. Are you guys seeing much of that recently? Yep. There is a plugin out there.
01:05:54
Speaker
No, that's what I was going to say. I was trying to recall whether someone wrote one, and I was pretty sure that actually happened at some point. Nothing really bad. Yeah, I was going to say, we could just list, maybe it's CockroachDB. You know, list the most kind of gruesome names of all databases. That one actually looks really interesting.
01:06:14
Speaker
Yeah, it's like a survivable database that doesn't die, apparently. Yeah, that's the thing. That's why they call it Cockroach, isn't it? It's highly resilient. Yeah. I love their motivation, though. So basically, we write data into CockroachDB and then send it into space, and that's it. We dropped a nuclear bomb on it, and then we looked again, and it was still there.
01:06:35
Speaker
Exactly. And by the time it reaches the new habitable planet at Alpha Centauri, they should probably know how to write Onyx plugins. I was going to talk about SpaceX there, but maybe we shouldn't do that this week. Stay sane, man. Okay. Talking of testing...
01:06:58
Speaker
No, maybe we should talk about SpaceX, then we'll see an angry tweet from Elon Musk and then we can invite him onto the podcast. Yeah, we can. That'll be awesome. But anyway, so testing, yes. Okay, sorry. So, Ray, you're taking the discussion towards testing now, right?
01:07:19
Speaker
So how do you guys test Onyx? Because I saw some tweets by you guys about Jepsen and how it helped you. So can you give us some insight into how you use Jepsen? Jepsen or Jepsen? Jepsen. Yeah. Maybe you should give a quick intro to Jepsen, actually, because I don't think everyone knows it. Probably quite a lot of people do, but Michael or Lucas, do you want to give a quick background on Jepsen first?
01:07:52
Speaker
Yeah, go ahead, Lucas. This one's partly your baby. So, Jepsen is a library written by Kyle Kingsbury, which he built to support his Call Me Maybe series, a really interesting series of blog posts describing how he's tested various distributed systems, primarily databases.
01:08:15
Speaker
Yeah, excellent, excellent series of videos, I think, and blog posts. Yeah, and I think he's really done a lot of good work in changing expectations as well about the level of testing in the ways that different products are tested. Because I think there's been a lot of cases where products are put out
01:08:39
Speaker
into the wild and used in production before they're really ready. And I think he's just really raised the bar for what's expected. I think what he did especially was call people out where they were saying they were safe in a distributed, split-brain world, that they were resistant to it, they were okay, and in fact his tests proved that they weren't. I think he was calling bullshit on a lot of people, basically. And I think it's fair to say that. Yeah. And even in cases
01:09:09
Speaker
where the products don't support it, you know, these things should be documented so that practitioners can make the right choices about the products that they're using. Yeah, because I think a lot of this CAP stuff is all about you making the choice over availability or consistency or whatever.
01:09:29
Speaker
Right, and so at the time, as a two-member team, we felt that it was important for the sake of both professional pride and also trying to prove that our system, Onyx, is production ready. So we put it through, we built some Jepsen tests for it. So we used his library to build some
01:09:57
Speaker
tests around getting jobs running on a five-node system, then partitioning them from each other, and then checking the results that came out of the job, making sure that no data was missing or, in the case of exactly-once aggregation, that no value was counted twice, for example.
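For listeners unfamiliar with this style of test, the end-of-run check boils down to a simple invariant over the observed output. A minimal, hypothetical sketch of that invariant follows; the names and structure are illustrative only, not Onyx's actual Jepsen harness, which is written in Clojure:

```java
import java.util.*;

public class ExactlyOnceChecker {
    // Checks the post-run invariant: every expected value appears in the
    // job's output exactly once. Nothing lost to the partition, nothing
    // replayed twice. Assumes the expected values are distinct.
    static boolean exactlyOnce(List<Integer> expected, List<Integer> observed) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : observed) counts.merge(v, 1, Integer::sum);
        for (int v : expected) {
            Integer c = counts.get(v);
            if (c == null || c != 1) return false; // lost or duplicated
        }
        // No unexpected extras in the output either.
        return counts.size() == new HashSet<>(expected).size();
    }

    public static void main(String[] args) {
        System.out.println(exactlyOnce(List.of(1, 2, 3), List.of(3, 1, 2)));    // true
        System.out.println(exactlyOnce(List.of(1, 2, 3), List.of(1, 2, 2, 3))); // false: 2 counted twice
    }
}
```

The interesting part of a real Jepsen test is everything before this check: injecting network partitions while the job runs, then healing them and seeing whether the invariant still holds.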
01:10:23
Speaker
Even though we had a lot of property tests that ensured that some of the testing space was covered, we found that Jepsen picked up a lot of issues that weren't covered by testing in a pure way. We found that there were a lot of issues around connection handling, for example.
01:10:45
Speaker
or concurrency issues that are picked up. And yeah, we feel that we couldn't have picked up these issues without using something like Jepsen. Fantastic, yeah. So did you talk to Kyle about it? Or has he been involved a bit? Or is it just you're using his tools and kudos to him?
01:11:07
Speaker
We have talked to him a little. We paid for a few hours of his time, but he wasn't super involved in it, so we only have ourselves to blame for any methodological issues in whatever our testing did. But yeah, he was super helpful, and I highly recommend that people go out there and hire him.
01:11:36
Speaker
So do you get a picture of Carly Rae Jepsen now on your website to prove that you've passed the test? Well, what he said was that there is no proving it works. You can only prove that it doesn't work. It's an honor system. After having been through the whole process of testing it in various ways, including Jepsen, if you're writing a distributed system, I can't imagine that you could have real confidence in it without doing
01:12:03
Speaker
if not Jepsen, some kind of simulation testing, because the number of things that we've seen go wrong is just unanticipated. I'm not saying our work was bad. We put as much effort into our work as we possibly could. It's just inhuman to think you will catch all the things. And some of the things will be really bad and embarrassing. So if you're an aspiring distributed system designer, yeah, really, go out and try to beat this stuff up with the best tools you can. We practice Jepsen-driven development now.
01:12:33
Speaker
Well, I had to do some data cleanup recently. We have around 600 gigabytes per day, and I wrote a Spark program, and obviously I'm a noob and the village idiot.
01:12:49
Speaker
So it's basically cleaning up huge chunks of JSON data, or transforming it into something. So initially I thought, okay, I'm going to look for this data and I don't want to construct a lot of objects. So I didn't use any StringBuilder or anything, I'm just going to do String indexOf and substring, so it's going to be really memory efficient. And in the case of replacing some data in the string, I used replaceAll instead of replace.
01:13:19
Speaker
And I ran it for two days or something, and everything was going hunky-dory, so I didn't check the results. And then later, I figured out it was filled with exceptions, because replaceAll treats the string as a regular expression, and replace treats the shit as a normal string. So I used replaceAll. Yeah, that was a fascinatingly painful thing to realize after processing, I think, around, I don't know, 1,000 gigabytes of data. Yeah.
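For context, the Java standard library really does differ here: `String.replace` takes literal character sequences, while `String.replaceAll` compiles its first argument as a regex, so unescaped metacharacters coming in from real-world data blow up at runtime. A minimal illustration (the example string is made up):

```java
import java.util.regex.PatternSyntaxException;

public class ReplaceDemo {
    public static void main(String[] args) {
        String s = "cost: 5 (approx.)";

        // replace() treats both arguments as literal strings: safe.
        System.out.println(s.replace("(approx.)", "(exact)")); // cost: 5 (exact)

        // replaceAll() compiles its first argument as a regex, so an
        // unbalanced "(" arriving from the data throws at runtime.
        try {
            s.replaceAll("(approx.", "(exact)");
        } catch (PatternSyntaxException e) {
            System.out.println("regex error: " + e.getDescription());
        }
    }
}
```

Note that `replaceAll` also interprets the replacement string (`$` and `\` are special there), which is another common source of exactly this class of bug.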
01:13:54
Speaker
I remember this YouTube video, by the way, where somebody was recording developers, and everybody was on headphones, and all you can hear is shit, fuck. Anyway, that's how one of my kids plays video games. Exactly.
01:14:12
Speaker
Yeah. So as you can see, the kind of errors you get at a distributed, big-data scale are like, holy crap. I mean, it had been running for three days. And when I looked into the data to test what was happening, I had put in try, catch all errors, and then write this line. And the whole file was filled with those lines. Yeah, what's clear is that as you get up towards that level, you need a different set of engineering tools to be able to work efficiently. Yeah. That's the primary challenge.
01:14:41
Speaker
Exactly. And this is after tweaking all sorts of knobs on Spark, because to make it run faster, to make it all sorts of stuff, and spending a lot of brain cycles. I'm really amazed with all the work you guys are doing. It's a very, very hard problem, I can imagine. Even for an idiot like me who is just using the tools. You can imagine building it. It's really fantastic, by the way.
01:15:09
Speaker
I was just going back to your point there, Vijay. One quickie: one of the things that I liked about Clojure at the beginning, actually, was the data structure things, where Rich was very clear about the big-O notations around operations on data structures. I think that's the kind of thing, isn't it, that we can't necessarily get for these distributed data structures. But if we could have something similar to that,
01:15:37
Speaker
Yeah, I think for big data, I think there should be big F notation. Right. Okay. So for normal computation, this is going to be big O log F. And this is going to be oh fuck notation, you know. This is crazy stuff. Anyway, so
01:16:05
Speaker
Obviously, so you guys are busy scaling up Onyx as a company. So can you guys give us some idea about who is using Onyx and how? Some use cases or maybe some name dropping? We're seeing a lot of, go ahead, sorry. All right, go ahead.
01:16:24
Speaker
I'm sorry, it's late, and we're starting to see a lot. Can someone go ahead? Yeah. There is some sort of problem with the leader election here. Consensus between two processes is already difficult.
01:16:40
Speaker
Hold on, we're going to run Paxos very quickly. No, we're starting to see a lot more people move into real-time analytics. That's kind of been a sweet spot where latency needs to be relatively low, streaming-oriented problems. We've even seen a little bit of action in the financial industry.
01:17:00
Speaker
no name dropping there, but we're starting to see Onyx be used to deal with money. That's a point of pride; seeing things get stable enough that people are willing to do some real-time financial stuff is pretty great. It's certainly proven to be more stable. The number of use cases that I know about in production is probably somewhere in the 10-companies range, but we have people drop into the channel every now and again and say, well, we've got this Onyx system. I had no idea.
01:17:29
Speaker
Like, wow, you never even had any questions for us. That's fantastic. So really, who knows how many people are using it? I'd project maybe somewhere in the 15-to-20-companies range in production. No, I know for sure that Cognician uses Onyx, right? And Robert Stuttaford, who was in here talking about Datomic, he wrote a wonderful blog post. Yeah, you should put that in the show notes. It's a great post.
01:17:59
Speaker
Okay, so now the next question. First of all, not first of all, but probably last of all now. A very big congratulations on getting the funding for Onyx, for your company, Distributed Masonry. Now, that shows that there is a lot of trust, even from external people, I hope, making sure that Onyx has a bright future. So what are the future plans for Onyx?
01:18:24
Speaker
Thanks. First of all, we were able to raise an investment round of $500,000 to push us through another year, year and a half. Really, we're looking to shift from being strictly a consultancy into a hybrid of a
01:18:42
Speaker
a consultancy and a product development company. Really, the goal of Onyx from the outset was to lower engineering costs for building these kinds of systems. It's clearly a very expensive endeavor. It takes a lot of time. And so building Onyx was an exercise to see how low we can make those costs.
01:18:59
Speaker
And what it came down to was that the costs still aren't low enough for us to say that there has been an acceptable change. It's gotten better. And I think primarily Onyx has done a good job of increasing confidence as you're building these systems, because it's more clear about what you're building,
01:19:17
Speaker
if it's correct, how it behaves. And so confidence has increased, but actual time for moving from scratch into an onyx production system, I don't think is significantly better than any other tools yet, just as a field study from what I've seen. Maybe that's wrong, but that's just my personal feeling on the matter. So the next step to actually trying to decrease engineering costs is to come up with a more holistic solution.
01:19:44
Speaker
We've had years of experience of looking at how people use Onyx and where the pain points are, and we've identified the things that would actually, in fact, have a real impact on
01:19:57
Speaker
how quickly you could move an Onyx system from nothing into production. Primarily, we're finding that if you can tighten feedback loops in particular ways, you can increase someone's understanding and get them to move a lot faster. We're looking forward to coming out with a technical preview in the next couple of months of what we've built.
01:20:16
Speaker
I would say right now we're wielding the power of Clojure and ClojureScript to let you understand what you're building at a significantly faster rate, and in ways that other people really have not seen before. So I'm excited to be able to show that off pretty soon. Sounds pretty exciting. And then, of course, fingers crossed, and we'll wait for the stuff. So right now, Distributed Masonry is three persons, right? If I got that right.
01:20:45
Speaker
Yeah, that's right. Immediately after we took the round, we hired Gardner Vickers, a longtime contributor to Onyx. OK. OK. So I have two questions. So how does somebody get into the Onyx code base? What are the interesting places where people can contribute to Onyx right now?
01:21:05
Speaker
I would say the first place you could look would be the plugins, because it gives you a reasonable idea of how the fault-tolerance mechanisms work, and it gives you an opportunity to contribute something that's kind of self-contained and grows the ecosystem a bit.
01:21:30
Speaker
Yeah, I think in addition to that, looking at the Learn Onyx workshop and then using that as a springboard to investigate how Onyx core is actually working, and kind of tinkering around with the various knobs, would be a good start. That's pretty nice. And my second question is, are you guys hiring? Soon. I'd say we need a couple more months, but we are definitely looking to expand our development team later in the year and into next year, much more.
01:21:57
Speaker
Okay, so you heard it here first: Distributed Masonry will be looking for distributed computing programmers. So, guys, get into Onyx and build plugins. That's my message for our audience today.
01:22:13
Speaker
Okay, I think we covered a lot of ground today. Is there anything I missed? Right? I guess I don't know. I don't know if this is a tricky area, but I'm interested in the kind of... In a job. In a job.
01:22:36
Speaker
Yeah, come on, guys. Just say it. You need to have proficiency in big F notation. I can't prove myself as a big F. No, I was going to talk about some of the kind of emerging distributed computing thing that's going on, like lambdas and
01:22:59
Speaker
Other things. I don't know if you guys have a story or thoughts on them, on the kind of process stuff or the serverless stuff. Maybe that's kind of outside of Onyx at the moment. I don't know. I'm just interested in that concept.
01:23:20
Speaker
Yeah, I think that kind of comes along with the whole notion of doing a more holistic platform. We have some ambitions to remove the operational and hosting requirements of Onyx and shift those onto us, which is a pretty typical play for anyone doing a platform as a service. You just move away all the networking drudgery that comes along with it. But we're really shooting for some middle ground where
01:23:44
Speaker
We're not... I think serverless is interesting, but it's a little too constrictive, in my opinion, for the amount of expressivity you can have. So I would say you'll start to see Onyx move in a direction, from the hosting perspective, where it's a little bit closer to that, but we'll still hopefully maintain the richness of the API that we'll be offering, in kind of the same breath.
01:24:09
Speaker
Okay because the reason I mention it is because some of the things that we're looking at in the stuff that we're doing is this kind of event driven stuff where you can drive events through logs and streams and stuff like that but
01:24:29
Speaker
But for the slower-moving things, then the serverless stuff is quite good for that. I would agree. Actually, I think serverless is definitely not good for big data. But it seems pretty good for stuff where you don't want a server up all the time, where you only want it an hour a day or something. Right. I think that's really where it...
01:24:57
Speaker
Sorry, Michael, I sort of stopped, but then I carried on. So, protocol problem. I'll let you finish. No, Hangouts kind of burped on us. I was just saying, I agree. I think that's really where its sweet spot is. It's a good use case for when you need really limited usage and you need it in an ad hoc way. That's what it was designed for, and using it for that is fantastic.
01:25:19
Speaker
Okay, so horses for courses. I think we're right that what we're talking about generally with Onyx is dealing with large-scale data sets and large-scale problems. And like you say, platform as a service would make things very nice. Yep. Okay, so can I ask one final bonus question?
01:25:43
Speaker
OK, so Mike and Lucas, you guys are doing a lot of research in distributed computing and spending a lot of brain cycles on it. Do you have any recommendation for reading, a paper or something like that? Anything you would recommend? A book, probably?
01:26:00
Speaker
I would recommend reading the Morning Paper blog by Adam Colyer. Yeah. Adrian Colyer, sorry. Yeah, fantastic blog. He summarizes and explains things in a really good, easy-to-understand manner, and I always pick up really interesting things there. Yeah.
01:26:28
Speaker
OK, great. So I was just going to say, is there anything from your perspective, guys, that you want to add or talk about or any upcoming things outside of the things we've talked about already that you want to mention? Is there a secret fork of Onyx in Haskell? No, we just, no time for that.
01:26:56
Speaker
No, I think we covered the ground pretty well. Those are the main points of Onyx. That was a good discussion. Yeah. I noticed, by the way, that I've talked to you a couple of times back and forth on the Slack channel, and that seems to be a very good place to go and ask you guys questions and get into the sort of swim of Onyx and the feel for things. Yeah, the Clojurians Slack channel is the best, yeah.
01:27:26
Speaker
Excellent. OK, so I think that's it. We can round it up, I guess. Yeah, fantastic. Thank you very much, guys. Yeah, thanks a lot, Lucas and Mike joining all the way from all sorts of weird time zones today. It's very distributed in every sense.
01:27:50
Speaker
We learned a lot of things, and especially for me, it's been a fantastic discussion. And by the way, thanks to you, I was able to learn a bit of AsciiDoc now. So that was useful for me. Thanks for that. We really appreciated that. It facilitated the work.
01:28:09
Speaker
I finally figured out that weird syntax in AsciiDoc now. So that was fun. But otherwise, for people who want to get into Onyx, I could not recommend more the Learn Onyx thingy. And obviously, I'll try to do the workshop in my Amsterdam Clojure Meetup, and I'll poke you guys with questions. And I know that you guys are on GitHub and the Clojurians Slack. And there is also a mailing list, as far as I remember.
01:28:39
Speaker
Yeah, so those are the ways you can get in touch with the Onyx brains. And so that's it from us today, I think. We will post the notes on defn.audio. And obviously the MP3 will be available on SoundCloud and iTunes and all other popular channels.
01:29:03
Speaker
If you can leave a review, that would be very good. And Rich, if you're listening, tweet again, that'll also be very good. So keep it going. Oh, one final thing. Go on, sorry, Vijay.
01:29:18
Speaker
I was just going to say, are we done? Because I just want to give the final credits. Okay. So just a final credit to ptzery, who does the intro and outro music. If you can give a bit of love to his SoundCloud, that would be good. The track is Melon Hamburger, and it's vegetarian intro-outro music.
01:29:39
Speaker
So that's great. Okay. Right. Thanks a lot, Vijay. Great stuff. And we'll be back to village idiocy next time, minus the big brains of Michael and Lucas. So thanks a lot, Michael. Thanks, Lucas. Thanks for having us, guys. Thanks. It's been fun. Thank you. Bye. Bye. Bye.