
#6 - Concurrency and Parallelism

defn
A rambling discussion struggling to come to terms with concurrency and parallelism. See the show notes at defn.audio: https://defn.audio/2016/07/25/episode-6-concurrency-and-parallelism/
Transcript

Introduction and Feedback

00:00:18
Speaker
Right, hello, welcome to defn, episode 6. No guests, nothing else, just myself and Vijay. How are you doing, Vijay? No, I am doing good. I think this is just, as you said, two village idiots talking about Clojure. Yeah, I'm sort of hopeful. No experts to correct us.

Building an Online Community

00:00:37
Speaker
Yeah, it's just going to be total BS this week, I'm afraid. Yeah. Right. OK, so should we do a quick follow-up from last week, last time? Of course. So last week we had super fun with Misha talking about Hoplon and, of course, some nuclear issues as well. But it was super fun to make and I hope you guys enjoyed that one.
00:01:02
Speaker
So we have been getting some pretty good feedback, from the Slack channel and also online as well, on Reddit. We have been rising to the top every now and then on the Clojure subreddit, with spectacular two-digit upvotes, so please keep clicking.
00:01:21
Speaker
I haven't seen that, actually. So we're up to two digits now. Yes. This is very good. It's fantastic. And it's amazing. Oh, and we've actually gone into double figures on the defn podcast channel as well. On Slack. Yeah, that's true. That's true. We're really building a huge community out there.

JDK8 Support in Clojure

00:01:40
Speaker
Yeah, now I'm actually sporting a fake mustache whenever I'm getting out of the house, because now I'm a minor internet celebrity. I'm looking out for paparazzi and everything. But anyway, so it's been amazing. And thanks for your feedback. So just before recording, we were talking about JDK 8 support in Clojure.
00:02:06
Speaker
Yeah. Actually, the reason why I was interested in it, Vijay, was because I'm doing a little project to do with event streaming, actually, with the Kafka Streams library, which is a new library. But a lot of these big data libraries, I've noticed, and we've talked about this ourselves before, the fact that when we first met
00:02:28
Speaker
in Amsterdam, at Clojure Days, about the fact that Spark and new systems like this Kafka Streams library, Flink, a lot of them have Scala APIs or JDK 8 APIs, but they don't have Clojure APIs.
00:02:48
Speaker
Even though Clojure and big data play very well together, the APIs don't seem to be coming out there. Now, I know we've got some things that are dedicated, Clojure-wise, like Onyx and stuff like that, for big data. But obviously the whole point about Clojure is to do interoperability and to leverage the work that's out there in the JVM community.
00:03:13
Speaker
So it's a bit of a shame that we're not getting this Java 8 interop, because we can't take advantage of these lambdas and streams and stuff like that. Actually, as I'm saying that: there's a guy who has written a library called ike.cljj that allows you to have some of these things. And he seems to be actively developing it, which is good.
00:03:36
Speaker
But I'm thinking that it would be very nice, even if we don't require JDK 1.8, to have some of the core team at least having a look at this interop, to make it a bit more core to the language.
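To make that pain concrete, here is a hedged sketch of what JDK 8 stream interop looks like from plain Clojure today, with no helper library: the stream methods want java.util.function types, so you end up reifying them by hand instead of passing a Clojure fn straight in.

```clojure
;; A minimal sketch of the JDK 8 interop pain, without any helper
;; library: streams want java.util.function types, so we reify them.
(-> (java.util.Arrays/asList 1 2 3 4 5)
    (.stream)
    (.map (reify java.util.function.Function
            (apply [_ x] (* x x))))        ; where a plain fn should do
    (.reduce 0 (reify java.util.function.BinaryOperator
                 (apply [_ a b] (+ a b)))))
;;=> 55
```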

Concurrency and Parallelism Intro

00:03:49
Speaker
The thing I was saying before was that we're gonna talk about reducers today, and that requires essentially JDK 1.7. You know you can patch JDK 1.6 with some contrib library, but really,
00:04:03
Speaker
I think for all intents and purposes it's a dependency on JDK 1.7 and there was no problem with that. Obviously, let's just do that.
00:04:14
Speaker
And also, I didn't see anything on the dev.clojure.org wiki under the next release planning. There isn't any discussion or anything happening to support the lambdas or streams or having proper interop, though. I'm not sure if there are any plans, but I guess we need to wait, I suppose. Because in 1.9 the major feature is spec, right? I don't see any other major things bundled into 1.9.
00:04:39
Speaker
No, that's right. In Clojure 1.9, that's the big one. Obviously, the thing about 1.9, and Alex Miller said this, is that it will still only require JDK 1.6. So there's no chance of them mandating 1.8 for Clojure 1.9. But they are doing it for some of their other products. Datomic, for instance, now requires JDK 1.8.
00:05:08
Speaker
The latest versions of Datomic require JDK 1.8, which is, of course, sensible. I keep on saying 1.8, by the way. It's kind of crazy, JDK 8. Because I'm such an old JDK guy. It's not 1.7 and 1.6 any more, is it? It's JDK 8, and Java 8 and Java 9. I think it was around Java 5 that the naming changed, Java software development kit 1.x or something.
00:05:37
Speaker
Probably, yeah. There's probably some hidden 1.8 in there somewhere. Anyway, it would be interesting to find out if anyone else is having similar experiences with this lack of JDK 8 interop. I wonder if anyone is feeling that. Obviously, if anyone's listening from the core team, then
00:06:04
Speaker
It would be nice to share what was happening in that respect. Maybe if we get Alex on. Yeah, then we can ask him. We can ask him about that actually. That's true. I think we've got to answer ourselves for that one.
00:06:19
Speaker
Okay, so this episode is going to be about concurrency and parallelism, but before we get on to that we have the news and events, and there are lots of things happening in Europe especially. Of course there is EuroClojure, probably the biggest European Clojure conference. I don't know if I'm offending the Clojure eXchange guys in London by saying that.
00:06:41
Speaker
But we have EuroClojure in October, I think October 25th and 26th, and the CFP is still open until August 5th. And there is another event in Finland, ClojuTRE. I don't know how to pronounce it, it's a pretty awkward name, but
00:07:01
Speaker
anyway, that is happening in Tampere, in Finland. That's on September 10th. And I saw that David Nolen is going to attend that event. I think it's a one-day event in Tampere, Finland. And of course, there is also the Clojure eXchange in the UK, in London,
00:07:19
Speaker
that is happening December 1st and 2nd, and there is a CFP open for that as well. So any of you listeners, if you are interested in talking about Clojure, I think there's a good opportunity to apply for, well, apply for their call for presentations, whether EuroClojure or the Skills Matter Clojure eXchange.
00:07:40
Speaker
And apart from that, I mean, defn will be there at EuroClojure, as we've been explaining to our fans, which is plural, by the way. We will be there, and we'll have a powwow or something, a quick meetup, during EuroClojure. We'd love to talk to

Understanding Concurrency

00:07:59
Speaker
you guys and ask for new ideas. And I think there are a couple of ideas being suggested on Slack already, like what we should do, like speaker interviews and other stuff.
00:08:09
Speaker
Maybe we'll plan something. We'll definitely try, I think, to get some interviews for the podcast while we're at EuroClojure. I think that would make a lot of sense, wouldn't it? But actually it would just be nice to, like you said, just go out and have a coffee or a beer or a water or a coke or whatever, and, you know, say hello to people in person. It's always nice, isn't it?
00:08:33
Speaker
Yeah, of course, of course, we'd love to meet some of the people who are listening to this. It would be amazing to get your feedback in person, so we can adjust, exactly. So yeah, you never know. But of course, we've been very open about the way we offend other people. I hope there are not too many Scala people there. So
00:08:56
Speaker
We'll see. That's gonna be a big problem. But anyway, let's get on to the main discussion for the topic for today. So, concurrency and parallelism.
00:09:07
Speaker
So there have been a lot of discussions about our confusion about what is concurrency and what is parallelism. So we're going to throw our opinions into the whole confusion and then hopefully try to untangle it a bit. I think we're a bit confused ourselves, aren't we? Of course.
00:09:28
Speaker
I think this is like a therapy session, isn't it really? It's kind of like, what do you think? How do you feel? Do you have any problems as a child? Did you have concurrent parenting?
00:09:44
Speaker
I think we should start with, hello my name is Vijay, I just googled concurrency versus parallelism. That's like a concurrency parallelism anonymous meeting. The thing I always think about is like
00:09:59
Speaker
The thing that's confusing about concurrency and parallelism is, of course, that it's all about things happening at the same time. So to some extent, if you say, oh, it's parallel, not concurrent, or concurrent, not parallel, you're kind of splitting hairs at some point. In the end, things happen at the same time. All that really matters, I think, in the end, is what programming constructs we use for
00:10:29
Speaker
one thing versus the other. And how does it benefit us to use the programming constructs for concurrency versus the programming constructs for parallelism? I think that, in the end, that's what's really going to matter. Because you can get yourself tied in knots around what is what.
00:10:50
Speaker
You can just start at the hardware. At the very basic level, if you've got one single core, then in theory it can only do one thing. But we know for a fact that operating systems... And in fact, in hardware, if you have one CPU, it can only do one thing at a time.
00:11:12
Speaker
Yeah. So the whole point about operating systems is to fake the fact that you've only got one CPU. You know, so if you remember Linux and Unix, and, I think, in the era of Windows and DOS... of course, in DOS, when you were at the command prompt, you ran your command and you waited until the command came back.
00:11:32
Speaker
Yeah, so it's basically running one thing at a time. Yeah. It's only when you get things like modern operating systems, with windows, where you can make a background job. I mean, remember the Unix days: there's an ampersand you can use in the shell that puts the task in the background. Things in the background. And what about a daemon, a daemon process? What's that all about? It's devilish, literally.
00:11:58
Speaker
It's a devil. It's in the background, a little devil in your machine, daemonizing things. It's fundamentally evil. It's fundamentally evil. Yeah. So you have these little background processes running there, and clearly, if something is in the background, is it running concurrently or is it running in parallel? Well, actually, you don't really know.
00:12:21
Speaker
Yeah, that's true. But if there's one CPU, then it must be concurrent in the sense that it's faked. That's true. But if it's got two CPUs or you've got a GPU with multiple resources, then of course it can be running in parallel and you don't know that. Yeah. But I think that the whole concurrency
00:12:45
Speaker
Problem, or not even a problem: the concurrency came because of the whole time slicing, right? You have multiple tasks to do, yes, but you have only one thing that can do them, so then you have this time slicing. I'm going to allocate, I don't know, three milliseconds of my time to task A, and then the next three milliseconds I'm going to do task B. But in the compressed view of human time, it looks like they are running in parallel, because the task switching is so fast. So the concurrency
00:13:13
Speaker
Yeah, the computer kind of masks it by just being so quick. Exactly. So in my understanding, at least as far as I understand, concurrency is essentially when you have just one thing that is going to do the tasks, but parallelism comes into the picture when you have multiple things that can do multiple tasks at the same time.
00:13:34
Speaker
I think because nowadays we have these quad-core machines, so there is always some parallelism going on. I mean, we're now speaking on Skype, that is running, and there is another program running in the background, I'm pretty sure, the whole network monitoring shit and everything. So my operating system is already taking advantage of having multiple cores and running things in parallel.
00:13:55
Speaker
But if you switch to concurrency, then, as you were pointing out, both concurrency and parallelism mean that there are multiple things happening at the same time, but the way they're executed differs. Well, I think the other thing to say, like we've been talking about with the hardware, is that
00:14:15
Speaker
the operating systems take care of concurrency for you, by either time slicing, like you say, or by being aware that something is going to talk to the network or to a disk, so they can essentially swap that process out and give something else a chance to run. So they do this scheduling via resources. If you're waiting on
00:14:40
Speaker
network I/O or disk I/O, then they can essentially put your process to sleep and give someone else the CPU, because they know you're not going to do anything. That's fine, but as you got through the 90s and the 2000s, towards the middle, 2005, 2006,

Exploring Parallelism

00:15:00
Speaker
suddenly Moore's law runs out on a single CPU. Up until that point, your computer programs just got faster and faster and faster. Your computer could do more things because, like you said, it was just masking the fact that it was really swapping things out.
00:15:22
Speaker
So once you get to the point where actually you need to exploit parallelism in software, then it's a different ballgame because suddenly the operating system itself and the CPU itself, they need to cooperate.
00:15:40
Speaker
Yeah. So if you think about it, like Windows. I was doing some research at work, actually, for big data problems, and I looked at the kind of stuff that you need to do. Basically, up until, I think it was Windows 2000, I might get this wrong here, but I think Windows 2000 was the first operating system from Microsoft that actually used multi-core machines.
00:16:08
Speaker
Was capable of it. I think it could be faked in earlier versions, like Windows 98, but I think Windows 2000 was the first one. You'd get NT, but on the desktop, Windows 2000 was the first one. In fact, Windows NT was written from the ground up, wasn't it? Yeah, of course. So that's my point, is that it kind of
00:16:29
Speaker
The operating system had to be rewritten to take care of multiple cores. What was interesting at that point was that if you were running, let's say, a spreadsheet,
00:16:39
Speaker
then if you're running, like, Excel 2000, Excel itself was not capable of using multiple cores. Because it didn't have the .NET framework. Yeah, exactly. So then they made the .NET framework multi-core aware, just like Apple did with the GCD stuff.
00:17:03
Speaker
And then Excel was rewritten to take care of the multi-core parallelism. So my point is that multi-core has brought parallelism into the programming world, and that's the big discussion, isn't it now? With all these functional programming and all this kind of stuff, and this whole notion of immutable data, all these persistent data structures and the tree-based structures, which we'll come to in a bit more detail.
00:17:29
Speaker
But that notion of the hardware essentially requiring us to take advantage of its parallel nature is the thing which is changing in the last five years, last ten years let's say, but certainly the last five years. And you can no longer rely upon faster hardware bringing you any perceived benefits.

Concurrency in Clojure

00:17:54
Speaker
But I think in Clojure, the concurrency story is much more mature than the parallelism story, right? I mean, concurrency is one of the unique selling points of Clojure. When Rich Hickey announced Clojure it was: OK, this is the whole STM-based concurrency, and then having immutability everywhere that helps you with shared-state concurrency.
00:18:18
Speaker
So you have all these fundamental constructs: you have the vars, refs, agents, and atoms. And they are operating on immutable data structures. So if I remember correctly, you can achieve concurrency based on shared state, I think. So you can share the memory.
00:18:36
Speaker
And there are multiple threads accessing the shared memory. So that's how you create one kind of concurrent program. And the other way is basically something like an actor model, where you have messages passing between threads, and the data is passing from one thread to the other, or one actor to the other. So that is actor-based concurrency, or message-passing-based concurrency. So there are two different ways of looking at it.
00:19:01
Speaker
But in Clojure, the shared-state thing was like the bottom, one of the foundational constructs, with atoms and agents. I think that is one of the things that people, or at least we, say: STM is going to help you with writing concurrent programs without worrying about locks or writing synchronization by hand. So I think concurrency is much more mature in Clojure.
00:19:27
Speaker
But that's, like you say, old-school Java concurrency, isn't it? Where you have multiple threads all running, but trying to access a common state. A common resource. And that's definitely very old school, and you could do that in Java, but obviously
00:19:48
Speaker
it was much more complicated. The locking strategies were much more complicated in the Java world than they are in the Clojure world. And that's the source of many bugs, isn't it, in imperative programming: this whole notion of things not being initialized properly, off-by-one errors, the usual kind of crap that you get with these mutable data structures.
00:20:16
Speaker
Yeah, but if you see the different constructs, I think there is plenty of documentation available on which particular STM-based thingy you need to use in which case, because most of the things are...
00:20:32
Speaker
For example, if we use STM, then you need to use software transactions. So using dosync, or I think alter within dosync. Yeah, dosync is basically to cover the transactional operations on a given data structure.
00:20:48
Speaker
And also we have asynchronous stuff using agents. And atoms are essentially for when you want to share data between multiple threads without worrying about, OK, this is going to modify it; you don't want to get a ConcurrentModificationException. So that's where atoms are going to help you.
00:21:06
Speaker
So I'm going to put some data into an atom, and then there are other threads synchronously accessing the state. And at the same time, you can also modify the atom itself without worrying about getting exceptions, without using any locks. So I think that is the simplicity of using an atom instead of using lock-based concurrency or writing synchronized methods in Java.
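A minimal sketch of that lock-free sharing, just to pin it down (the names are illustrative):

```clojure
;; A shared counter with no locks: swap! applies inc atomically and
;; simply retries if another thread got in first.
(def counter (atom 0))

(def workers
  (doall (for [_ (range 10)]
           (future (dotimes [_ 1000] (swap! counter inc))))))

(run! deref workers) ; wait for all ten futures to finish
@counter             ;;=> 10000, every increment accounted for
```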
00:21:30
Speaker
But then I think the most popular ones are atoms and agents. I haven't seen too many refs in the wild, though. I'm not sure, maybe the use cases are not really that common in real-world cases. I need to check, maybe, because there is also a difference between coordinated versus uncoordinated change of the data.
00:21:54
Speaker
So if you want to change any data in a ref, for example, then it has to be part of a transaction. So you need to do that in dosync, or with ref-set; I think there are a couple of functions available.
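For completeness, a hedged sketch of the coordinated case: two refs changed together, where the STM either commits both alters or retries the whole transaction (account names are made up).

```clojure
;; Coordinated change across two refs: the dosync block is a software
;; transaction, so no observer ever sees the money in flight.
(def account-a (ref 100))
(def account-b (ref 0))

(dosync
  (alter account-a - 25)
  (alter account-b + 25))

[@account-a @account-b] ;;=> [75 25]
```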
00:22:07
Speaker
Yeah, it's complicated, isn't it? Because of the ordering of things and the optimistic stuff. Yeah, and also retrying. Because if it fails, it's going to retry. Yeah, I think the use case for that kind of stuff is a bit like two-phase commit. It's all theoretically very nice, but no one ever does it because it's just horrible for performance.
00:22:33
Speaker
Yeah, I haven't seen much dosync stuff, but maybe, well, my Clojure code reading is not as much as somebody else's would be. We need to get someone to call in and tell us a bit about it. People who are using dosync, please call us up and let us know your experience. That'll be cool. But the real-world use cases, like you say,
00:22:58
Speaker
Yeah, yeah. Because I didn't get any mileage from the dosync stuff yet, the STM stuff. But primarily, I think, I use atoms everywhere, especially in ClojureScript. Well, there they're really useful.
00:23:12
Speaker
Well, agents are not available in ClojureScript. Of course. Yeah. So anyway, that's the basics. I think the thing about atoms and stuff like that, though, is that it's all basically assuming that you've got multiple threads accessing some common state. Yes. So you get concurrency that way. But that's, I think, very different to the concurrency models of things like JavaScript,
00:23:40
Speaker
which I think is really the basis of the concurrency we really want to talk about today, isn't it? Well, also, you know, based upon this notion of Node.js and stuff like that. So, you know, there's a lot of these concepts
00:24:02
Speaker
that actually, to write an efficient HTTP server, it's not a good idea to... and Java people knew this, of course, before the JavaScript guys did. Jetty and Comet and all these kinds of things knew about continuations. But that's what you need to do. Rather than using a thread per request,
00:24:30
Speaker
which is very hungry on the memory, you need to use one thread, or a very small number of threads, and basically park the sessions or the state for the inactive requests until you get some I/O coming back.
00:24:50
Speaker
That's the basis of Node.js, but obviously the problem with Node.js is this callback crap that you get, this Christmas-tree code, where you end up saying, all right, I'm going to do something in my callback, and then I'm going to do something in that callback, and in that callback, and the other callback, oh my god. Once you get beyond about two or three levels deep, it just becomes totally horrible to reason about.
00:25:16
Speaker
And that's why they call it callback hell, isn't it? Because you have no idea of the execution of the program, and you can't compose it. Exactly. Yeah. And also you cannot make a mental model of the program, because you keep thinking, OK, this happens, this happens, this happens. And then it's like a huge mess.
00:25:37
Speaker
Well, it becomes a Christmas tree in the code, because you start off indented twenty characters, then it becomes twenty-five characters, and as you indent down it becomes this enormous number of characters to write all of a sudden. Yes. So that's a nightmare. And of course, what these guys do now in the JavaScript world is they use these promises, still a bit Christmas-tree-ish.
00:26:05
Speaker
But eventually, they're going to move, I think, to this async/await stuff in ES7. But this is what the Go language has had for a while, and that's what core.async does now, isn't it: this concept of a block of code which looks like it's just sequential code, but will happen when some data comes into a channel.
00:26:29
Speaker
So it's essentially a callback model, but that callback happens in the code transformation, at the lower level. Because the code, when you read it, still looks imperative, step by step. Yeah, it's very easy to reason about.
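A tiny sketch of that sequential-looking style (channel names are illustrative):

```clojure
(require '[clojure.core.async :refer [chan go >! <!]])

;; The go macro rewrites this body into a state machine: each >! and <!
;; is a parking point, not a blocked thread.
(def c (chan))

(go (>! c (+ 1 2)))          ; producer parks until someone takes
(go (println "got" (<! c)))  ; consumer parks until a value arrives
```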
00:26:48
Speaker
Yeah, underneath it all, these things like go loops and go blocks, they're just basically macros, aren't they? Yeah, they are. I think that is one of the things that surprised people, or maybe that people see as the nicety of Clojure and the Lisp world: that you can just write a macro
00:27:13
Speaker
that can convert code that looks like imperative code into some sort of a state machine during the compilation phase. Compared to Go, because in Go they have language-level support for these channels and goroutines, but in Clojure it's just a library. And then the Go... I don't know though, actually. I mean, it's a funny thing, isn't it, in some ways. I mean, I think it's a kind of engineering wonder that you can make this macro.
00:27:41
Speaker
Yeah. But it's got a problem as well, though. I mean, people often say, don't they, about Clojure: it's data first, then functions. Yeah, then macros. Yeah. And yet this is kind of quite an important thing, the macro of core.async.
00:28:02
Speaker
But actually, you can't compose it. Yeah, that's true. And I find that, I don't know. I mean, to be honest, I don't know what difference it would make if it was in the language, in terms of composability. I don't know.
00:28:18
Speaker
But in the language would mean Clojure core would need to change. So I think that is the... I mean, you say that, but actually it was Timothy Baldridge and Rich Hickey, and they are the Clojure core team. So if anyone wanted to change it, they had the power to change it. Of course, but then it would be a special form or something. It would still be implemented as a macro, I would guess, but it wouldn't be a library; it would be part of Clojure core and it would get special treatment, like if and whatever other special forms we have.
00:28:48
Speaker
But yeah, maybe that's, I don't know, maybe we should get Rich Hickey on the podcast and then maybe we can ask him, what do you think?
00:28:56
Speaker
would be interesting. I've never actually seen a thorough... I mean, I know that people talk about the macros thing, but I've never actually seen a proper, or never heard, maybe there is such a thing and I've just missed it, and follow-up welcome. But I've never seen a thing saying, ah yeah, it was a macro because of this, that and the other. Apart from the convenience factor of not changing the language, what are the real actual benefits? I don't know.
00:29:24
Speaker
Not for the go macro, but I remember reading somewhere that one of the libraries, probably for Redis, I think, has a macro that generates the code at compile time for all the functions in the Redis API, which is specified in a JSON spec. So I think there are some use cases for macros,
00:29:47
Speaker
and probably the go macro is one of them. But of course, either we're going to get Rich Hickey and Timothy Baldridge onto the podcast, or we're going to spend more time understanding this, and then we can come up with, OK, this is why go is a macro.

Core.async in Clojure

00:30:07
Speaker
I think it's fabulous that it's been done, don't get me wrong. It is awesome that such a feat can be achieved. I just don't know enough about what the benefits would be the other way around, if it was composable, for example. Because it's a macro, it's not easily composable; that's a downside. That's all I'm saying. That's true. Whereas if you had them as functions
00:30:38
Speaker
and some kind of support in a language, then maybe it would be more powerful even.
00:30:47
Speaker
That's true, but when you think about it as a designer of a programming language, it's all about trade-offs, right? I mean, you need to pick one way or the other. Every approach has its own advantage, and every approach has its own disadvantage. So you need to see what the trade-off is here: making it a library and not imposing it on the core. And also, one of the things in this case particularly is that
00:31:15
Speaker
channel-based concurrency is not going to be the holy grail for everything, so it doesn't need language-level support. Maybe that could be another speculation from my side, because there are still other types of concurrency models, like actors, and
00:31:33
Speaker
yeah, channels, and then of course you don't want to go down to the thread level, because that's too low-level. So maybe there is still scope for other types. That's what I remember when core.async was announced and people were asking why not an actor-based thing: Rich Hickey said, I think channels are a much more interesting way of implementing concurrency, not actors. The thing about the actor model is that the caller has to know the callee.
00:32:00
Speaker
And that's the big difference with the core.async model, where you put some data on a channel and you don't know what happens to it. And it can be picked up by many people, many processes listening on that channel.
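A small sketch of that decoupling, using core.async's mult with illustrative channel names:

```clojure
(require '[clojure.core.async :as a :refer [chan mult tap go <!]])

;; The producer puts onto one channel and never learns that two
;; independent consumers are tapped into it.
(def events (chan))
(def m      (mult events))

(def audit (chan))
(def ui    (chan))
(tap m audit)
(tap m ui)

(go (println "audit saw" (<! audit)))
(go (println "ui saw"    (<! ui)))

(a/put! events {:type :click}) ; both taps receive the same value
```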
00:32:20
Speaker
I'm a fan of that, definitely. There is a decoupling between the caller and the callee. It actually makes the whole system more composable, I think, and more loosely coupled, which is something that we like, isn't it?
00:32:36
Speaker
So, and I think actually, I can't remember the details, but in one of the Clojure docs there's, I wouldn't say a denunciation of the actor framework, but definitely a kind of, you know, we don't like that model because of A, B, and C. And certainly the linkage between the typed messaging of the caller and the callee is definitely a big negative, I think.
00:33:04
Speaker
Yeah, that's true. I think if you see Akka, for example, well, they're working on typed actors, so that might be there in the future. But so far... well, I worked on Akka a bit, without any typed actor stuff. So
00:33:19
Speaker
actors can consume anything, so there is no typing involved. That is one of the nice things you have in the Scala world, that you have these types, but once you switch to actors the types are gone. You send a message to any actor, and it might choose not to respond, or not to consume it. Well, it will consume it, but it won't act on it, because the method itself accepts anything,
00:33:43
Speaker
so you don't know, and there is no compile-time check to tell you: hey, I can only accept these kinds of messages. But there is a typed actor thing happening there as well. Anyway, let's not talk about actors much, because I think channels are the way to go, obviously.
00:34:02
Speaker
Well, actually the interesting thing is, like you were just saying, you can now start to do things like put specs around the data, for example. Yes. And you can do some interesting things there. But yeah, I mean, I've done a little bit of programming with the core.async stuff and it seemed very nice to me. I read a few blog posts about it, and I think there are some very powerful primitives and some nice ways that you can combine data.
00:34:32
Speaker
I did some time series stuff on it again, which was good. But the thing which I thought... I was talking to a friend of mine from the Belgium meetup, actually, and he was saying that in the go blocks there were some problems with exception handling.
00:34:51
Speaker
And he found that in the end it was getting quite annoying to have an exception handler for almost every single statement in the go block,
00:35:04
Speaker
and he ended up abandoning it and going to Manifold, from Zach Tellman, which was more consistent in that respect. But anyway, I certainly found core.async very nice, and I think, actually, to be honest,
00:35:24
Speaker
core.async, like it or not, is the 800-pound gorilla in the concurrency world in Clojure, so like it or not, it's here to stay. And I actually like it a lot, so I'm not upset about that. But what I was going to say was that the nice thing about it is that you can have all kinds of
00:35:46
Speaker
filtering on stuff as well. So you can run these transducers on the channel to transform the messages and filter the messages and all these kinds of things. And they're really incredibly easy to write. I was super impressed by that. At first I thought, whoa, a transducer, that's going to be hard, it's going to be quite a complicated thing. But it was so trivially easy.
00:36:11
Speaker
Sometimes you can get put off by the names of these things, but actually it's just a simple map or filter function. It's so easy to have the input of a channel converted, to either strip stuff out or to convert stuff to put on another channel. It's so easy.
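A sketch of how little code that takes, with illustrative names:

```clojure
(require '[clojure.core.async :as a])

;; The channel itself strips out odd numbers and scales the rest:
;; an ordinary transducer, attached when the channel is created.
(def events (a/chan 10 (comp (filter even?)
                             (map #(* % 10)))))

(a/>!! events 1) ; filtered away
(a/>!! events 2) ; transformed to 20
(a/<!! events)   ;;=> 20
```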
00:36:35
Speaker
Yeah, I think I had some understanding of transducers. So I was thinking: there is this seq abstraction, so any function that can work on a seq is going to work on this one.
00:36:51
Speaker
But then transducers are the next level of abstraction, which says: as long as I can ask for the next thing in it, that is fine. So obviously, because of the seq abstraction, you could use all these functions on vectors and lists,
00:37:09
Speaker
or anything that implements seq. But obviously you cannot use that on a channel. But a channel is also something like a collection: you can ask for the next thing. Is there a next thing available? I'm going to work on it. Yeah, it's a stream of events, actually. Exactly. So you can treat a list as a stream of things, and you can treat a channel as a stream of things. The only difference is that there is some sort of a time component in the second situation, probably.
00:37:38
Speaker
So transducers, in my understanding, are the next level of abstraction that says: OK, I'm going to work on anything that is mappable, anything that has a next thing available. It could be a channel, it could be a collection, it doesn't matter. You can just create a function that is going to work on anything you can ask for the next one from, and anything that you can combine. So that's my vague village-idiot understanding of transducers.
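To make that "works on anything" point concrete, here is the same transducer applied once to a vector and once to a channel, a sketch with illustrative names:

```clojure
(require '[clojure.core.async :as a])

;; One transformation, two very different sources.
(def xf (comp (filter odd?) (map inc)))

(into [] xf [1 2 3 4 5]) ;;=> [2 4 6], eager, over a collection

(def c (a/chan 10 xf))   ; identical logic, now over a stream of events
```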
00:38:05
Speaker
Well, yeah, you're right. It's unifying. The whole stream thing is quite unifying. Although, you're right, you can inject time into these things, you don't have to. It's very easy, for instance, with core.async to have a timeout channel. So you can just say, OK,
00:38:31
Speaker
put a timeout channel in the mix, and your listeners can listen to one channel or to multiple channels. And if you're listening to multiple channels, one of those channels can be a timeout channel, and when that timeout channel closes, you will get an event to say the channel is closed, and you can then react on that. Which is a classic way to
00:38:58
Speaker
do it, rather than using, like, a thread sleep. For instance, if you did a thread sleep inside of a go block, that would be bad. Yeah, because one of the things you shouldn't do when you're in the channel world is blocking operations, and of course sleep blocks the thread. Yeah. That's why these timeout channels are nice, because in the background it's doing some system-
00:39:26
Speaker
sleeping thing. Actually, I don't exactly know how it works, but I'm guessing that's what it's doing; it's got a thread pool in the background. Yeah, of course, there is one thread attached to it, I remember this. Yeah, sorry, go ahead. Yeah. And so anyway, my point is that while you're in the go block,
00:39:45
Speaker
you should just be doing non-blocking operations. You should just be doing straightforward things that are all on the CPU, just transforming data or manipulating data. You shouldn't be doing sleeping or I/O, those kinds of things. Those should be at the edge of your system.
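A hedged sketch of that pattern: racing a data channel against a timeout with alts!, instead of sleeping inside the go block (the channel name is made up).

```clojure
(require '[clojure.core.async :refer [chan go alts! timeout]])

(def data-ch (chan))

;; alts! parks until whichever channel produces first; the timeout
;; channel closes after 1000 ms, yielding nil.
(go (let [[v ch] (alts! [data-ch (timeout 1000)])]
      (if (= ch data-ch)
        (println "got" v)
        (println "gave up after one second"))))
```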
00:40:09
Speaker
But it's really nice to be able to combine these operations informally. You just basically put some events on some channels, and then someone else can compose the way that these events are handled later on. So I think overall, actually, despite my notion of saying that macros and functions are not composable, et cetera, I think the core.async model is actually very, very composable and very decoupled.
00:40:38
Speaker
So I like that, because in the end it's the data on the buffer that you're modeling, actually. I was going to say, the other thing worth discussing about core.async is this buffering concept. Because one of the problems with asynchronous operations is
00:41:02
Speaker
very, very aggressive writers or very, very lazy readers, and this whole notion of back pressure is essentially a problem. Because what do you do? There's always an interesting question: I'm writing a log file into a channel, and
00:41:27
Speaker
if someone doesn't read from that channel, well, eventually memory is going to get full, isn't it? Yeah. And something's going to go wrong. So what happens in core.async is, by default, things are unbuffered. So if you write something, then you can't write again until someone reads. Yeah. So it's blocking. Kind of, yeah. But normally, of course, what happens is that you will use a buffer.
00:41:57
Speaker
So when you create a channel, you can say, I'll have a hundred or a thousand or ten thousand messages as a buffer. So then what's interesting is you have some different types of buffers,
00:42:14
Speaker
because you can choose what to do. So yeah, there are, I think, a few different buffers. There are ones where, basically, if messages are not being read out, it will drop your writes. So, in other words, it will be first in,
00:42:36
Speaker
so the consumer will see the first messages that you put in there. Then you have these other ones, called sliding buffers, where it will drop the messages that are already in the buffer and take your new ones.
00:42:51
Speaker
So we should call them, like, first-in-first-die or something, for the sliding buffer. It's going to drop the first thing that got in: if it is not processed, it's going to be dropped to make room for the next one, right? So it's probably first-in-first-die. The messages will be really careful getting into the channel.
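The policies side by side, as a quick sketch:

```clojure
(require '[clojure.core.async :as a])

(a/chan 100)                      ; fixed buffer: writers park when full
(a/chan (a/dropping-buffer 100))  ; full? the new writes are dropped
(a/chan (a/sliding-buffer 100))   ; full? the oldest values are dropped
```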
00:43:09
Speaker
Yeah, yeah, that's true. But I think what's interesting about it is that you have control in your environment as to what policies you're going to apply to different channels. So depending on the nature of the data, you can choose to either drop it at the beginning or drop it at the end, or whatever. And of course, the funny thing is that, I think, overall, people don't like this dropping thing.

Message Durability and Scalability

00:43:38
Speaker
Yeah, that's what I thought initially when I heard that, hey, there is going to be a dropping buffer: that means it's going to have some sort of buffer capacity, and it is going to drop stuff. But if you think about it, in some cases it really makes sense. For example, if I'm tracking mouse movements, obviously I'm not interested in every movement of the mouse; I'm only interested in the latest stuff. So it is a reasonable assumption that I'm interested in, I don't know, the last 20 mouse-movement events.
00:44:06
Speaker
And I only want the latest ones. Exactly. So if I'm not fast enough to process them, it is fair enough that you drop them. And when you think about other cases as well, in the real world, people think it's actually losing data. But in some cases, you're not really that attached to historical data; you're interested in the current one.
00:44:28
Speaker
So in that case the dropping buffer makes sense. That trade-off is going to give you more control over the program's behavior, as in: I can be sure this program is not going to die because it's eating a lot of memory. So I also think that if you're really invested in this data, like this logging data or whatever, if you want to
00:44:50
Speaker
make sure you keep every message, then you have to use something which has greater robustness than one single process. So that's why, to me, things like Kafka are so useful, because they're very large-scale, distributed, fault-tolerant, all these kinds of things.
00:45:12
Speaker
So, you know, you get that kind of system. So actually, it would be quite interesting if the Clojure guys worked on an equivalent of Datomic for core.async. You know, if you could imagine scaling up the collections, which is what they've done: essentially, they've scaled up the collections for Datomic, haven't they?
00:45:33
Speaker
I think so. I'm not familiar with the database. You have a bunch of persistent collections which are now durable in the database, so you know what's changed.
00:45:50
Speaker
They're all immutable, persistent collections, but now you can scale them up by persisting them on a disk somewhere and having multiple readers and all this kind of good stuff. If you can imagine doing the same thing for core.async, then that's kind of what you're doing with Kafka, essentially. Yeah. Of course, I mean, at that scale, you wouldn't
00:46:12
Speaker
use just Clojure collections to deal with it, right? I mean, obviously, if you need that size of data, you need an external system to handle it. But this is good for most of the use cases where you don't have that so-called, quote-unquote, big data: the stuff that doesn't fit into your memory. So it makes sense that for those smaller use cases it's just going to be okay. So
00:46:36
Speaker
obviously. But I think, even for the Kafka cases, you know, you can read from a Kafka stream into a core.async channel. Channel, yeah, yeah. Because really, to me, what core.async is about is the way you organize your code.
00:46:53
Speaker
I only use Kafka as an example of basically making sure that if you really care about the durability of your messages, then that's what you would do about it, because everything in core.async is ephemeral, basically. If you want the whole thing to be durable, then you have to have some message broker, basically.

Futures and Reducers in Clojure

00:47:14
Speaker
That's true. Anyway, so before moving to parallelism... I just remembered; my train of thought has arrived at the station, so now I recollect what I was talking about. I was about to talk about futures, because that is also one of the things you can use to achieve concurrency in Clojure, and that ties in nicely, because
00:47:35
Speaker
for the case of parallelism, we had some way around: initially there was this parallel map, pmap, which was implemented with futures, essentially. So you have a collection of things and you do pmap instead of map, and that uses futures, keeping roughly the number of cores plus two in flight, I think; something like the sketch below.
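A quick sketch of the difference, assuming a contrived slow function:

```clojure
;; pmap has the same shape as map, but elements are computed on
;; futures, roughly (+ cores 2) elements ahead of consumption.
(defn slow-inc [x] (Thread/sleep 100) (inc x))

(time (doall (map  slow-inc (range 8)))) ; ~800 ms, one after another
(time (doall (pmap slow-inc (range 8)))) ; a fraction of that, multicore
```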
00:48:02
Speaker
And those futures are what's used to actually execute the parallel mapping stuff. But other than that, there isn't much. That was a kind of poor man's fork/join, wasn't it? Exactly, yeah, yeah. But I think reducers are the example of fork/join parallelism, right? Right, yes, yes, yes. So maybe let's move on to reducers, like you said, because
00:48:30
Speaker
that's where we start moving away from this concurrency model. To parallelism. Yeah, okay. Exactly. So basically, reducers: when they were announced, they were going to work on anything as long as it is foldable, I think.
00:48:47
Speaker
Like transducers work on mappables, these work on foldables: anything that can be folded. You have a list or a collection of items, and it will use the fork/join framework to divide the given collection into smaller pieces and then apply the reducing function onto each of these
00:49:15
Speaker
partitions. So that is based on the fork/join framework. In Java 1.7 or 1.8? Seven. Yeah, yeah, in seven. So reducers are the ones that brought in parallelism. But I haven't seen... I think there were some benchmarks in,
00:49:35
Speaker
probably, Rich Hickey's announcement post about how fast the reducers were compared to the normal way. So you get to use all the cores properly. That's the key thing, yeah. Obviously, it's like a lot of these things with fork/join, isn't it: if you've got a fairly small data set, then
00:49:58
Speaker
the overhead of the management dominates things, whereas if you've got a very large data set, then you can get into this whole work-stealing business as well, and it can be very highly optimal.
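A minimal sketch of reducers in action, on a vector large enough for the fork/join splitting to pay off:

```clojure
(require '[clojure.core.reducers :as r])

;; fold splits the vector into chunks, reduces each chunk on the
;; fork/join pool, and combines the partial sums.
(def numbers (vec (range 1000000)))

(r/fold + numbers)             ;;=> 499999500000
(r/fold + (r/map inc numbers)) ; r/map fuses into the same parallel pass
```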
00:50:14
Speaker
Yeah, I mean, it's like having multiple people to do the stuff. But if the division of the work itself is taking too much time, then there is no point in applying parallelism to it. So if it takes more time for me to divide the given numbers...
00:50:32
Speaker
If I want to sum, say, eight numbers, for the sake of argument, and I have four people, then I need to give them two numbers each, and they just need to do an addition. The complexity of the addition is so small that I'm spending more time dividing the task than actually executing it.
00:50:54
Speaker
So I think, in my opinion, there needs to be a lot of experimentation to see where parallelism is going to help actually make faster programs. But in a way, that's what functional programming is pushing for, right? Because there are more cores available now, but each core is not going to keep getting faster the way it did in the past.
00:51:16
Speaker
It's quite funny you say that, because I think I told you earlier on that I watched this Guy Steele presentation that Rich Hickey mentioned in his documentation. And he was asked a question, and I don't know who asked it, actually, but there were a lot of luminaries in the audience. Someone asked him a question about...
00:51:43
Speaker
It might have been Joe Armstrong, even, I think. Whoever it was, it doesn't matter. He asked him a question about exactly this point, about modeling for performance in algorithms. And he punted on that, because he said, yeah, that's actually the most difficult bit.
00:52:02
Speaker
The most difficult bit is... actually, breaking up the work is fairly mechanical. But determining whether or not it's actually going to be valuable to break up the work is very complicated. And of course, there's an overhead involved in that as well. And I guess, and I've just thought about this, but if you think about
00:52:30
Speaker
where we see those kinds of things in the real world, we see them in relational databases, in query optimizers, for example.
00:52:38
Speaker
Or in optimization problems in general, don't we? We see them in these linear-algebra-style modeling things, where you try to work out backtracking solvers, these kinds of things. I'm doing a bit of work at the moment on product configurations. And it's definitely very, very slippery
00:53:05
Speaker
to work out what the actual number is. It gets back into the mathematics of NP-hard and NP-complete, all this kind of stuff. So maybe we should drop that bit of the conversation, because we're going to get completely lost in all of the background stuff. It's interesting stuff, but like you say, modeling exactly what profile of things is going to benefit from this approach is actually unknown.
00:53:33
Speaker
That's true, but you triggered something with the performance part. So I was thinking, because I'm working a lot with Spark and HDFS these days, Hadoop stuff: sometimes just doing a grep or a wc is way faster than actually running a Hadoop program to do the word count.
00:53:51
Speaker
Because, well, of course that's not even parallelism; it's the next level up, it's distributed work. So that means you not only have multiple cores, but you also have multiple machines, and then coordinating them and all sorts of crap. But there was something in the Clojure world, I think it was David Liebke or somebody, who announced this Avout library that does distributed state,
00:54:13
Speaker
that uses ZooKeeper; it is essentially like a distributed atom. But I didn't see much action happening afterwards, I think. So we've been talking about concurrency: single machine, single processor. Then we move to parallelism: of course, single machine, but multiple processors. And then you have multiple machines, multiple processors:
00:54:37
Speaker
that is the distributed world, and there we don't have much interesting stuff happening in Clojure, as far as I know.
00:54:52
Speaker
So I think actually they're definitely taking that approach from the ground up.

Tree Data Structures and Future Topics

00:54:59
Speaker
But I think we want to talk about just parallelism and the programming aspects first. I think, we've talked about this before, we should approach the Onyx guys. And I think they're interested in potentially joining us on one of the episodes.
00:55:17
Speaker
So maybe we should leave that for when Michael or Lucas are on the podcast; it would be cool to understand the challenges involved there. The thing, just coming back to the reducers thing: the insight I think Guy Steele made was that
00:55:35
Speaker
functional programming has given us these tree data structures, these trie data structures. We call them trees now. Yeah, I think we pronounce them "tree", but it is spelled t-r-i-e. So, in fact, he even mentioned Clojure in his Fortress presentation, to say that Rich Hickey has got this thing which has these
00:55:59
Speaker
64-way data structures, and it's all awesome stuff. I think it's 32-way actually, isn't it? Yeah, a 32-way branching factor. We said 64, but it doesn't matter anyway; it all boils down to small numbers. It's very big, yeah.
00:56:19
Speaker
But what he was saying is that these tree data structures are the foundation of how to do reducing and folds and all these kinds of things in the future.
00:56:43
Speaker
The thing that you cannot do, that you should not do in the future, is have these do loops, these iterations. You shouldn't be forced to iterate over a linked-list data structure. Basically, trees give you the opportunity to have a very balanced parallel implementation.
00:57:06
Speaker
And I think Rich Hickey thought, ah, yes, okay. Well, he must have already thought that, but: Guy Steele has definitely given us a good blessing, so let's now make these reducers happen. So what he does there is he basically has some kind of function that does the work, the folding, and then some kind of combining function.
00:57:31
Speaker
So you have these, and the real key thing about it, the key thing he mentions, and I haven't seen his presentation for a while, actually, but I remember watching it a couple of times, is that ordering is the death of all parallelism, and that current parallel operations like map and fold, even in Clojure itself now,
00:57:53
Speaker
and reduce, all start with zero; or they assume that this function you're doing, this reducing function, sorry, is aware that you're going to keep some kind of aggregate going through it, some accumulator. And it's the accumulation that's the problem.
00:58:18
Speaker
So if you can strip out the accumulation into something which the combining function can understand, which is, in other words, associative, then you kind of blow away all of the restrictions in the data structure itself. You don't have to think about it as if it's a vector; you can think of everything as unordered and everything as completely decomposable.
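A sketch of that split between reducing and combining, with a made-up reducing function: the combine step only needs associativity and an identity, so partitions can be merged in whatever order they finish.

```clojure
(require '[clojure.core.reducers :as r])

;; reducef builds a per-partition count; combinef (+, with identity 0)
;; merges partition results in any order.
(defn count-evens
  ([] 0)                              ; seed for each partition
  ([acc x] (if (even? x) (inc acc) acc)))

(r/fold + count-evens (vec (range 100))) ;;=> 50
```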
00:58:49
Speaker
Yeah, but, well, maybe we should touch upon this stuff more in a transducers episode. I was thinking we should dig deeper into this one and talk about how transducers work and these abstractions; probably something for a future episode. I think we're at almost... oh, it's almost one hour. Right. Okay.
00:59:13
Speaker
Yeah, I mean, we never look at the clock these days. From the first episode it's always like, oh, 50 minutes, and then one and a half hours. I think we need to keep it to one hour, though. But before we move on: we talked a lot about concurrency, and I think we should post the Guy Steele
00:59:35
Speaker
presentation link somewhere. So we'll post the show notes on defn.audio. But as usual, we'd love to hear your feedback, whoever is listening, our fans, the plural number of fans who are listening to this podcast. And thanks a lot for listening. And the MP3 will be on SoundCloud.
00:59:55
Speaker
And it is also on iTunes. And next week, I think we will probably try to get a special guest, but we want to keep it a secret. So we'll announce it as soon as we get some sort of confirmation, I suppose. Yeah. And, of course, thanks to our very talented Pizzeri, I think, am I pronouncing that right? Pizzeri, yeah.
01:00:22
Speaker
Putsuri, for giving us permission to use the track. And what is it called? Melon hamburger? Yeah, it's a vegetarian hamburger. Okay, yeah. Okay, so obviously the music is also vegetarian. So this is the number one vegetarian Clojure podcast on this side of the pond, with an Indian and a UK Brexit guy. I am not for Brexit.
01:00:50
Speaker
Anyway, bye bye! Bye bye! Cheers guys! Okay, thanks for listening, and we'll see you with a new episode in a couple of weeks. Bye bye!