
#6 - Concurrency and Parallelism

defn
A rambling discussion struggling to come to terms with concurrency and parallelism. See the show notes at defn.audio: https://defn.audio/2016/07/25/episode-6-concurrency-and-parallelism/
Transcript

Introduction and Feedback

00:00:18
Speaker
Right, hello, welcome to defn, episode 6. No guests, nothing else, just myself and Vijay. How are you doing, Vijay? No, I am doing good. I think this is just, as you said, two village idiots talking about Clojure. Yeah, I'm sort of hopeful. No experts to correct us.

Building an Online Community

00:00:37
Speaker
Yeah, it's just going to be total BS this week, I'm afraid. Yeah. Right. OK, so should we do a quick follow-up from last week, last time? Of course. So last week we had super fun with Misha talking about Hoplon and, of course, some nuclear issues as well. But it was super fun to make and I hope you guys enjoyed that one.
00:01:02
Speaker
So we have been getting some pretty good feedback, from the Slack channel and also online as well, on Reddit. We have been rising to the top every now and then on the Clojure subreddit, with spectacular two-digit upvotes, so please keep clicking.
00:01:21
Speaker
I haven't seen that, actually. So we're up to two digits now. Yes. This is very good. It's fantastic. And it's amazing. Oh, and we've actually gone into double figures on the defn podcast channel as well. On Slack. Yeah, that's true. That's true. We're really building a huge community out there.

JDK8 Support in Clojure

00:01:40
Speaker
Yeah, now I'm actually sporting a fake mustache whenever I'm getting out of the house, because now I'm a minor internet celebrity. I'm looking out for paparazzi and everything. But anyway, so it's been amazing. And thanks for your feedback. So just before recording, we were talking about JDK 8 support in Clojure.
00:02:06
Speaker
Yeah. Actually, the reason why I was interested in it, Vijay, was because I'm doing a little project to do with event streaming, actually, with the Kafka Streams library, which is a new library. But a lot of these big data libraries, I've noticed, and we've talked about this ourselves before, the fact that when we first met
00:02:28
Speaker
in Amsterdam, at Clojure Days, about the fact that Spark and new systems like this Kafka Streams library, Flink, a lot of them have Scala APIs or JDK 8 APIs, but they don't have Clojure APIs.
00:02:48
Speaker
Even though Clojure and big data play very well together, the APIs don't seem to be coming out there. Now, I know we've got some things that are dedicated, Clojure-wise, like Onyx and stuff like that, for big data. But obviously the whole point about Clojure is to do interoperability and to leverage the work that's out there in the JVM community.
00:03:13
Speaker
So it's a bit of a shame that we're not getting this Java 8 interop, because we can't take advantage of these lambdas and streams and stuff like that. Actually, as I'm saying that: there's a guy who has written a library called ike.cljj that allows you to have some of these things. And he seems to be actively developing it, which is good.
00:03:36
Speaker
But I'm thinking that it would be very nice, even if we don't require JDK 1.8, to have some of the core team at least having a look at this interop, to make it a bit more core to the language.
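To make that pain concrete, here is a hedged sketch of what JDK 8 stream interop looks like from plain Clojure today, with no helper library: the stream methods want java.util.function types, so you end up reifying them by hand instead of passing a Clojure fn straight in.

```clojure
;; A minimal sketch of the JDK 8 interop pain, without any helper
;; library: streams want java.util.function types, so we reify them.
(-> (java.util.Arrays/asList 1 2 3 4 5)
    (.stream)
    (.map (reify java.util.function.Function
            (apply [_ x] (* x x))))        ; where a plain fn should do
    (.reduce 0 (reify java.util.function.BinaryOperator
                 (apply [_ a b] (+ a b)))))
;;=> 55
```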

Concurrency and Parallelism Intro

00:03:49
Speaker
The thing I was saying before was that we're gonna talk about reducers today, and that requires essentially JDK 1.7. You know you can patch JDK 1.6 with some contrib library, but really,
00:04:03
Speaker
I think for all intents and purposes it's a dependency on JDK 1.7 and there was no problem with that. Obviously, let's just do that.
00:04:14
Speaker
And also, I didn't see anything on the dev.clojure.org wiki under the next release planning. There isn't any discussion or anything happening to support the lambdas or streams or having proper interop, though. I'm not sure if there are any plans, but I guess we need to wait, I suppose. Because in 1.9 the major feature is spec, right? I don't see any other major things bundled into 1.9.
00:04:39
Speaker
No, that's right. In Clojure 1.9, that's the big one. Obviously, the thing about 1.9, and Alex Miller said this, is that it will still only require JDK 1.6. So there's no chance of them mandating 1.8 for Clojure 1.9. But they are doing it for some of their other products. Datomic, for instance, now requires JDK 1.8.
00:05:08
Speaker
The latest versions of Datomic require JDK 1.8, which is, of course, sensible. I keep on saying 1.8, by the way. It's kind of crazy, JDK 8. Because I'm such an old JDK guy. It's not 1.7 and 1.6 any more, is it? It's JDK 8, and Java 8 and Java 9. I think it was around Java 5 that the naming changed, Java software development kit 1.x or something.
00:05:37
Speaker
Probably, yeah. There's probably some hidden 1.8 in there somewhere. Anyway, it would be interesting to find out if anyone else is having similar experiences with this lack of JDK 8 interop. I wonder if anyone is feeling that. Obviously, if anyone's listening from the core team, then
00:06:04
Speaker
It would be nice to share what was happening in that respect. Maybe if we get Alex on. Yeah, then we can ask him. We can ask him about that actually. That's true. I think we've got to answer ourselves for that one.
00:06:19
Speaker
Okay, so this episode is going to be about concurrency and parallelism, but before we get on to that we have the news and events, and there are lots of things happening in Europe especially. Of course there is EuroClojure, probably the biggest European Clojure conference. I don't know if I'm offending the Clojure eXchange guys in London by saying that.
00:06:41
Speaker
But we have EuroClojure in October, I think October 25th and 26th, and the CFP is still open until August 5th. And there is another event in Finland, ClojuTRE. I don't know how to pronounce it, it's a pretty awkward name, but
00:07:01
Speaker
anyway, that is happening in Tampere, in Finland. That's on September 10th. And I saw that David Nolen is going to attend that event. I think it's a one-day event in Tampere, Finland. And of course, there is also the Clojure eXchange in the UK, in London,
00:07:19
Speaker
that is happening December 1st and 2nd, and there is a CFP open for that as well. So any of you listeners, if you are interested in talking about Clojure, I think there's a good opportunity to apply for, well, apply for their call for presentations, whether EuroClojure or the Skills Matter Clojure eXchange.
00:07:40
Speaker
And apart from that, I mean, defn will be there at EuroClojure, as we've been explaining to our fans, which is plural, by the way. We will be there, and we'll have a powwow or something, a quick meetup, during EuroClojure. We'd love to talk to

Understanding Concurrency

00:07:59
Speaker
you guys and ask for new ideas. And I think there are a couple of ideas being suggested on Slack already, like what we should do, like speaker interviews and other stuff.
00:08:09
Speaker
Maybe we'll plan something. We'll definitely try, I think, to get some interviews for the podcast while we're at EuroClojure. I think that would make a lot of sense, wouldn't it? But actually it would just be nice to, like you said, just go out and have a coffee or a beer or a water or a coke or whatever, and, you know, say hello to people in person. It's always nice, isn't it?
00:08:33
Speaker
Yeah, of course, of course, we'd love to meet some of the people who are listening to this. It would be amazing to get your feedback in person, so we can adjust, exactly. So yeah, you never know. But of course, we've been very open about the way we offend other people. I hope there are not too many Scala people there. So
00:08:56
Speaker
We'll see. That's gonna be a big problem. But anyway, let's get on to the main discussion for the topic for today. So, concurrency and parallelism.
00:09:07
Speaker
So there have been a lot of discussions about our confusion about what is concurrency and what is parallelism. So we're going to throw our opinions into the whole confusion and then hopefully try to untangle it a bit. I think we're a bit confused ourselves, aren't we? Of course.
00:09:28
Speaker
I think this is like a therapy session, isn't it really? It's kind of like, what do you think? How do you feel? Do you have any problems as a child? Did you have concurrent parenting?
00:09:44
Speaker
I think we should start with, hello my name is Vijay, I just googled concurrency versus parallelism. That's like a concurrency parallelism anonymous meeting. The thing I always think about is like
00:09:59
Speaker
The thing that's confusing about concurrency and parallelism is, of course, that it's all about things happening at the same time. So to some extent, if you say, oh, it's parallel, not concurrent, or concurrent, not parallel, you're kind of splitting hairs at some point. In the end, things happen at the same time. All that really matters, I think, in the end, is what programming constructs we use for
00:10:29
Speaker
one thing versus the other. And how does it benefit us to use the programming constructs for concurrency versus the programming constructs for parallelism? I think that, in the end, that's what's really going to matter. Because you can get yourself tied in knots around what is what.
00:10:50
Speaker
You can just start at the hardware. At the very basic level, if you've got one single core, then in theory it can only do one thing. But we know for a fact that operating systems... And in fact, in hardware, if you have one CPU, it can only do one thing at a time.
00:11:12
Speaker
Yeah. So the whole point about operating systems is to fake the fact that you've only got one CPU. You know, so if you remember Linux and Unix, and, I think, in the era of Windows and DOS... of course, in DOS, when you were at the command prompt, you ran your command and you waited until the command came back.
00:11:32
Speaker
Yeah, so it's basically running one thing at a time. Yeah. It's only when you get things like modern operating systems, with windows, where you can make a background job. I mean, remember the Unix days: there's an ampersand you can use in the shell that puts the task in the background. Things in the background. And what about a daemon, a daemon process? What's that all about? It's devilish, literally.
00:11:58
Speaker
It's a devil. It's in the background, a little devil in your machine, daemonizing things. It's fundamentally evil. It's fundamentally evil. Yeah. So you have these little background processes running there, and clearly, if something is in the background, is it running concurrently or is it running in parallel? Well, actually, you don't really know.
00:12:21
Speaker
Yeah, that's true. But if there's one CPU, then it must be concurrent in the sense that it's faked. That's true. But if it's got two CPUs or you've got a GPU with multiple resources, then of course it can be running in parallel and you don't know that. Yeah. But I think that the whole concurrency
00:12:45
Speaker
Problem, or not even a problem: the concurrency came because of the whole time slicing, right? You have multiple tasks to do, yes, but you have only one thing that can do them, so then you have this time slicing. I'm going to allocate, I don't know, three milliseconds of my time to task A, and then the next three milliseconds I'm going to do task B. But in the compressed view of human time, it looks like they are running in parallel, because the task switching is so fast. So the concurrency
00:13:13
Speaker
Yeah, the computer kind of masks it by just being so quick. Exactly. So in my understanding, at least as far as I understand, concurrency is essentially when you have just one thing that is going to do the tasks, but parallelism comes into the picture when you have multiple things that can do multiple tasks at the same time.
00:13:34
Speaker
I think because nowadays we have these quad-core machines, so there is always some parallelism going on. I mean, we're now speaking on Skype, that is running, and there is another program running in the background, I'm pretty sure, the whole network monitoring shit and everything. So my operating system is already taking advantage of having multiple cores and running things in parallel.
00:13:55
Speaker
But if you switch to concurrency, then, as you were pointing out, both concurrency and parallelism mean that there are multiple things happening at the same time, but the way they're executed differs. Well, I think the other thing to say, like we've been talking about with the hardware, is that
00:14:15
Speaker
the operating systems take care of concurrency for you, by either time slicing, like you say, or by being aware that something is going to talk to the network or to a disk, so they can essentially swap that process out and give something else a chance to run. So they do this scheduling via resources. If you're waiting on
00:14:40
Speaker
network I/O or disk I/O, then they can essentially put your process to sleep and give someone else the CPU, because they know you're not going to do anything. That's fine, but as you got through the 90s and the 2000s, towards the middle, 2005, 2006,

Exploring Parallelism

00:15:00
Speaker
suddenly Moore's law runs out on a single CPU. Up until that point, your computer programs just got faster and faster and faster. Your computer could do more things because, like you said, it was just masking the fact that it was really swapping things out.
00:15:22
Speaker
So once you get to the point where actually you need to exploit parallelism in software, then it's a different ballgame because suddenly the operating system itself and the CPU itself, they need to cooperate.
00:15:40
Speaker
Yeah. So if you think about it, like Windows. I was doing some research at work, actually, for big data problems, and I looked at the kind of stuff that you need to do. Basically, up until, I think it was Windows 2000, I might get this wrong here, but I think Windows 2000 was the first operating system from Microsoft that actually used multi-core machines.
00:16:08
Speaker
Was capable of it. I think it could be faked in earlier versions, like Windows 98, but I think Windows 2000 was the first one. You'd get NT, but on the desktop, Windows 2000 was the first one. In fact, Windows NT was written from the ground up, wasn't it? Yeah, of course. So that's my point, is that it kind of
00:16:29
Speaker
The operating system had to be rewritten to take care of multiple cores. What was interesting at that point was that if you were running, let's say, a spreadsheet,
00:16:39
Speaker
then if you're running, like, Excel 2000, Excel itself was not capable of using multiple cores. Because it didn't have the .NET framework. Yeah, exactly. So then they made the .NET framework multi-core aware, just like Apple did with the GCD stuff.
00:17:03
Speaker
And then Excel was rewritten to take care of the multi-core parallelism. So my point is that multi-core has brought parallelism into the programming world, and that's the big discussion, isn't it now? With all these functional programming and all this kind of stuff, and this whole notion of immutable data, all these persistent data structures and the tree-based structures, which we'll come to in a bit more detail.
00:17:29
Speaker
But that notion of the hardware essentially requiring us to take advantage of its parallel nature is the thing which is changing in the last five years, last ten years let's say, but certainly the last five years. And you can no longer rely upon faster hardware bringing you any perceived benefits.

Concurrency in Clojure

00:17:54
Speaker
But I think in Clojure, the concurrency story is much more mature than the parallelism story, right? I mean, concurrency is one of the unique selling points of Clojure. When Rich Hickey announced Clojure it was: OK, this is the whole STM-based concurrency, and then having immutability everywhere that helps you with shared-state concurrency.
00:18:18
Speaker
So you have all these fundamental constructs: you have the vars, refs, agents, and atoms. And they are operating on immutable data structures. So if I remember correctly, you can achieve concurrency based on shared state, I think. So you can share the memory.
00:18:36
Speaker
And there are multiple threads accessing the shared memory. So that's how you create one kind of concurrent program. And the other way is basically something like an actor model, where you have messages passing between threads, and the data is passing from one thread to the other, or one actor to the other. So that is actor-based concurrency, or message-passing-based concurrency. So there are two different ways of looking at it.
00:19:01
Speaker
But in Clojure, the shared-state thing was like the bottom, one of the foundational constructs, with atoms and agents. I think that is one of the things that people, or at least we, say: STM is going to help you with writing concurrent programs without worrying about locks or writing synchronization by hand. So I think concurrency is much more mature in Clojure.
00:19:27
Speaker
But that's, like you say, old-school Java concurrency, isn't it? Where you have multiple threads all running, but trying to access a common state. A common resource. And that's definitely very old school, and you could do that in Java, but obviously
00:19:48
Speaker
it was much more complicated. The locking strategies were much more complicated in the Java world than they are in the Clojure world. And that's the source of many bugs, isn't it, in imperative programming: this whole notion of things not being initialized properly, off-by-one errors, the usual kind of crap that you get with these mutable data structures.
00:20:16
Speaker
Yeah, but if you see the different constructs, I think there is plenty of documentation available on which particular STM-based thingy you need to use in which case, because most of the things are...
00:20:32
Speaker
For example, if we use STM, then you need to use software transactions. So using dosync, or I think alter within dosync. Yeah, dosync is basically to cover the transactional operations on a given data structure.
00:20:48
Speaker
And also we have asynchronous stuff using agents. And atoms are essentially for when you want to share data between multiple threads without worrying about, OK, this is going to modify it; you don't want to get a ConcurrentModificationException. So that's where atoms are going to help you.
00:21:06
Speaker
So I'm going to put some data into an atom, and then there are other threads synchronously accessing the state. And at the same time, you can also modify the atom itself without worrying about getting exceptions, without using any locks. So I think that is the simplicity of using an atom instead of using lock-based concurrency or writing synchronized methods in Java.
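A minimal sketch of that lock-free sharing, just to pin it down (the names are illustrative):

```clojure
;; A shared counter with no locks: swap! applies inc atomically and
;; simply retries if another thread got in first.
(def counter (atom 0))

(def workers
  (doall (for [_ (range 10)]
           (future (dotimes [_ 1000] (swap! counter inc))))))

(run! deref workers) ; wait for all ten futures to finish
@counter             ;;=> 10000, every increment accounted for
```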
00:21:30
Speaker
But then I think the most popular ones are atoms and agents. I haven't seen too many refs in the wild, though. I'm not sure, maybe the use cases are not really that common in real-world cases. I need to check, maybe, because there is also a difference between coordinated versus uncoordinated change of the data.
00:21:54
Speaker
So if you want to change any data in a ref, for example, then it has to be part of a transaction. So you need to do that in dosync, or with ref-set; I think there are a couple of functions available.
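For completeness, a hedged sketch of the coordinated case: two refs changed together, where the STM either commits both alters or retries the whole transaction (account names are made up).

```clojure
;; Coordinated change across two refs: the dosync block is a software
;; transaction, so no observer ever sees the money in flight.
(def account-a (ref 100))
(def account-b (ref 0))

(dosync
  (alter account-a - 25)
  (alter account-b + 25))

[@account-a @account-b] ;;=> [75 25]
```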
00:22:07
Speaker
Yeah, it's complicated, isn't it? Because of the ordering of things and the optimistic stuff. Yeah, and also retrying. Because if it fails, it's going to retry. Yeah, I think the use case for that kind of stuff is a bit like two-phase commit. It's all theoretically very nice, but no one ever does it because it's just horrible for performance.
00:22:33
Speaker
Yeah, I haven't seen much dosync stuff, but maybe, well, my Clojure code reading is not as much as somebody else's would be. We need to get someone to call in and tell us a bit about it. People who are using dosync, please call us up and let us know your experience. That'll be cool. But the real-world use cases, like you say,
00:22:58
Speaker
Yeah, yeah. Because I didn't get any mileage from the dosync stuff yet, the STM stuff. But primarily, I think, I use atoms everywhere, especially in ClojureScript. Well, there they're really useful.
00:23:12
Speaker
Well, agents are not available in ClojureScript. Of course. Yeah. So anyway, that's the basics. I think the thing about atoms and stuff like that, though, is that it's all basically assuming that you've got multiple threads accessing some common state. Yes. So you get concurrency that way. But that's, I think, very different to the concurrency models of things like JavaScript,
00:23:40
Speaker
which I think is really the basis of the concurrency we really want to talk about today, isn't it? Well, also, you know, based upon this notion of Node.js and stuff like that. So, you know, there's a lot of these concepts
00:24:02
Speaker
that actually, to write an efficient HTTP server, it's not a good idea to... and Java people knew this, of course, before the JavaScript guys did. Jetty and Comet and all these kinds of things knew about continuations. But that's what you need to do. Rather than using a thread per request,
00:24:30
Speaker
which is very hungry on the memory, you need to use one thread, or a very small number of threads, and basically park the sessions or the state for the inactive requests until you get some I/O coming back.
00:24:50
Speaker
That's the basis of Node.js, but obviously the problem with Node.js is this callback crap that you get, this Christmas-tree code, where you end up saying, all right, I'm going to do something in my callback, and then I'm going to do something in that callback, and in that callback, and the other callback, oh my god. Once you get beyond about two or three levels deep, it just becomes totally horrible to reason about.
00:25:16
Speaker
And that's why they call it callback hell, isn't it? Because you have no idea of the execution of the program, and you can't compose it. Exactly. Yeah. And also you cannot make a mental model of the program, because you keep thinking, OK, this happens, this happens, this happens. And then it's like a huge mess.
00:25:37
Speaker
Well, it becomes a Christmas tree in the code, because you start off indented twenty characters, then it becomes twenty-five characters, and as you indent down it becomes this enormous number of characters to write all of a sudden. Yes. So that's a nightmare. And of course, what these guys do now in the JavaScript world is they use these promises, still a bit Christmas-tree-ish.
00:26:05
Speaker
But eventually, they're going to move, I think, to this async/await stuff in ES7. But this is what the Go language has had for a while, and that's what core.async does now, isn't it: this concept of a block of code which looks like it's just sequential code, but will happen when some data comes into a channel.
00:26:29
Speaker
So it's essentially a callback model, but that callback happens in the code transformation, at the lower level. Because the code, when you read it, still looks imperative, step by step. Yeah, it's very easy to reason about.
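A tiny sketch of that sequential-looking style (channel names are illustrative):

```clojure
(require '[clojure.core.async :refer [chan go >! <!]])

;; The go macro rewrites this body into a state machine: each >! and <!
;; is a parking point, not a blocked thread.
(def c (chan))

(go (>! c (+ 1 2)))          ; producer parks until someone takes
(go (println "got" (<! c)))  ; consumer parks until a value arrives
```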
00:26:48
Speaker
Yeah, underneath it all, these things like go loops and go blocks, they're just basically macros, aren't they? Yeah, they are. I think that is one of the things that surprised people, or maybe that people see as the nicety of Clojure and the Lisp world: that you can just write a macro
00:27:13
Speaker
that can convert code that looks like imperative code into some sort of a state machine during the compilation phase. Compared to Go, because in Go they have language-level support for these channels and goroutines, but in Clojure it's just a library. And then the Go... I don't know though, actually. I mean, it's a funny thing, isn't it, in some ways. I mean, I think it's a kind of engineering wonder that you can make this macro.
00:27:41
Speaker
Yeah. But it's got a problem as well, though. I mean, people often say, don't they, about Clojure: it's data first, then functions. Yeah, then macros. Yeah. And yet this is kind of quite an important thing, the macro of core.async.
00:28:02
Speaker
But actually, you can't compose it. Yeah, that's true. And I find that, I don't know. I mean, to be honest, I don't know what difference it would make if it was in the language, in terms of composability. I don't know.
00:28:18
Speaker
But in the language would mean Clojure core would need to change. So I think that is the... I mean, you say that, but actually it was Timothy Baldridge and Rich Hickey, and they are the Clojure core team. So if anyone wanted to change it, they had the power to change it. Of course, but then it would be a special form or something. It would still be implemented as a macro, I would guess, but it wouldn't be a library; it would be part of Clojure core and it would get special treatment, like if and whatever other special forms we have.
00:28:48
Speaker
But yeah, maybe that's, I don't know, maybe we should get Rich Hickey on the podcast and then maybe we can ask him, what do you think?
00:28:56
Speaker
would be interesting. I've never actually seen a thorough... I mean, I know that people talk about the macros thing, but I've never actually seen a proper, or never heard, maybe there is such a thing and I've just missed it, and follow-up welcome. But I've never seen a thing saying, ah yeah, it was a macro because of this, that and the other. Apart from the convenience factor of not changing the language, what are the real actual benefits? I don't know.
00:29:24
Speaker
Not for the go macro, but I remember reading somewhere that one of the libraries, probably for Redis, I think, has a macro that generates the code at compile time for all the functions in the Redis API, which is specified in a JSON spec. So I think there are some use cases for macros,
00:29:47
Speaker
and probably the go macro is one of them. But of course, either we're going to get Rich Hickey and Timothy Baldridge onto the podcast, or we're going to spend more time understanding this, and then we can come up with, OK, this is why go is a macro.

Core.async in Clojure

00:30:07
Speaker
I think it's fabulous that it's been done, don't get me wrong. It is awesome that such a feat can be achieved. I just don't know enough about what the benefits would be the other way around, if it was composable, for example. Because it's a macro, it's not easily composable; that's a downside. That's all I'm saying. That's true. Whereas if you had them as functions
00:30:38
Speaker
and some kind of support in a language, then maybe it would be more powerful even.
00:30:47
Speaker
That's true, but when you think about it as a designer of a programming language, it's all about trade-offs, right? I mean, you need to pick one way or the other. Every approach has its own advantage, and every approach has its own disadvantage. So you need to see what the trade-off is here: making it a library and not imposing it on the core. And also, one of the things in this case particularly is that
00:31:15
Speaker
channel-based concurrency is not going to be the holy grail for everything, so it doesn't need language-level support. Maybe that could be another speculation from my side, because there are still other types of concurrency models, like actors, and
00:31:33
Speaker
yeah, channels, and then of course you don't want to go down to the thread level, because that's too low-level. So maybe there is still scope for other types. That's what I remember when core.async was announced and people were asking why not an actor-based thing: Rich Hickey said, I think channels are a much more interesting way of implementing concurrency, not actors. The thing about the actor model is that the caller has to know the callee.
00:32:00
Speaker
And that's the big difference with the core.async model, where you put some data on a channel and you don't know what happens to it. And it can be picked up by many people, many processes listening on that channel.
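A small sketch of that decoupling, using core.async's mult with illustrative channel names:

```clojure
(require '[clojure.core.async :as a :refer [chan mult tap go <!]])

;; The producer puts onto one channel and never learns that two
;; independent consumers are tapped into it.
(def events (chan))
(def m      (mult events))

(def audit (chan))
(def ui    (chan))
(tap m audit)
(tap m ui)

(go (println "audit saw" (<! audit)))
(go (println "ui saw"    (<! ui)))

(a/put! events {:type :click}) ; both taps receive the same value
```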
00:32:20
Speaker
I'm a fan of that, definitely. There is a decoupling between the caller and the callee. It actually makes the whole system more composable, I think, and more loosely coupled, which is something that we like, isn't it?
00:32:36
Speaker
So, and I think actually, I can't remember the details, but in one of the Clojure docs there's, I wouldn't say a denunciation of the actor framework, but definitely a kind of, you know, we don't like that model because of A, B, and C. And certainly the linkage between the typed messaging of the caller and the callee is definitely a big negative, I think.
00:33:04
Speaker
Yeah, that's true. I think if you see Akka, for example, well, they're working on typed actors, so that might be there in the future. But so far... well, I worked on Akka a bit, without any typed actor stuff. So
00:33:19
Speaker
actors can consume anything, so there is no typing involved. That is one of the nice things you have in the Scala world, that you have these types, but once you switch to actors the types are gone. You send a message to any actor, and it might choose not to respond, or not to consume it. Well, it will consume it, but it won't act on it, because the method itself accepts anything,
00:33:43
Speaker
so you don't know, and there is no compile-time check to tell you: hey, I can only accept these kinds of messages. But there is a typed actor thing happening there as well. Anyway, let's not talk about actors much, because I think channels are the way to go, obviously.
00:34:02
Speaker
Well, actually the interesting thing is, like you were just saying, you can now start to do things like put specs around the data, for example. Yes. And you can do some interesting things there. But yeah, I mean, I've done a little bit of programming with the core.async stuff and it seemed very nice to me. I read a few blog posts about it, and I think there are some very powerful primitives and some nice ways that you can combine data.
00:34:32
Speaker
I did some time series stuff on it again, which was good. But the thing which I thought... I was talking to a friend of mine from the Belgium meetup, actually, and he was saying that in the go blocks there were some problems with exception handling.
00:34:51
Speaker
And he found that in the end it was getting quite annoying to have an exception handler for almost every single statement in the go block,
00:35:04
Speaker
and he ended up abandoning it and going to Manifold, from Zach Tellman, which was more consistent in that respect. But anyway, I certainly found core.async very nice, and I think, actually, to be honest,
00:35:24
Speaker
core.async, like it or not, is the 800-pound gorilla in the concurrency world in Clojure, so like it or not, it's here to stay. And I actually like it a lot, so I'm not upset about that. But what I was going to say was that the nice thing about it is that you can have all kinds of
00:35:46
Speaker
filtering on stuff as well. So you can run these transducers on the channel to transform the messages and filter the messages and all these kinds of things. And they're really incredibly easy to write. I was super impressed by that. At first I thought, whoa, a transducer, that's going to be hard, it's going to be quite a complicated thing. But it was so trivially easy.
00:36:11
Speaker
Sometimes you can get put off by the names of these things, but actually it's just a simple map or filter function. It's so easy to have the input of a channel converted, to either strip stuff out or to convert stuff to put on another channel. It's so easy.
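A sketch of how little code that takes, with illustrative names:

```clojure
(require '[clojure.core.async :as a])

;; The channel itself strips out odd numbers and scales the rest:
;; an ordinary transducer, attached when the channel is created.
(def events (a/chan 10 (comp (filter even?)
                             (map #(* % 10)))))

(a/>!! events 1) ; filtered away
(a/>!! events 2) ; transformed to 20
(a/<!! events)   ;;=> 20
```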
00:36:35
Speaker
Yeah, I think I had some understanding of transducers. So I was thinking: there is this seq abstraction, so any function that can work on a seq is going to work on this one.
00:36:51
Speaker
But then transducers are the next level of abstraction, which says: as long as I can ask for the next thing in it, that is fine. So obviously, because of the seq abstraction, you could use all these functions on vectors and lists,
00:37:09
Speaker
or anything that implements seq. But obviously you cannot use that on a channel. But a channel is also something like a collection: you can ask for the next thing. Is there a next thing available? I'm going to work on it. Yeah, it's a stream of events, actually. Exactly. So you can treat a list as a stream of things, and you can treat a channel as a stream of things. The only difference is that there is some sort of a time component in the second situation, probably.
00:37:38
Speaker
So transducers, in my understanding, are the next level of abstraction that says: OK, I'm going to work on anything that is mappable, anything that has a next thing available. It could be a channel, it could be a collection, it doesn't matter. You can just create a function that is going to work on anything you can ask for the next one from, and anything that you can combine. So that's my vague village-idiot understanding of transducers.
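To make that "works on anything" point concrete, here is the same transducer applied once to a vector and once to a channel, a sketch with illustrative names:

```clojure
(require '[clojure.core.async :as a])

;; One transformation, two very different sources.
(def xf (comp (filter odd?) (map inc)))

(into [] xf [1 2 3 4 5]) ;;=> [2 4 6], eager, over a collection

(def c (a/chan 10 xf))   ; identical logic, now over a stream of events
```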
00:38:05
Speaker
Well, yeah, you're right. It's unifying. The whole stream thing is quite unifying. Although, you're right, you can inject time into these things, you don't have to. It's very easy, for instance, with core.async to have a timeout channel. So you can just say, OK,
00:38:31
Speaker
put a timeout channel in the mix, and your listeners can listen to one channel or to multiple channels. And if you're listening to multiple channels, one of those channels can be a timeout channel, and when that timeout channel closes, you will get an event to say the channel is closed, and you can then react on that. Which is a classic way to
00:38:58
Speaker
do it, rather than using, like, a thread sleep. For instance, if you did a thread sleep inside of a go block, that would be bad. Yeah, because one of the things you shouldn't do when you're in the channel world is blocking operations, and of course sleep blocks the thread. Yeah. That's why these timeout channels are nice, because in the background it's doing some system-
00:39:26
Speaker
sleeping thing. Actually, I don't exactly know how it works, but I'm guessing that's what it's doing; it's got a thread pool in the background. Yeah, of course, there is one thread attached to it, I remember this. Yeah, sorry, go ahead. Yeah. And so anyway, my point is that while you're in the go block,
00:39:45
Speaker
you should just be doing non-blocking operations. You should just be doing straightforward things that are all on the CPU, just transforming data or manipulating data. You shouldn't be doing sleeping or I/O, those kinds of things. Those should be at the edge of your system.
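A hedged sketch of that pattern: racing a data channel against a timeout with alts!, instead of sleeping inside the go block (the channel name is made up).

```clojure
(require '[clojure.core.async :refer [chan go alts! timeout]])

(def data-ch (chan))

;; alts! parks until whichever channel produces first; the timeout
;; channel closes after 1000 ms, yielding nil.
(go (let [[v ch] (alts! [data-ch (timeout 1000)])]
      (if (= ch data-ch)
        (println "got" v)
        (println "gave up after one second"))))
```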
00:40:09
Speaker
But it's really nice to be able to combine these operations informally. You just basically put some events on some channels, and then someone else can compose the way that these events are handled later on. So I think overall, actually, despite my notion of saying that macros and functions are not composable, et cetera, I think the core.async model is actually very, very composable and very decoupled.
00:40:38
Speaker
So I like that, because in the end it's the data on the buffer that you're modeling, actually. I was going to say, the other thing worth discussing about core.async is this buffering concept. Because one of the problems with asynchronous operations is
00:41:02
Speaker
very, very aggressive writers or very, very lazy readers, and this whole notion of back pressure is essentially a problem. Because what do you do? There's always an interesting question: I'm writing a log file into a channel, and
00:41:27
Speaker
if someone doesn't read from that channel, well, eventually memory is going to get full, isn't it? Yeah. And something's going to go wrong. So what happens in core.async is, by default, things are unbuffered. So if you write something, then you can't write again until someone reads. Yeah. So it's blocking. Kind of, yeah. But normally, of course, what happens is that you will use a buffer.
00:41:57
Speaker
So when you create a channel, you can say, I'll have a hundred or a thousand or ten thousand messages as a buffer. So then what's interesting is you have some different types of buffers,
00:42:14
Speaker
because you can choose what to do. So yeah, there are, I think, a few different buffers. There are ones where, basically, if messages are not being read out, it will drop your writes. So, in other words, it will be first in,
00:42:36
Speaker
so the consumer will see the first messages that you put in there. Then you have these other ones, called sliding buffers, where it will drop the messages that are already in the buffer and take your new ones.
00:42:51
Speaker
So we should call them, like, first-in-first-die or something, for the sliding buffer. It's going to drop the first thing that got in: if it is not processed, it's going to be dropped to make room for the next one, right? So it's probably first-in-first-die. The messages will be really careful getting into the channel.
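The policies side by side, as a quick sketch:

```clojure
(require '[clojure.core.async :as a])

(a/chan 100)                      ; fixed buffer: writers park when full
(a/chan (a/dropping-buffer 100))  ; full? the new writes are dropped
(a/chan (a/sliding-buffer 100))   ; full? the oldest values are dropped
```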
00:43:09
Speaker
Yeah, yeah, that's true. But I think what's interesting about it is that you have control in your environment as to what policies you're going to apply to different channels. So depending on the nature of the data, you can choose to either drop it at the beginning or drop it at the end, or whatever. And of course, the funny thing is that, I think, overall, people don't like this dropping thing.

Message Durability and Scalability

00:43:38
Speaker
Yeah, that's what I thought initially when I heard that, hey, there is going to be a dropping buffer: that means it's going to have some sort of buffer capacity, and it is going to drop stuff. But if you think about it, in some cases it really makes sense. For example, if I'm tracking mouse movements, obviously I'm not interested in every movement of the mouse; I'm only interested in the latest stuff. So it is a reasonable assumption that I'm interested in, I don't know, the last 20 mouse-movement events.
00:44:06
Speaker
And I only want the latest ones. Exactly. So if I'm not fast enough to process them, it is fair enough that you drop them. And when you think about other cases as well, in the real world, people think it's actually losing data. But in some cases, you're not really that attached to historical data; you're interested in the current one.
00:44:28
Speaker
So in that case the dropping buffer makes sense. That trade-off is going to give you more control over the program's behavior, as in: I can be sure this program is not going to die because it's eating a lot of memory. So I also think that if you're really invested in this data, like this logging data or whatever, if you want to
00:44:50
Speaker
make sure you keep every message, then you have to use something which has greater robustness than one single process. So that's why, to me, things like Kafka are so useful, because they're very large-scale, distributed, fault-tolerant, all these kinds of things.
00:45:12
Speaker
So, you know, you get that kind of system. So actually, it would be quite interesting if the Clojure guys worked on an equivalent of Datomic for core.async. You know, if you could imagine scaling up the collections, which is what they've done: essentially, they've scaled up the collections for Datomic, haven't they?
00:45:33
Speaker
I think so. I'm not familiar with the database. You have a bunch of persistent collections which are now durable in the database, so you know what's changed.
00:45:50
Speaker
They're all immutable, persistent collections, but now you can scale them up by persisting them on a disk somewhere and having multiple readers and all this kind of good stuff. If you can imagine doing the same thing for core.async, then that's kind of what you're doing with Kafka, essentially. Yeah. Of course, I mean, at that scale, you wouldn't
00:46:12
Speaker
use just Clojure collections to deal with it, right? I mean, obviously, if you need that size of data, you need an external system to handle it. But this is good for most of the use cases where you don't have that so-called, quote-unquote, big data: the stuff that doesn't fit into your memory. So it makes sense that for those smaller use cases it's just going to be okay. So
00:46:36
Speaker
obviously. But I think, even for the Kafka cases, you know, you can read from a Kafka stream into a core.async channel. Channel, yeah, yeah. Because really, to me, what core.async is about is the way you organize your code.
00:46:53
Speaker
I only use Kafka as an example of basically making sure that if you really care about the durability of your messages, then that's what you would do about it, because everything in core.async is ephemeral, basically. If you want the whole thing to be durable, then you have to have some message broker, basically.

Futures and Reducers in Clojure

00:47:14
Speaker
That's true. Anyway, so before moving to parallelism... I just remembered; my train of thought has arrived at the station, so now I recollect what I was talking about. I was about to talk about futures, because that is also one of the things you can use to achieve concurrency in Clojure, and that ties in nicely, because
00:47:35
Speaker
for the case of parallelism, we had some way around: initially there was this parallel map, pmap, which was implemented with futures, essentially. So you have a collection of things and you do pmap instead of map, and that uses futures, keeping roughly the number of cores plus two in flight, I think; something like the sketch below.
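A quick sketch of the difference, assuming a contrived slow function:

```clojure
;; pmap has the same shape as map, but elements are computed on
;; futures, roughly (+ cores 2) elements ahead of consumption.
(defn slow-inc [x] (Thread/sleep 100) (inc x))

(time (doall (map  slow-inc (range 8)))) ; ~800 ms, one after another
(time (doall (pmap slow-inc (range 8)))) ; a fraction of that, multicore
```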
00:48:02
Speaker
And those futures are what's used to actually execute the parallel mapping stuff. But other than that, there isn't much. That was a kind of poor man's fork/join, wasn't it? Exactly, yeah, yeah. But I think reducers are the example of fork/join parallelism, right? Right, yes, yes, yes. So maybe let's move on to reducers, like you said, because
00:48:30
Speaker
that's where we start moving away from this concurrency model. To parallelism. Yeah, okay. Exactly. So basically, reducers: when they were announced, they were going to work on anything as long as it is foldable, I think.
00:48:47
Speaker
Like transducers work on mappables, these work on foldables: anything that can be folded. You have a list or a collection of items, and it will use the fork/join framework to divide the given collection into smaller pieces and then apply the reducing function onto each of these
00:49:15
Speaker
partitions. So that is based on the fork/join framework. In Java 1.7 or 1.8? Seven. Yeah, yeah, in seven. So reducers are the ones that brought in parallelism. But I haven't seen... I think there were some benchmarks in,
00:49:35
Speaker
probably, Rich Hickey's announcement post about how fast the reducers were compared to the normal way. So you get to use all the cores properly. That's the key thing, yeah. Obviously, it's like a lot of these things with fork/join, isn't it: if you've got a fairly small data set, then
00:49:58
Speaker
the overhead of the management dominates things, whereas if you've got a very large data set, then you can get into this whole work-stealing business as well, and it can be very highly optimal.
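A minimal sketch of reducers in action, on a vector large enough for the fork/join splitting to pay off:

```clojure
(require '[clojure.core.reducers :as r])

;; fold splits the vector into chunks, reduces each chunk on the
;; fork/join pool, and combines the partial sums.
(def numbers (vec (range 1000000)))

(r/fold + numbers)             ;;=> 499999500000
(r/fold + (r/map inc numbers)) ; r/map fuses into the same parallel pass
```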
00:50:14
Speaker
Yeah, I mean, it's like having multiple people to do the stuff. But if the division of the work itself is taking too much time, then there is no point in applying parallelism to it. So if it takes more time for me to divide the given numbers...
00:50:32
Speaker
If I want to sum, say, eight numbers, for the sake of argument, and I have four people, then I need to give them two numbers each, and they just need to do an addition. The complexity of the addition is so small that I'm spending more time dividing the task than actually executing it.
00:50:54
Speaker
So I think, in my opinion, there needs to be a lot of experimentation to see where parallelism is going to help actually make faster programs. But in a way, that's what functional programming is pushing for, right? Because there are more cores available now, but each core is not going to keep getting faster the way it did in the past.
00:51:16
Speaker
It's quite funny you say that, because I think I told you earlier on that I watched this Guy Steele presentation that Rich Hickey mentioned in his documentation. And he was asked a question, and I don't know who asked it, actually, but there were a lot of luminaries in the audience. Someone asked him a question about...
00:51:43
Speaker
It might have been Joe Armstrong, even, I think. Whoever it was, it doesn't matter. He asked him a question about exactly this point, about modeling for performance in algorithms. And he punted on that, because he said, yeah, that's actually the most difficult bit.
00:52:02
Speaker
The most difficult bit is... actually, breaking up the work is fairly mechanical. But determining whether or not it's actually going to be valuable to break up the work is very complicated. And of course, there's an overhead involved in that as well. And I guess, and I've just thought about this, but if you think about
00:52:30
Speaker
where we see those kinds of things in the real world, we see them in relational databases, in query optimizers, for example.
00:52:38
Speaker
Or in optimization problems in general, don't we? We see them in these linear-algebra-style modeling things, where you try to work out backtracking solvers, these kinds of things. I'm doing a bit of work at the moment on product configurations. And it's definitely very, very slippery
00:53:05
Speaker
to work out what the actual number is. It gets back into the mathematics of NP-hard and NP-complete, all this kind of stuff. So maybe we should drop that bit of the conversation, because we're going to get completely lost in all of the background stuff. It's interesting stuff, but like you say, modeling exactly what profile of things is going to benefit from this approach is actually unknown.
00:53:33
Speaker
That's true, but you triggered something with the performance part. So I was thinking, because I'm working a lot with Spark and HDFS these days, Hadoop stuff: sometimes just doing a grep or a wc is way faster than actually running a Hadoop program to do the word count.
00:53:51
Speaker
Because, well, of course that's not even parallelism; it's the next level up, it's distributed work. So that means you not only have multiple cores, but you also have multiple machines, and then coordinating them and all sorts of crap. But there was something in the Clojure world, I think it was David Liebke or somebody, who announced this Avout library that does distributed state,
00:54:13
Speaker
that uses ZooKeeper; it is essentially like a distributed atom. But I didn't see much action happening afterwards, I think. So we've been talking about concurrency: single machine, single processor. Then we move to parallelism: of course, single machine, but multiple processors. And then you have multiple machines, multiple processors:
00:54:37
Speaker
that is the distributed world, and there we don't have much interesting stuff happening in Clojure, as far as I know.
00:54:52
Speaker
So I think actually they're definitely taking that approach from the ground up.

Tree Data Structures and Future Topics

00:54:59
Speaker
But I think we want to talk about just parallelism and the programming aspects first. I think, we've talked about this before, we should approach the Onyx guys. And I think they're interested in potentially joining us on one of the episodes.
00:55:17
Speaker
So maybe we should leave that for when Michael or Lucas are on the podcast; it would be cool to understand the challenges involved there. The thing, just coming back to the reducers thing: the insight I think Guy Steele made was that
00:55:35
Speaker
functional programming has given us these tree data structures, these trie data structures. We call them trees now. Yeah, I think we pronounce them "tree", but it is spelled t-r-i-e. So, in fact, he even mentioned Clojure in his Fortress presentation, to say that Rich Hickey has got this thing which has these
00:55:59
Speaker
64-way data structures, and it's all awesome stuff. I think it's 32-way actually, isn't it? Yeah, a 32-way branching factor. We said 64, but it doesn't matter anyway; it all boils down to small numbers. It's very big, yeah.
00:56:19
Speaker
But what he was saying is that these tree data structures are the foundation of how to do reducing and folds and all these kinds of things in the future.
00:56:43
Speaker
The thing that you cannot do, that you should not do in the future, is have these do loops, these iterations. You shouldn't be forced to iterate over a linked-list data structure. Basically, trees give you the opportunity to have a very balanced parallel implementation.
00:57:06
Speaker
And I think Rich Hickey thought, ah, yes, okay. Well, he must have already thought that, but: Guy Steele has definitely given us a good blessing, so let's now make these reducers happen. So what he does there is he basically has some kind of function that does the work, the folding, and then some kind of combining function.
00:57:31
Speaker
So you have these, and the real key thing about it, the key thing he mentions, and I haven't seen his presentation for a while, actually, but I remember watching it a couple of times, is that ordering is the death of all parallelism, and that current parallel operations like map and fold, even in Clojure itself now,
00:57:53
Speaker
and reduce, all start with zero; or they assume that this function you're doing, this reducing function, sorry, is aware that you're going to keep some kind of aggregate going through it, some accumulator. And it's the accumulation that's the problem.
00:58:18
Speaker
So if you can strip out the accumulation into something which the combining function can understand, which is, in other words, associative, then you kind of blow away all of the restrictions in the data structure itself. You don't have to think about it as if it's a vector; you can think of everything as unordered and everything as completely decomposable.
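A sketch of that split between reducing and combining, with a made-up reducing function: the combine step only needs associativity and an identity, so partitions can be merged in whatever order they finish.

```clojure
(require '[clojure.core.reducers :as r])

;; reducef builds a per-partition count; combinef (+, with identity 0)
;; merges partition results in any order.
(defn count-evens
  ([] 0)                              ; seed for each partition
  ([acc x] (if (even? x) (inc acc) acc)))

(r/fold + count-evens (vec (range 100))) ;;=> 50
```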
00:58:49
Speaker
Yeah, but, well, maybe we should touch upon this stuff more in a transducers episode. I was thinking we should dig deeper into this one and talk about how transducers work and these abstractions; probably something for a future episode. I think we're at almost... oh, it's almost one hour. Right. Okay.
00:59:13
Speaker
Yeah, I mean, we never look at the clock these days. From the first episode it's always like, oh, 50 minutes, and then one and a half hours. I think we need to keep it to one hour, though. But before we move on: we talked a lot about concurrency, and I think we should post the Guy Steele
00:59:35
Speaker
presentation link somewhere. So we'll post the show notes on defn.audio. But as usual, we'd love to hear your feedback, whoever is listening, our fans, the plural number of fans who are listening to this podcast. And thanks a lot for listening. And the MP3 will be on SoundCloud.
00:59:55
Speaker
And it is also on iTunes. And next week, I think we will probably try to get a special guest, but we want to keep it a secret. So we'll announce it as soon as we get some sort of confirmation, I suppose. Yeah. And, of course, thanks to our very talented Pizzeri, I think, am I pronouncing that right? Pizzeri, yeah.
01:00:22
Speaker
Putsuri, for giving us permission to use the track. And what is it called? Melon hamburger? Yeah, it's a vegetarian hamburger. Okay, yeah. Okay, so obviously the music is also vegetarian. So this is the number one vegetarian Clojure podcast on this side of the pond, with an Indian and a UK Brexit guy. I am not for Brexit.
01:00:50
Speaker
Anyway, bye bye! Bye bye! Cheers guys! Okay, thanks for listening, and we'll see you with a new episode in a couple of weeks. Bye bye!