#24 Enter the Dragan with @draganrocks

defn
Where we learn all about the expressive power of Clojure running at the speed of raw machine code on CPUs and GPUs.
Transcript

Introductions and Name Pronunciation

00:00:15
Speaker
Welcome to episode 24 of defn, with Ray in Belgium, and Vijay from Holland, and Dragan from Belgrade. So, Dragan Djuric. Yes. Yes. Have I got your name right, Dragan? Yeah. Well, almost perfectly. Not quite, but it's good. See if you can do it any better then. Dragan Djuric. No, it's Dragan Djuric.
00:00:44
Speaker
Dragan Djuric. Okay, so I think this episode we're basically going to spend trying to pronounce his name. We'll just keep repeating it. Yeah. Let's try it. Where is it going wrong? So, Dragan Djuric. No, it's Dragan, like rrr. Ah, Dragan. Yes, something like that. Okay, Dragan Djuric. Yeah, Djuric.
00:01:06
Speaker
It's Djuric. Yeah, it's Djuric. No, it's okay, because it's a bit like Dutch, you know, Dragan Djuric. Almost perfect. You know, I'm just going to call you by your GitHub username, Blueberry. Yeah. That's pure racism, VJ. We're not having that. Come on. We're going to call you Blueberry. Right, from now on: hello, Blueberry.
00:01:32
Speaker
He's the guy who picked his name, so I'm not discriminating against any other Berries.

The Irony of Names and Software Simplicity

00:01:38
Speaker
Yes, but Blueberry is a Belgian comic book anti-hero. So Ray, as a Belgian guy, should know who Blueberry is. Well, I'm kind of... I'm just being integrated, let's say. Maybe it's going to be on the test.
00:01:59
Speaker
I've already taken the test; it wasn't that hard, I got away with it. But it's quite funny that your name is so complicated, well, complicated to our tongues at least, yet your libraries are uncomplicated.

Dragan's Tech Preferences and Setup

00:02:15
Speaker
Maybe we'll move on to that. Before we do that, we need a little bit of introductory chat just to warm everyone up. We were talking beforehand about the fact that the Mac is very popular among developers, but you prefer the Linux desktop. I've noticed at work there are a lot of people using Linux desktops, and I've noticed it with very many people in the Clojure community now.
00:02:40
Speaker
These Linuxes are coming to the fore. So, is 2017 the year of the Linux desktop for Clojure? Well, I'm not sure, because I'm not using a desktop.
00:02:53
Speaker
Oh my god, you're definitely splitting hairs now. I'm using Linux without the desktop. Well, you're just going blind, are you? No. No, I'm using Emacs with Xmonad. So no need for a desktop.
00:03:14
Speaker
Yeah, and I don't have a mouse on my computer. So that makes more sense. No, I see. I see. So everything is on the keyboard, and a magic trackpad or whatever it's called for scrolling in Google Chrome. So basically, for all my development environment and everything else I do, I can do it with shortcuts on my keyboard.

Journey into Clojure and High-Performance Computing

00:03:40
Speaker
So basically what you need is, you need a 3270 into some kind of Google backend or something. That's all you need. I don't even know what that is. Well, it's like an old IBM terminal, IBM 3270. Yeah, okay. Probably before I was born.
00:04:00
Speaker
Well, at least you're not using it like RMS does, right? Because if he wants to read web pages, he'll send a request to a server, and then the server sends the web page back to him by email. No, I have a GUI. I have Chromium. I have VLC. I watch movies and everything else. I just don't have a desktop. So no taskbar, no mousey stuff.
00:04:29
Speaker
But I have Xmonad. It's like a tiling window manager. Yeah, I know. Xmonad is written in Haskell, right? Yeah, but I don't use it because it's written in Haskell. Of course. I just set it up five years ago and I haven't touched it since then. So it just works. Has it had any side effects?
00:04:56
Speaker
Yes, side effects, it works. That's one side effect. So I'm not even a fan of Haskell. We'll get there slowly.
00:05:13
Speaker
So can you give us some... okay, we talked about your setup already, which is fascinating. And I think this is the first time we've had someone with this kind of setup. So can you give us a bit of background: where do you work, and how did you get into Clojure? What do you do usually? I mean, I'm guessing most of the time you're configuring Xmonad, but the rest of the time, what do you do?
00:05:35
Speaker
Um, I had some problems with communication here, so I missed half of the question, but I'll try to answer it. So... well, I think what we should do is chop it up into, like, five questions rather than one big long one. Because the internet is a bit choppy, I think the connection is not great. But okay. How I got into Clojure, right? That's the question. Yeah. Yeah.
00:06:05
Speaker
So it was in 2009, and for some time before that I had been trying to find some solution that would give me metaprogramming facilities for some stuff that I needed to do.
00:06:21
Speaker
And I was trying with AspectJ and some complicated Java stuff and everything, but it was clunky and required lots of boilerplate. And then Clojure was, I think, released or about to be released. So it was maybe May or April of 2009. And I had been looking into Lisp before, but not really in detail, because it was not quite practical,
00:06:50
Speaker
because of the lack of libraries or ecosystem, or I just didn't know how to use it properly. So when Clojure appeared, it clicked immediately. And fortunately, because I work at a university, I can choose my own technology, what I do and how I do it. So it was an easy switch. I don't have VJ's problem of having to code in Scala.

JVM Performance in Machine Learning

00:07:20
Speaker
So I switched immediately, and well, it's been quite an easy ride since then. I'm quite satisfied with what Clojure offers. It has its rough edges and so on, but it's manageable.
00:07:36
Speaker
Yeah, it's interesting, because you're doing the Clojure stuff, but you're very focused on performance. So what was that story? Because in some of your talks, you've talked about the JVM and sort of lower-level libraries. What do you think about the JVM ecosystem for Clojure? Is that something that attracted you? Or was it just the language itself, and then you figured you could get underneath it?
00:08:05
Speaker
Well, actually, when I started doing Clojure, I was not into high-performance stuff. At that time I was more into domain modeling, doing something with ontologies, modeling domains, some constraints and so on. And at that time the JVM wasn't a problem, quite the opposite. It's quite performant for that kind of stuff.
00:08:35
Speaker
But later, when I started looking more into machine learning, I realized that, well, JVM
00:08:47
Speaker
and similar environments are not performant enough because especially machine learning technologies that I'm interested in are quite computationally expensive. So really, really expensive, like billions of operations are needed. Can you give us an example of that kind of thing?
00:09:10
Speaker
Well, the most obvious example is Markov chain Monte Carlo sampling, which I mentioned at EuroClojure. So basically, the point with this kind of sampling is that you're exploring a hyperspace.
00:09:28
Speaker
A hyperspace is a space with more than three dimensions, let's say. So you have a space with, like, fifty or a hundred dimensions, and each dimension is continuous, so infinite numbers of possibilities. And in machine learning there is a thing called
00:09:47
Speaker
the curse of dimensionality. So basically, if you have some method in one dimension, you can say: okay, I will sample really finely along that one dimension. So I need maybe a thousand points, and with these thousand points I will be able to discover the rough shape
00:10:09
Speaker
of the function in one dimension. In two dimensions, I have to check 1,000 times 1,000, a million points. In three dimensions, 1,000 times that, a billion points. In 100 dimensions, it's 1,000 to the power of 100. So really, by

The Power of GPUs in Computation

00:10:30
Speaker
brute force, you cannot compute it at all.
00:10:33
Speaker
not even in a million years. So basically all these methods try to somehow reduce the space, to randomly search the space, but not so randomly. So a lot of computation is needed, but these computations are numerical. Not much logic, more like millions upon millions of evaluations of functions like exponent and sine and cosine, and combinations of those.
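The grid arithmetic above can be sketched in a few lines of plain Clojure (a toy illustration, not from any library):

```clojure
;; Number of sample points a naive grid search needs:
;; `per-dim` points along each of `dims` dimensions.
(defn grid-points [per-dim dims]
  (Math/pow per-dim dims))

(grid-points 1000 1)   ; 1,000 points in one dimension
(grid-points 1000 2)   ; a million in two
(grid-points 1000 3)   ; a billion in three
(grid-points 1000 100) ; 10^300: hopeless by brute force
```

Even at a billion function evaluations per second, the three-dimensional grid already takes a second, and each extra dimension multiplies that by a thousand.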
00:11:03
Speaker
Basically, the point is that GPUs are really suited for such kind of computations and a lot faster than CPUs. And Java is, well,
00:11:23
Speaker
It doesn't have access to hardware facilities, even on CPUs, not to mention GPUs. Even on the CPU you have many hardware facilities that can be used, like pipelining, or AVX and SSE instructions, things that can be used for numerical computations and give a speedup of 10 or 20 or 50 times if programmed well.
00:11:51
Speaker
So Java doesn't have access to that. So basically that got me interested. I need performance.
00:11:58
Speaker
for a low price because I'm not Google or Microsoft. I cannot have a huge data center and just throw so much computation on it. So I really need to be like optimal. So get as much as I can from the cheap hardware. And that's how I got into researching how can I do this practically.
00:12:26
Speaker
but from Clojure, because I love Clojure. I love programming in Clojure.

Exploring Neanderthal and ClojureCL Libraries

00:12:31
Speaker
So the library. Did you look at, sorry, go on, go on VJ.
00:12:34
Speaker
So I was looking at the libraries that you've built so far, essentially for utilizing GPUs. You have, of course, Neanderthal, that is the matrix library. And you have ClojureCL. So can you give us some idea about what each of these libraries does and what they're used for? Because I see one of them as a much lower-level library than the other one.
00:13:03
Speaker
And there is a third library, ClojureCUDA. Yeah, this is for Nvidia, right? So, what is the purpose of all this, basically? When you develop this kind of software, like numerical software, and you need access to all these low-level facilities, the point is that you need to...
00:13:33
Speaker
Basically, the way you program it is similar to Clojure: if you understand map and reduce, you basically understand the basics of how this massively parallel computation is done.
00:13:49
Speaker
So Clojure is a good approximation of what these libraries do. But what Clojure cannot do is access the real hardware stuff. It cannot tell the GPU what to do, because it doesn't have any connection to it. So basically, let's start from Neanderthal, because it's probably what most users would try first. Yeah.
00:14:16
Speaker
So what Neanderthal enables you to do is create structured data like vectors and matrices, basically blocks of data that you want to compute on, and then call standard functions that are optimized for running on different kinds of hardware. So you as a user have to be aware of where it computes and what
00:14:47
Speaker
it needs to compute, but you don't have to do it yourself. Neanderthal does this for you. You just have to call, say, matrix multiplication, or matrix factorization, or solving systems of linear equations, or something like that. Also, what Neanderthal does is it says: okay, you can do your computation either on the CPU, the processor of your computer, laptop or desktop or whatever you have.
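As a rough sketch of what calling Neanderthal looks like (function names from the `uncomplicate.neanderthal` namespaces; this needs the native backend installed, so treat the details as illustrative):

```clojure
(require '[uncomplicate.neanderthal.core :refer [dot mv mm]]
         '[uncomplicate.neanderthal.native :refer [dv dge]])

;; A native double-precision vector and a 2x3 matrix
;; (dge takes column-major data).
(def x (dv 1 2 3))
(def a (dge 2 3 [1 2 3 4 5 6]))

(dot x x) ; dot product, computed by the optimized BLAS backend
(mv a x)  ; matrix-vector multiplication
```

The same calls run unchanged whether the structures live in CPU memory or, with the GPU namespaces, on the graphics card.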
00:15:18
Speaker
Or you can send it to the GPU for computation. Now, the GPU can be AMD or Nvidia or something else, but probably Nvidia, or maybe AMD. So Nvidia has its own environment, which is CUDA, right? So basically,
00:15:41
Speaker
you call Nvidia's CUDA driver and say: okay, do this for me, send this kernel. A kernel is a small chunk of code that computes in parallel. So, ClojureCUDA enables you to write your own kernels and call them on your GPU at a really low level. But for Neanderthal, you don't need to use ClojureCUDA
00:16:11
Speaker
or ClojureCL. Neanderthal does this for you. You just have to say: okay, I need this on my CUDA GPU. But that will get you only so far, because most of us write some custom software, like 100% custom or 90% custom. Each of us needs some custom stuff. So basically, Neanderthal alone is not enough.
00:16:39
Speaker
It's good

Deep Learning and Numerical Computation in Clojure

00:16:40
Speaker
for a start, but if you really want to write some great software that other people don't have, and be better than the competition, typically you will need to write some custom algorithms. So basically ClojureCUDA and ClojureCL enable you to write your own low-level code that runs on the GPU, using Clojure. You write small chunks of C, really small kernels in C.
00:17:09
Speaker
All the management is in Clojure. Everything on the host, the management, calling these kernels, and everything else like setting up memory, is in Clojure. And another benefit is that you don't have to compile anything. So basically you just fire up your Clojure REPL, and from the REPL you have access to Neanderthal, to ClojureCUDA, to ClojureCL.
00:17:35
Speaker
You just provide those strings or files with kernels, provide some Clojure code that manages them, and you can just experiment in the REPL. So that's the main benefit.
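Schematically, the workflow looks something like this (the exact ClojureCL function names are recalled from its tutorials, not taken from this conversation, so check them against the documentation):

```clojure
(require '[uncomplicate.clojurecl.core :refer :all]
         '[uncomplicate.commons.core :refer [with-release]])

;; The kernel itself is a small chunk of C, kept in a Clojure string;
;; no separate compilation step, it is built at the REPL.
(def source
  "__kernel void add (__global const float* a,
                      __global const float* b,
                      __global float* c) {
     int i = get_global_id(0);
     c[i] = a[i] + b[i];
   }")

;; Everything else, compiling, memory setup, enqueueing, is Clojure.
(with-default
  (with-release [prog (build-program! (program-with-source source))
                 add  (kernel prog "add")]
    ;; buffers, set-args! and enqueueing of `add` would follow here
    ))
```

The point made in the conversation is exactly this split: a few lines of C for the parallel part, and all the host-side plumbing as ordinary, re-evaluable Clojure.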
00:17:51
Speaker
OK, so how do you see the role of these libraries in the larger ecosystem? Because these days, I'm pretty sure you know about Cortex as well, right? The Clojure machine learning, deep learning library. So you mean Cortex? Yeah, Cortex, right? So how do you see using these libraries for computations related to deep learning stuff? Is there any relation between them, or do you think they're separate? Yeah.
00:18:21
Speaker
I've had a look at Cortex, but basically I'm not so interested in deep learning, because deep learning is really popular now and everyone is interested in it, but deep learning
00:18:36
Speaker
is more used for perception. Like, if you have pictures or sounds or that kind of information, and you need to process it, classify it or do some recognition on it. But I'm more interested in methods for analyzing data. For example, a typical Clojure company will probably
00:19:04
Speaker
collect data about users: how they browse products, or how they do transactions, how they do stuff. So I'm more interested in data analysis methods, like Bayesian stuff. So I'm not so well informed about deep learning. Of course, I know the basics.
00:19:33
Speaker
I haven't tried Cortex. Now, how would you use Neanderthal with such libraries? How does it fit? Well, Neanderthal is...
00:19:43
Speaker
a more general library. So basically you would use it in machine learning, or in physics, or any other area that needs numerical computations. And of course you can build your own deep learning stuff on top of it. Maybe I will provide some libraries in the future, based on cuDNN, say. Cortex is a specialized, not matrix, but deep learning library.
00:20:12
Speaker
Yeah. So basically now, you can, of course, transfer data from Cortex to Neanderthal and vice versa. But there is no deeper integration now, because Cortex, I think, uses Java interop for accessing cuDNN. So basically they use the Java interop for that. Yeah. Right.
00:20:41
Speaker
Yeah, I think maybe it's more interesting, to some extent, to talk about things we can do with it rather than things we can't. I was looking at your blog, actually, and I noticed that in March you released 0.9 of Neanderthal.
00:21:02
Speaker
And you said in that one that your notion of what it takes to do high performance in Clojure is beginning to take shape. Maybe we can talk a little bit about that, because you said that you started to use Intel's Math Kernel Library (MKL).
00:21:25
Speaker
And that seems like an interesting move. You went from an open source library to this closed-source library, but you say it's an improvement. What's the story there? Well, I noticed, I mean, it was obvious that many Clojure users have had trouble installing ATLAS.
00:21:50
Speaker
Right. Because the point with ATLAS is that it's built in such a way that you have to build it on your own machine to optimize it, to allow it to choose the best-performing kernel for that particular hardware. Maybe you can just back up a little bit and explain what ATLAS is. ATLAS is an open source matrix computation library.
00:22:16
Speaker
So basically, if you're aware, BLAS and LAPACK are the standards for numerical matrix computations, really, really low-level stuff. This is a linear algebra library. Yes.
00:22:36
Speaker
Basically, BLAS is for primitive operations, and then LAPACK builds on top of it for solvers and such. So ATLAS is an open source implementation of that. It's quite a good library, and it's open source, so it's really a good choice. But the point is that it's built in such a way that you have to build it on your own machine. And of course you can use one of
00:23:03
Speaker
the builds provided by your Linux distribution, for example. But in that case, some distributions don't build it properly, and some package it in such a way that it's not always obvious how to use it. So any

Data Handling and Performance in Neanderthal

00:23:20
Speaker
competent Linux user wouldn't have a problem
00:23:24
Speaker
building it, because it's, like, a make script with all the usual tools. Basically, it's not something really complicated, but most Clojure users don't have experience with that.
00:23:39
Speaker
So they have kind of... I can't say it's fear, but they're not really confident trying it out. So one thing is, I realized it's really a problem. It's not a problem for me, but it's a problem for a lot of users, and of course I like to help them. But another thing that is also quite important is that
00:24:07
Speaker
Intel MKL is even more performant than ATLAS.
00:24:15
Speaker
And what is even more important is that it supports some really important stuff that ATLAS doesn't have, for example sparse matrices. And the point with sparse matrices is that the only really good open source implementation is GPL, so it doesn't fit into the Clojure ecosystem.
00:24:39
Speaker
So basically, the switch to MKL enables two things: easier installation and distribution later, and some features that are not possible with ATLAS.
00:24:55
Speaker
Maybe just to help us out a little bit here. I know that AMD conform to a lot of Intel standards. Is this something specific to Intel, or does it also run on AMD chips? No, it runs on AMD processors also. But Intel say it's optimized for Intel processors.
00:25:25
Speaker
Probably it runs better on Intel processors than on AMD's, but I did some searching on the internet, and people claim that even on AMD processors, MKL is the most performant library. So I think the only downside is that it's not open source. It is free to distribute, but it is not free software, and it's closed source.
00:25:51
Speaker
But you can distribute it with your software for free, and that's been the case since last year. Before that, it required a license for such things, but now it's free to distribute.
00:26:09
Speaker
So basically you've got a Clojure library with this Intel core that you can use for these, let's say, classic linear algebra problems. So in terms of use cases, have you worked up any use cases, like travelling-salesman-type use cases or optimization-type things? Well, the thing is,
00:26:33
Speaker
I use it for Bayesian computations. That's my use case, since it's quite demanding, and I use it a lot. So it's helped me tremendously. But maybe to wrap it up: how can it be useful for Clojure programmers? Because I think most people would be interested, not in the details so much, but in: how can it help me, or what can I do with it? So basically,
00:27:03
Speaker
there are a few things. First, it gives you access to all the hardware resources for numerical computations on your machine, so CPU and GPU at full speed, from Clojure. That's the first thing. Another thing is that it gives you an easy connection to the world outside the JVM.
00:27:30
Speaker
So I made it in such a way that there is practically no copying cost to transfer your data outside of the JVM. So for example, if you have some C library, maybe that library doesn't have anything to do with linear algebra, it just needs some vectorized data to be sent to it. So do you mean to say that this, for example, can be used in tandem with something like TensorFlow?
00:28:00
Speaker
To the extent that TensorFlow can receive primitive arrays. But I think TensorFlow is C++ software, so it's a more complicated story. Well, how about things like computer vision, that kind of stuff?
00:28:18
Speaker
I suppose, yeah. The point is it doesn't care. There are two kinds of this high-performance software that you'll find: C++ stuff and C stuff. C stuff is usually written in a way that it can be accessed from whatever environment you have. You just need to provide raw pointers to raw arrays
00:28:48
Speaker
that conform to that software's view of how it will use that array.
00:28:54
Speaker
That's how MKL works, and ATLAS, and such BLAS stuff. They say: okay, give me a pointer to the array, and give me some numbers, some strides or dimensions or so, and give me the pointer to the data. But C++ software often requires data structures that are specific to that C++ software, so it's not really that easy to connect to from Java.
00:29:22
Speaker
So Neanderthal can give you free access to C software, but not necessarily to C++; it depends on how it's structured.
00:29:35
Speaker
But for most Clojure programmers, I think, that's not so important, because they're usually not interested in writing low-level C stuff and compiling for multiple platforms. They usually, I think, would like to say: okay, I have my REPL. Can I write high-performance software

Neanderthal's Role in Hardware Acceleration

00:30:00
Speaker
from the REPL, from the Clojure environment?
00:30:04
Speaker
So the idea that I want to enable with Neanderthal is that you don't need TensorFlow, you don't need, I don't know, OpenCV. I mean, obviously you need them because there is so much functionality there. But the ideal case would be: okay, my team has this wonderful idea. We have the algorithm.
00:30:30
Speaker
We know that algorithm can compute some useful stuff. Now we have to write it. Can we write it in Clojure, and not worry about multi-platform builds and C++ and C and everything else? Can we write it in Clojure and have a Clojure program that does it at full speed? So that's the main idea of Neanderthal. So I suppose if you need to use TensorFlow, you would need TensorFlow's
00:31:00
Speaker
I don't know, Java interop libraries or so. But in Neanderthal, are the primitives the Clojure data structures themselves? Or do you introduce a different kind of primitive? Not even primitives, the data structures that you use. If you say vectors, are they Clojure vectors, Clojure persistent vectors? No. Okay. Because that doesn't make any sense.
00:31:24
Speaker
Okay. Why doesn't it? For example, if you need high-performance code and you use Clojure vectors: Clojure vectors are, like,
00:31:41
Speaker
a bunch of references in the Java Virtual Machine. So each number is not a primitive number; it's an object that wraps a primitive number, and it's scattered all over memory. So you don't have a guarantee that cache memory will be used in an optimal way. So what does that mean? It means that
00:32:08
Speaker
if you have data about a thousand workers, and their schedules and their managers and what they do during the day, and you have to write software that provides, like, information management, Clojure is quite fast for that kind of stuff.
00:32:32
Speaker
But most of machine learning and linear algebra, these numerical algorithms, their complexity is not linear. It's usually quadratic complexity, or O of n to the power of 3, or even more.
00:32:49
Speaker
So basically, you need much more speed. You need to use cache memory in an optimal way. You need to use all these hardware internals: pipelining, AVX instructions, or whatever. So you cannot use Clojure vectors and Java linked lists and Java array lists and everything else;
00:33:13
Speaker
they are really a bad tool for that kind of task. So what Neanderthal does is it practically maintains its own primitive memory, but it enables you to look at it as a Clojure object.
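The boxing issue can be seen with plain Clojure (a toy comparison; Neanderthal's internal layout is more sophisticated than a bare array):

```clojure
;; A persistent vector stores boxed number objects,
;; scattered across the heap behind references.
(def boxed (vec (range 1000)))

;; A primitive array is one contiguous, cache-friendly block of
;; unboxed doubles, the kind of memory Neanderthal manages for you.
(def prims (double-array (range 1000)))

(reduce + boxed)                                 ; walks boxed objects
(areduce prims i acc 0.0 (+ acc (aget prims i))) ; tight primitive loop
```

Both compute the same sum, but only the second keeps the data unboxed and sequential in memory, which is what cache-friendly numerical kernels need.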
00:33:34
Speaker
Let's say, okay, so you don't have to worry about low-level stuff that much. Do the functions that work on Clojure vectors, for example, work on Neanderthal vectors as well? Excuse me, I didn't understand that.
00:33:50
Speaker
So, because Clojure's abstraction is the seq abstraction, right? The seq thing. So all the functions that we have that work on Clojure data structures, for example Clojure vectors, do they work on Neanderthal vectors as well? I mean, I'm thinking about the user of this library: what should I be aware of? So I suppose you mean, like, map. Yeah, yeah, yeah.
00:34:20
Speaker
Especially lazy sequences are a bad abstraction for high-performance numerical stuff. So that's the point: forget about it. Okay. So we cannot use the same kind of things. But map as a concept and reduce as a concept are a good abstraction, even for numerical code.
00:34:44
Speaker
But unfortunately, Clojure's map and reduce are built in such a way that they're not quite a good match for that. So basically, if you want to write code with maps and reduces, you would use Fluokitten. It's one of my libraries. That's what I was getting at, slowly. Oh, yeah. OK. So you would use Fluokitten, which offers alternatives to map and reduce that are
00:35:13
Speaker
not only Neanderthal stuff; it's really a library that I had written before Neanderthal, like four years ago. It's a category-theory-inspired library that gives you these kinds of abstractions, like fmap, fold,
00:35:39
Speaker
and so on, that are polymorphic. So if you call them on Clojure vectors, they will use the optimal algorithm for Clojure vectors. If you call them on a Neanderthal data structure, they will call the appropriate optimized algorithm.
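For illustration, Fluokitten's polymorphic functions look roughly like this on plain Clojure collections (the same calls dispatch to optimized implementations on Neanderthal structures; behavior recalled from Fluokitten's documentation, so treat the outputs as indicative):

```clojure
(require '[uncomplicate.fluokitten.core :refer [fmap fold]])

;; fmap is like map, but returns the same kind of structure
;; it was given, dispatching on its type.
(fmap inc [1 2 3])     ; a vector in, a vector out
(fmap + [1 2] [10 20]) ; element-wise, still a vector

;; fold reduces a structure with its elements' monoid operation
;; (numbers fold with +).
(fold [1 2 3])
```

On a Neanderthal vector, the same `fmap` and `fold` run over the primitive memory directly instead of producing lazy seqs.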
00:35:59
Speaker
So that's one of the combinations you can use. But most often, I guess, my preferred combination is to use Neanderthal and matrix operations if they are appropriate for the algorithm, and if not, to write my own with ClojureCL or ClojureCUDA. That's the best. I think some... yeah, right.
00:36:23
Speaker
Yeah, a quick question. I mean, maybe it's just not possible, but have you thought about how these kinds of things could fit into the transducer framework, the transducer model?
00:36:39
Speaker
The thing is,

Learning Resources for Machine Learning

00:36:40
Speaker
I'm not sure whether it's even needed. Could you give me an example? What would you do with a transducer and a GPU function, or a Neanderthal vectorized function?
00:37:01
Speaker
Well, let's say you wanted to put some data onto a channel or something like that, and then you want to pull it off, or put something onto a channel, but you want the results computed on the GPU, or a filter or a map done on the GPU, before you put it onto a channel for further consumption downstream. Then it's quite nice to have
00:37:28
Speaker
that in a transducer, because then you could use it on a channel, or on some other type of input. That's the theory, anyway. Maybe I'm just in bullshit land here. No, no, you're not. But what would be the shape of that data? Like, what's the size of that data? When you say data, is it 10 numbers, or 10 million numbers, or 10 billion numbers?
00:37:58
Speaker
Well, I think the idea is that it could be whatever you want. But then this map or reduce operation that you're talking about would do its work before it goes onto the channel. The idea about transducers, to me anyway, is that you provide the algorithms, the functions that do the operations on the data, in a way that is orthogonal to the output channel.
00:38:23
Speaker
Mm-hmm. Well, in that way, maybe, but maybe I'm also in bullshit land now. So the point is that it may be orthogonal to Neanderthal. The point is, you would typically...
00:38:43
Speaker
So maybe now it's time to just take a short break, because I think we've got into quicksand now. Not with the channels, but I think our conversation has stretched a bit too far into details, so people would be bored. Maybe we could cut out this first half an hour. So maybe we should speed this up. Let's hope not. So the point is,
00:39:13
Speaker
let's say we'll get back to this, but maybe we should look at it from a different perspective. Let's say you're a Clojure programmer, and I'm a Clojure programmer, so you're not a statistician. You don't have a PhD in deep learning or whatever. You know your programming.
00:39:34
Speaker
Perhaps you had some math courses in college that you've forgotten, more or less. Maybe you remember some linear algebra, or you'll remember some linear algebra to the extent that there are such things as matrices, but you've forgotten even what an eigenvalue is, or such things.
00:39:57
Speaker
Eigenvector or yeah, or so

Introduction to Bayadera for Bayesian Computations

00:40:00
Speaker
You recognize the stuff, but you've forgotten the details. And yeah, so you don't really have an idea how even
00:40:10
Speaker
to use this to any benefit. But machine learning is such a popular topic, so you recognize that you would love to learn at least some methods. You feel that you could solve some problems with it. Maybe your company has been gathering data for the last five years,
00:40:34
Speaker
hoping that the data will answer some useful questions, but the answer just isn't coming. So all of you are creating user groups and trying to learn those machine learning techniques. And you start reading blogs, looking at demos, trying some software, and
00:41:04
Speaker
You understand what those demos do, but you still don't understand how they do this.
00:41:10
Speaker
Okay, they have these pictures and they teach their software to recognize, like, cows and hot dogs and not-hot-dogs, that kind of stuff. But now my company has data about Visa transactions, and, like, vegetarian food sales in Amsterdam or so, right?
00:41:36
Speaker
So how can we use this perfect data? You realize that you need to dig deeper into machine learning theory, let's say. So you browse for books, and then you pick some popular textbooks, right?
00:41:59
Speaker
So, are you following me? Are you in the same situation or not? Definitely, definitely. We haven't picked the books yet, but go for it. But you're on Amazon trying to build a wish list. I'm looking right now, yeah. Yeah, okay. I've just started ISLR now, so I'm at the next step.
00:42:19
Speaker
Yeah. So basically you recognize that it's important, but still, well, it's an alien land. And when you open these textbooks, you won't find any loops, any functions, any transducers or reducers or Java arrays or anything. What you'll find there is a bunch of
00:42:47
Speaker
mathematical symbols, and most of it is matrix notation. So they will say: okay, you have this vector, and then you multiply this vector with the transpose of that other vector and you get a matrix, and then you solve this matrix and find some other matrix, and then you plug this matrix into some other computation that we describe only in matrix notation.
00:43:16
Speaker
And, by the way, we don't get into implementation details,

Future Plans for Clojure Libraries

00:43:20
Speaker
and then, magically, this is the solution. So basically they give you the nuts and bolts and they give you the details, but only on the level of linear algebra. So if you want to implement it, there are two possibilities.
00:43:37
Speaker
Either you understand this stuff that they talk about really perfectly, and you're a programming wizard, and you write your own loops, and you basically re-implement this matrix implementation from scratch. Of course, in reality, you won't get very far with it. Or you learn some matrix software library,
00:44:03
Speaker
that will at least enable you to do some percentage of this stuff directly. And the rest you will have to write yourself or to use a library that some other people wrote. So that's the point, for example, of Neanderthal. You would love to get into the area, but all the literature is really
00:44:33
Speaker
either geared towards pure theory, or applied theory, but applied down to the level of matrix computations. So you need to know how to run and use matrix computations. And Neanderthal enables you to recognize, when you see some
00:44:53
Speaker
combination of matrices in a formula in the book: this is matrix multiplication, and this is linear algebra, like solving a system of linear equations, and this is finding eigenvalues, and this is finding eigenvectors. And then these eigenvectors are plugged into this algorithm, or so.
00:45:17
Speaker
You will still have to write something yourself, but only some part of it. And the point is, even for those parts that you have to write yourself, you need speed. And you will get this speed through ClojureCUDA and ClojureCL.
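To make the idea concrete, here is a minimal sketch of how a textbook formula like "multiply A by the transpose of A, then solve the system A z = b" reads in Neanderthal. The function names (dge, dv, mm, mv, dot, trans, sv!) are taken from Neanderthal's core, native, and linalg namespaces as I understand them; treat exact signatures as assumptions and check the current documentation.

```clojure
(require '[uncomplicate.neanderthal.core :refer [mm mv dot trans]]
         '[uncomplicate.neanderthal.native :refer [dge dv]]
         '[uncomplicate.neanderthal.linalg :refer [sv!]])

;; A 2x2 double matrix (entries given column by column) and two vectors.
(def a (dge 2 2 [3 1 1 2]))   ; columns [3 1] and [1 2]
(def x (dv 1 2))
(def y (dv 3 4))

(dot x y)        ; the textbook's x^T y, a scalar
(mv a x)         ; matrix-vector product A x
(mm a (trans a)) ; matrix-matrix product A A^T

;; Solve the linear system A z = b (sv! destructively overwrites its arguments,
;; so fresh copies are created here).
(sv! (dge 2 2 [3 1 1 2]) (dge 2 1 [9 8]))
```

The same operations run on the GPU by creating the matrices with a CUDA or OpenCL backend factory instead of the native one, which is exactly where ClojureCUDA and ClojureCL come in.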
00:45:35
Speaker
most of the time. That's the point. You also have a statistics library, right? The Bayadera? Yeah, Bayadera. And if you're curious about the name, why it's called Bayadera, some people in Germany recognize it because they went to the Adriatic Sea. In ex-Yugoslavia, there was a cookie, really good cookie.
00:46:03
Speaker
a really popular nougat cookie called Bayadera. So it was nice, and it also sounds a bit like Bayes. So Bayadera is quite interesting to me, and I think it's quite interesting to the wider audience, because
00:46:29
Speaker
Once, just the source repository on GitHub without any documentation got into the front page of Hacker News and it

Wrapping Up and Euroclosure Discussion

00:46:40
Speaker
got some hundreds of stars even without anything else. So what's good about Bayadera?
00:46:47
Speaker
It's much faster than state-of-the-art tools. The state of the art is basically Stan. It's a C++ Bayesian inference engine that has an R-like front end. So basically, you write a model in Stan's own modeling language and call it from R to do some Bayesian inference for you.
00:47:18
Speaker
Bayadera enables you to do similar things from Clojure, but your computation runs on the GPU, and instead of, let's say, five minutes, it will run in 300 milliseconds. And a year ago I translated most of the examples from the best textbook in this area, Doing Bayesian Data Analysis,
00:47:48
Speaker
which is really a great resource for learning Bayesian and probabilistic stuff. It's written by a biologist with real examples, not some hello world-ish stuff, but real examples from real scientific hypothesis testing or so. So really a great book if you want to go into this area. I suppose for most Clojure programmers it wouldn't be
00:48:15
Speaker
that interesting, because they haven't decided to go Bayesian yet, I suppose. But if you want to go into machine learning, statistics and probability and the probabilistic way of thinking are something you have to learn. And so you will come to Bayadera sooner or later if you want to create your own stuff.
00:48:41
Speaker
So maybe you could write an algorithm for the probability of a Clojure programmer coming to Bayadera. Definitely, definitely. So that could be a good example. It's an easy problem. It's an easy problem. Awesome. You probably don't need a GPU for that.
00:49:00
Speaker
Yeah, we don't even need a GPU. But the point is, Bayadera is a good example of how to use Neanderthal and ClojureCL and ClojureCUDA. So if you, for example, want to start and go through some tutorials, and then wonder, okay, but
00:49:19
Speaker
how do I do this exotic thing? How do I transfer this data? And then, how do I call some complicated kernel on the GPU? You can browse Bayadera's source and see how I did it with Bayadera, and get some idea of how you would do it for your own algorithm.
00:49:36
Speaker
So for the people who want to get into this GPU stuff, if I understand correctly, the first point of entry would be Neanderthal. Well, if you want to learn only GPU programming as such, like low-level GPU programming, the point of entry would be ClojureCL or ClojureCUDA. And I place

Cryptocurrency Banter and Conclusion

00:50:00
Speaker
a great importance on helping people learn it. That's why, for all these libraries I provide, I usually pick a textbook.
00:50:12
Speaker
And I choose a good one, not the simplest one, and not one only for PhDs, but a good textbook that would get you from beginner to intermediate, let's say. And then I write examples, usually in the form of tests.
00:50:36
Speaker
So I write the examples of how to use this library, and usually these examples are translations of the book examples. So basically, for example, if you want to learn ClojureCL and do some GPU computing, you would pick up the OpenCL in Action book from Manning.
00:50:56
Speaker
It's really good, by Matthew Scarpino, I think, and it's a really great book in the tradition of all the Manning books: a really good balance between getting you started and giving you real "in action" stuff, not just some hypothetical examples.
00:51:16
Speaker
And Matt gives a good batch of examples that get you from the simplest hello-world ones to some more complex ones. And most of these examples, at least those that make sense with ClojureCL, are in ClojureCL's test folder. So I haven't written blog posts with narration,
00:51:42
Speaker
but you can follow the book and look at the examples in Clojure and see: aha, okay, this is how it's done in the book; okay, how can I do this in Clojure? So you will get the explanation from the book, and then you will get the code from the tests. Okay, so what are your future plans for these things? You have CUDA, which is for NVIDIA things, and you obviously have support for AMD GPUs as well. So where are you taking these libraries?
00:52:13
Speaker
Let's say like this: the first steps now will be to add some more functionality to Neanderthal. I don't need it immediately, but it is important to show that the library is more or less featureful and complete, and that people can rely on it.
00:52:37
Speaker
So during July and maybe August, I will implement lots of new matrix structures, like banded matrices and symmetric matrices. Some of it I have already implemented but haven't checked into GitHub yet;
00:52:57
Speaker
it's on my machine. I probably will not add proper sparse matrices yet, that's for the next phase, but I will try to cover what BLAS and LAPACK cover. That's the first part. The next step would be to bring Bayadera to a level where it can be used by, let's say, average Clojure programmers, because now
00:53:25
Speaker
I wrote Bayadera with ClojureCL and optimized it for AMD GPUs. And most people don't have those. Most people usually have CUDA GPUs.
00:53:39
Speaker
So I will probably also port whatever is there to CUDA. I will try to polish it a bit more, write documentation, and start writing some more tutorials. I'll probably try to keep a pace of about maybe
00:54:01
Speaker
one or two tutorials a month, sometimes even three or four, but let's say two a month, and try to help Clojure programmers get up on their feet with this stuff. So I hope this basic linear algebra stuff
00:54:22
Speaker
was received quite well. I think it had something like 20,000 readers in the last month, thanks to getting to the front page of Hacker News, of course. But it was received quite well, so I can see that people recognize that this is useful for them. So I'll probably keep writing that stuff,
00:54:49
Speaker
and I'll keep adding more functionality to it. And maybe I will introduce more bugs, because I recognize that most people were quiet before, because
00:55:05
Speaker
there are many people that use these libraries, and when I ask them, okay, I haven't heard from you, they say: no, it's working great. So I probably need to introduce more bugs to get people to complain more, just to show that there is activity.
00:55:28
Speaker
Maybe what you can do is every time they do a matrix computation, it's going to post a question to Stack Overflow automatically and then it answers. Maybe I'll try to convince Vijay to try these libraries so he can complain.
00:55:51
Speaker
Yes, I'll start doing that. But by the way, are you going to be at EuroClojure? Excuse me, I have some connectivity problems. Are you going to be at EuroClojure? Oh, unfortunately not. Oh, no. I mean, my talk wasn't accepted. But fortunately, two talks will say some stuff about Neanderthal and ClojureCL.
00:56:18
Speaker
So maybe they picked other talks, and I think it's better, because instead of me just saying how great Neanderthal is, other people will say how great Neanderthal is. So I think it's better for visibility, and for convincing people to try this stuff. Basically, it's not rocket science. You can use it. Yeah.
00:56:47
Speaker
If you're a Clojure developer, you are ready to use Neanderthal. So I think we are almost at one hour, so I think we can wrap it up. Of course, we are going to be at EuroClojure: Ray is going to be there, and I'll be there as well.
00:57:07
Speaker
And first, I'd like to give a quick shout-out to Cognitect, because they gave us one free ticket to attend. So thanks to Cognitect, guys, and Lynn and Alex Miller, of course. And I think now we can roll the credits. Dragan, thanks a lot for joining us and giving us insight into what you're building and the high-performance side of Clojure. And hopefully we'll meet you at EuroClojure. You'll be attending, right?
00:57:35
Speaker
No, not even attending. As I said, I don't have a talk, so basically I'm interested in listening, but my funding can only cover it if I have a talk. You know how at university you have to have a talk to be able to go?
00:58:00
Speaker
Exactly. So, unfortunately, I won't be able to attend. But we'll go there and then spread the word. Yeah, thank you very much. I can see I put you to sleep. No, no, no, no. Not at all, no. It's definitely some challenging stuff, though. I think, like you say, the
00:58:23
Speaker
the ability for us to get Clojure developers onto the linear algebra path is still coming. But I think you're doing great work in this area. Without this kind of stuff, we've got no chance. Because if we just rely upon the JVM libraries, or integration with Python or R, then
00:58:49
Speaker
people won't do interesting things in the Clojure world. And I think this is the real key thing that you're bringing here. Yeah. The key thing is to enable people to create stuff in Clojure that is not available on other platforms. So basically, not to just follow what is out there, but to try to create some killer app. Yeah. Excellent stuff. Thanks a lot. It's time to roll the credits. Yeah, thank you for inviting me.
00:59:19
Speaker
Yeah, thank you for joining us. And as I said, we'll try to spread the word around at EuroClojure. And hopefully we'll see you again. And I'll certainly give it a try to see what I can do, especially on the CUDA thing now, because, as you know, on my Mac I have NVIDIA stuff. So I'll give it a try, and I'll poke you. And maybe I'll be the guy posting stupid questions on Stack Overflow. Yeah, just complain as much as you like. I will try to help you.
00:59:50
Speaker
Thank you. Just before you go: we've got links to your GitHub and all that kind of stuff, but you said you can recommend some books as well, for people like me who are just getting into this stuff, good beginning material. So maybe I can take a few months over the summer and then come back in the autumn with some questions. What would you like to learn?
01:00:16
Speaker
Well, you know, you mentioned you have some books about the machine learning and Bayesian stuff. Okay. So for Bayesian stuff, I would recommend Doing Bayesian Data Analysis, Second Edition, by John Kruschke. Also, for an old-school machine learning overview, there are two good books. One is
01:00:43
Speaker
Pattern Recognition and Machine Learning by Christopher Bishop, who is, I think, a researcher at Microsoft now, or something like that; he's also a professor. And there is also, I think by Kevin Murphy, Machine Learning: A Probabilistic Perspective, from MIT Press. These are three books that cover you for data analysis and also the machine learning stuff.
01:01:10
Speaker
But before that, I would definitely recommend some linear algebra refreshment. And I recommended this book called, I think, Linear Algebra something-something; you'll find it on my blog. It was a really good choice, because the most important thing when you pick
01:01:37
Speaker
a linear algebra book is to pick one for engineers, not one for mathematicians. So this one is for engineers, and it was like $1 second-hand on Amazon. But now that I've recommended it, it's like $40 or something like that.
01:01:58
Speaker
So maybe I can choose another one. Can you recommend a cryptocurrency as well? But we'll hold this back until we release this program. Just tell us, is it Ethereum? Come on, tell us. No, I could recommend one, but I'm not a believer in cryptocurrencies and I don't know anything about that. What he's going to recommend is Dragan coins. I recommend Dogecoin.
01:02:29
Speaker
I thought you're going to come up with your own cryptocurrency. That is going to be mined by Neanderthal secretly. Every time somebody uses the library, you get a coin. Yeah, like veg cryptocurrency.
01:02:45
Speaker
We call it a veg coin. But it's not only a coin, it's like a veg ether or something like that, for some buzzwordy computation on the internet. We call it Vegium.
01:03:05
Speaker
Anyway, so on that bombshell. You heard it here first, folks. And instead of Satoshi Nakamoto, you will get Ray and Vijay. Exactly. And then we'll have to go into hiding pretty quickly. We are practically retired because of this podcast, by the way.
01:03:32
Speaker
We can't do anything else, that's what I mean. It's not that we can retract. Yeah, well, just mention globalization, Trump, Putin and the usual buzzwords, and I think listeners will skyrocket; the audience will be in the millions.
01:03:52
Speaker
I don't think people would appreciate that much. But anyway, I think that's pretty much it for this week. And hopefully we will release this before EuroClojure. I don't know about the people who are listening to this one.
01:04:08
Speaker
Most probably you folks will be listening to it before EuroClojure, and we will be there with some defn stickers. Ray and I will be there as well. So please stop by and say hi, and if you want to punch us, please punch Ray.
01:04:26
Speaker
Yeah, I'm hopefully going to be doing a little unsession there as well, with functional programming jokes. Yeah. But I've got to get a bit of love on the GitHub page, so I'll put a link to that in the show notes as well. So yeah, all the more reason to come and punch Ray. So please, Ray, please tell at least one joke about Neanderthals. Right, so: evolution is programming. Yeah.
01:04:57
Speaker
So that's it for the show, and let's roll the credits. First of all, I would like to thank Dragan again. Thank you for joining the show and taking the time on a Sunday evening, spending the time with us, and also going out and buying headphones especially for the podcast. That's a lot of commitment. That's a first. Exactly.
01:05:22
Speaker
And our sound engineer, who fixes all our voices, Mr. Walter Dallart. I don't know if I pronounced that properly; he said he's going to replace it with the proper pronunciation. And our music is done by Pizzeri.
01:05:41
Speaker
So please go and listen to that music. The music tracks are on SoundCloud; the links will be in our doobly-doo thingy, or the show notes, somewhere at defn.audio. So that's it from us for this week. Ray, any closing thoughts?
01:06:00
Speaker
No, I think it was just really good. Thanks again, Dragan. It was really an amazing little dive, even though it was in warm waters into the pool of linear algebra and matrix maths and all this kind of stuff. So thanks again for doing all the good work. And hopefully we can bring a few more people into the library because of this podcast. And I'll definitely try and make a few jokes about it. Yeah. Thanks very much.
01:06:29
Speaker
Yeah, thanks for calling me and inviting me. Bye bye. Bye.
01:07:06
Speaker
How do you stop this? It's a red button.