#24 Enter the Dragan with @draganrocks

defn
Where we learn all about the expressive power of Clojure running at the speed of raw machine code on CPUs and GPUs.
Transcript

Introductions and Name Pronunciation

00:00:15
Speaker
Welcome to episode 24 of defn, with Ray in Belgium, and Vijay from Holland, and Dragan from Belgrade. So, Dragan Djuric. Yes. Yes. Have I got your name right, Dragan? Yeah. Well, almost perfectly. Not quite, but it's good. See if you can do it any better then. Dragan Djuric. No, it's Dragan Djuric.
00:00:44
Speaker
Dragan Djuric. Okay, so I think this episode we're basically going to spend trying to pronounce his name. We'll just keep repeating it. Yeah. Let's try it. Where is it going wrong? So, Dragan Djuric. No, it's Dragan, like rrr. Ah, Dragan. Yes, something like that. Okay, Dragan Djuric. Yeah, Djuric.
00:01:06
Speaker
It's Djuric. Yeah, it's Djuric. No, it's okay, because it's a bit like Dutch, you know, Dragan Djuric. Almost perfect. You know, I'm just going to call you by your GitHub username, Blueberry. Yeah. That's pure racism, VJ. We're not having that. Come on. We're going to call you Blueberry. Right, from now on: hello, Blueberry.
00:01:32
Speaker
He's the guy who picked his name, so I'm not discriminating against any other Berries.

The Irony of Names and Software Simplicity

00:01:38
Speaker
Yes, but Blueberry is a Belgian comic book anti-hero. So Ray, as a Belgian guy, should know who Blueberry is. Well, I'm kind of... I'm just being integrated, let's say. Maybe it's going to be on the test.
00:01:59
Speaker
I've already taken the test; it wasn't that hard, I got away with it. But it's quite funny that your name is so complicated, well, complicated to our tongues at least, yet your libraries are uncomplicated.

Dragan's Tech Preferences and Setup

00:02:15
Speaker
Maybe we'll move on to that. Before we do that, we need a little bit of introductory chat just to warm everyone up. We were talking beforehand about the fact that the Mac is very popular among developers, but you prefer the Linux desktop. I've noticed at work there are a lot of people using Linux desktops, and I've noticed it with very many people in the Clojure community now.
00:02:40
Speaker
These Linuxes are coming to the fore. So, is 2017 the year of the Linux desktop for Clojure? Well, I'm not sure, because I'm not using a desktop.
00:02:53
Speaker
Oh my god, you're definitely splitting hairs now. I'm using Linux without the desktop. Well, you're just going blind, are you? No. No, I'm using Emacs with Xmonad. So no need for a desktop.
00:03:14
Speaker
Yeah, and I don't have a mouse on my computer. So that makes more sense. No, I see. I see. So everything is on the keyboard, and a magic trackpad or whatever it's called for scrolling in Google Chrome. So basically, for all my development environment and everything else I do, I can do it with shortcuts on my keyboard.

Journey into Clojure and High-Performance Computing

00:03:40
Speaker
So basically what you need is, you need a 3270 into some kind of Google backend or something. That's all you need. I don't even know what that is. Well, it's like an old IBM terminal, IBM 3270. Yeah, okay. Probably before I was born.
00:04:00
Speaker
Well, at least you're not using it like RMS does, right? Because if he wants to read web pages, he'll send a request to a server, and then the server sends the web page back to him by email. No, I have a GUI. I have Chromium. I have VLC. I watch movies and everything else. I just don't have a desktop. So no taskbar, no mousey stuff.
00:04:29
Speaker
But I have Xmonad. It's like a tiling window manager. Yeah, I know. Xmonad is written in Haskell, right? Yeah, but I don't use it because it's written in Haskell. Of course. I just set it up five years ago and I haven't touched it since then. So it just works. Has it had any side effects?
00:04:56
Speaker
Yes, side effects, it works. That's one side effect. So I'm not even a fan of Haskell. We'll get there slowly.
00:05:13
Speaker
So can you give us some... okay, we talked about your setup already, which is fascinating. And I think this is the first time we've had someone with this kind of setup. So can you give us a bit of background: where do you work, and how did you get into Clojure? What do you do usually? I mean, I'm guessing most of the time you're configuring Xmonad, but the rest of the time, what do you do?
00:05:35
Speaker
Um, I had some problems with communication here, so I missed half of the question, but I'll try to answer it. So... well, I think what we should do is chop it up into, like, five questions rather than one big long one. Because the internet is a bit choppy, I think the connection is not great. But okay. How I got into Clojure, right? That's the question. Yeah. Yeah.
00:06:05
Speaker
So it was in 2009, and for some time before that I had been trying to find some solution that would give me metaprogramming facilities for some stuff that I needed to do.
00:06:21
Speaker
And I was trying with AspectJ and some complicated Java stuff and everything, but it was clunky and required lots of boilerplate. And then Clojure was, I think, released or about to be released. So it was maybe May or April of 2009. And I had been looking into Lisp before, but not really in detail, because it was not quite practical,
00:06:50
Speaker
because of the lack of libraries or ecosystem, or I just didn't know how to use it properly. So when Clojure appeared, it clicked immediately. And fortunately, because I work at a university, I can choose my own technology, what I do and how I do it. So it was an easy switch. I don't have VJ's problem of having to code in Scala.

JVM Performance in Machine Learning

00:07:20
Speaker
So I switched immediately, and well, it's been quite an easy ride since then. I'm quite satisfied with what Clojure offers. It has its rough edges and so on, but it's manageable.
00:07:36
Speaker
Yeah, it's interesting, because you're doing the Clojure stuff, but you're very focused on performance. So what was that story? Because in some of your talks, you've talked about the JVM and sort of lower-level libraries. What do you think about the JVM ecosystem for Clojure? Is that something that attracted you? Or was it just the language itself, and then you figured you could get underneath it?
00:08:05
Speaker
Well, actually, when I started doing Clojure, I was not into high-performance stuff. At that time I was more into domain modeling, doing something with ontologies, modeling domains, some constraints and so on. And at that time the JVM wasn't a problem, quite the opposite. It's quite performant for that kind of stuff.
00:08:35
Speaker
But later, when I started looking more into machine learning, I realized that, well, JVM
00:08:47
Speaker
and similar environments are not performant enough because especially machine learning technologies that I'm interested in are quite computationally expensive. So really, really expensive, like billions of operations are needed. Can you give us an example of that kind of thing?
00:09:10
Speaker
Well, the most obvious example is Markov chain Monte Carlo sampling, which I mentioned at EuroClojure. So basically, the point with this kind of sampling is that you're exploring a hyperspace.
00:09:28
Speaker
A hyperspace is a space with more than three dimensions, let's say. So you have a space with, like, fifty or a hundred dimensions, and each dimension is continuous, so infinite numbers of possibilities. And in machine learning there is a thing called
00:09:47
Speaker
the curse of dimensionality. So basically, if you have some method in one dimension, you can say: okay, I will sample really finely along that one dimension. So I need maybe a thousand points, and with these thousand points I will be able to discover the rough shape
00:10:09
Speaker
of the function in one dimension. In two dimensions, I have to check 1,000 times 1,000, a million points. In three dimensions, 1,000 times that, a billion points. In 100 dimensions, it's 1,000 to the power of 100. So really, by

The Power of GPUs in Computation

00:10:30
Speaker
brute force, you cannot compute it at all.
00:10:33
Speaker
not even in a million years. So basically all these methods try to somehow reduce the space, to randomly search the space, but not so randomly. So a lot of computation is needed, but these computations are numerical. Not much logic, more like millions upon millions of evaluations of functions like exponent and sine and cosine, and combinations of those.
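The grid arithmetic above can be sketched in a few lines of plain Clojure (a toy illustration, not from any library):

```clojure
;; Number of sample points a naive grid search needs:
;; `per-dim` points along each of `dims` dimensions.
(defn grid-points [per-dim dims]
  (Math/pow per-dim dims))

(grid-points 1000 1)   ; 1,000 points in one dimension
(grid-points 1000 2)   ; a million in two
(grid-points 1000 3)   ; a billion in three
(grid-points 1000 100) ; 10^300: hopeless by brute force
```

Even at a billion function evaluations per second, the three-dimensional grid already takes a second, and each extra dimension multiplies that by a thousand.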
00:11:03
Speaker
Basically, the point is that GPUs are really suited for such kind of computations and a lot faster than CPUs. And Java is, well,
00:11:23
Speaker
It doesn't have access to hardware facilities, even on CPUs, not to mention GPUs. Even on the CPU you have many hardware facilities that can be used, like pipelining, or AVX and SSE instructions, things that can be used for numerical computations and give a speedup of 10 or 20 or 50 times if programmed well.
00:11:51
Speaker
So Java doesn't have access to that. So basically that got me interested. I need performance.
00:11:58
Speaker
for a low price because I'm not Google or Microsoft. I cannot have a huge data center and just throw so much computation on it. So I really need to be like optimal. So get as much as I can from the cheap hardware. And that's how I got into researching how can I do this practically.
00:12:26
Speaker
but from Clojure, because I love Clojure. I love programming in Clojure.

Exploring Neanderthal and ClojureCL Libraries

00:12:31
Speaker
So the library. Did you look at, sorry, go on, go on VJ.
00:12:34
Speaker
So I was looking at the libraries that you've built so far, essentially for utilizing GPUs. You have, of course, Neanderthal, that is the matrix library. And you have ClojureCL. So can you give us some idea about what each of these libraries does and what they're used for? Because I see one of them as a much lower-level library than the other one.
00:13:03
Speaker
And there is a third library, ClojureCUDA. Yeah, this is for Nvidia, right? So, what is the purpose of all this, basically? When you develop this kind of software, like numerical software, and you need access to all these low-level facilities, the point is that you need to...
00:13:33
Speaker
Basically, the way you program it is similar to Clojure: if you understand map and reduce, you basically understand the basics of how this massively parallel computation is done.
00:13:49
Speaker
So Clojure is a good approximation of what these libraries do. But what Clojure cannot do is access the real hardware stuff. It cannot tell the GPU what to do, because it doesn't have any connection to it. So basically, let's start from Neanderthal, because it's probably what most users would try first. Yeah.
00:14:16
Speaker
So what Neanderthal enables you to do is create structured data like vectors and matrices, basically blocks of data that you want to compute on, and then call standard functions that are optimized for running on different kinds of hardware. So you as a user have to be aware of where it computes and what
00:14:47
Speaker
it needs to compute, but you don't have to do it yourself. Neanderthal does this for you. You just have to call, say, matrix multiplication, or matrix factorization, or solving systems of linear equations, or something like that. Also, what Neanderthal does is it says: okay, you can do your computation either on the CPU, the processor of your computer, laptop or desktop or whatever you have.
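As a rough sketch of what calling Neanderthal looks like (function names from the `uncomplicate.neanderthal` namespaces; this needs the native backend installed, so treat the details as illustrative):

```clojure
(require '[uncomplicate.neanderthal.core :refer [dot mv mm]]
         '[uncomplicate.neanderthal.native :refer [dv dge]])

;; A native double-precision vector and a 2x3 matrix
;; (dge takes column-major data).
(def x (dv 1 2 3))
(def a (dge 2 3 [1 2 3 4 5 6]))

(dot x x) ; dot product, computed by the optimized BLAS backend
(mv a x)  ; matrix-vector multiplication
```

The same calls run unchanged whether the structures live in CPU memory or, with the GPU namespaces, on the graphics card.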
00:15:18
Speaker
Or you can send it to the GPU for computation. Now, the GPU can be AMD or Nvidia or something else, but probably Nvidia, or maybe AMD. So Nvidia has its own environment, which is CUDA, right? So basically,
00:15:41
Speaker
you call Nvidia's CUDA driver and say: okay, do this for me, send this kernel. A kernel is a small chunk of code that computes in parallel. So, ClojureCUDA enables you to write your own kernels and call them on your GPU at a really low level. But for Neanderthal, you don't need to use ClojureCUDA
00:16:11
Speaker
or ClojureCL. Neanderthal does this for you. You just have to say: okay, I need this on my CUDA GPU. But that will get you only so far, because most of us write some custom software, like 100% custom or 90% custom. Each of us needs some custom stuff. So basically, Neanderthal alone is not enough.
00:16:39
Speaker
It's good

Deep Learning and Numerical Computation in Clojure

00:16:40
Speaker
for a start, but if you really want to write some great software that other people don't have, and be better than the competition, typically you will need to write some custom algorithms. So basically ClojureCUDA and ClojureCL enable you to write your own low-level code that runs on the GPU, using Clojure. You write small chunks of C, really small kernels in C.
00:17:09
Speaker
All the management is in Clojure. Everything on the host, the management, calling these kernels, and everything else like setting up memory, is in Clojure. And another benefit is that you don't have to compile anything. So basically you just fire up your Clojure REPL, and from the REPL you have access to Neanderthal, to ClojureCUDA, to ClojureCL.
00:17:35
Speaker
You just provide those strings or files with kernels, provide some Clojure code that manages them, and you can just experiment in the REPL. So that's the main benefit.
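Schematically, the workflow looks something like this (the exact ClojureCL function names are recalled from its tutorials, not taken from this conversation, so check them against the documentation):

```clojure
(require '[uncomplicate.clojurecl.core :refer :all]
         '[uncomplicate.commons.core :refer [with-release]])

;; The kernel itself is a small chunk of C, kept in a Clojure string;
;; no separate compilation step, it is built at the REPL.
(def source
  "__kernel void add (__global const float* a,
                      __global const float* b,
                      __global float* c) {
     int i = get_global_id(0);
     c[i] = a[i] + b[i];
   }")

;; Everything else, compiling, memory setup, enqueueing, is Clojure.
(with-default
  (with-release [prog (build-program! (program-with-source source))
                 add  (kernel prog "add")]
    ;; buffers, set-args! and enqueueing of `add` would follow here
    ))
```

The point made in the conversation is exactly this split: a few lines of C for the parallel part, and all the host-side plumbing as ordinary, re-evaluable Clojure.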
00:17:51
Speaker
OK, so how do you see the role of these libraries in the larger ecosystem? Because these days, I'm pretty sure you know about Cortex as well, right? The Clojure machine learning, deep learning library. So you mean Cortex? Yeah, Cortex, right? So how do you see using these libraries for computations related to deep learning stuff? Is there any relation between them, or do you think they're separate? Yeah.
00:18:21
Speaker
I've had a look at Cortex, but basically I'm not so interested in deep learning, because deep learning is really popular now and everyone is interested in it, but deep learning
00:18:36
Speaker
is more used for perception. Like, if you have pictures or sounds or that kind of information, and you need to process it, classify it or do some recognition on it. But I'm more interested in methods for analyzing data. For example, a typical Clojure company will probably
00:19:04
Speaker
collect data about users: how they browse products, or how they do transactions, how they do stuff. So I'm more interested in data analysis methods, like Bayesian stuff. So I'm not so well informed about deep learning. Of course, I know the basics.
00:19:33
Speaker
I haven't tried Cortex. Now, how would you use Neanderthal with such libraries? How does it fit? Well, Neanderthal is...
00:19:43
Speaker
a more general library. So basically you would use it in machine learning, or in physics, or any other area that needs numerical computations. And of course you can build your own deep learning stuff on top of it. Maybe I will provide some libraries in the future, based on cuDNN, say. Cortex is a specialized, not matrix, but deep learning library.
00:20:12
Speaker
Yeah. So basically now, you can, of course, transfer data from Cortex to Neanderthal and vice versa. But there is no deeper integration now, because Cortex, I think, uses Java interop for accessing cuDNN. So basically they use the Java interop for that. Yeah. Right.
00:20:41
Speaker
Yeah, I think maybe it's more interesting, to some extent, to talk about things we can do with it rather than things we can't. I was looking at your blog, actually, and I noticed that in March you released 0.9 of Neanderthal.
00:21:02
Speaker
And you said in that one that your notion of what it takes to do high performance in Clojure is beginning to take shape. Maybe we can talk a little bit about that, because you said that you started to use Intel's Math Kernel Library (MKL).
00:21:25
Speaker
And that seems like an interesting move. You went from an open source library to this closed-source library, but you say it's an improvement. What's the story there? Well, I noticed, I mean, it was obvious that many Clojure users have had trouble installing ATLAS.
00:21:50
Speaker
Right. Because the point with ATLAS is that it's built in such a way that you have to build it on your own machine to optimize it, to allow it to choose the best-performing kernel for that particular hardware. Maybe you can just back up a little bit and explain what ATLAS is. ATLAS is an open source matrix computation library.
00:22:16
Speaker
So basically, if you're aware, BLAS and LAPACK are the standards for numerical matrix computations, really, really low-level stuff. This is a linear algebra library. Yes.
00:22:36
Speaker
Basically, BLAS is for primitive operations, and then LAPACK builds on top of it for solvers and such. So ATLAS is an open source implementation of that. It's quite a good library, and it's open source, so it's really a good choice. But the point is that it's built in such a way that you have to build it on your own machine. And of course you can use one of
00:23:03
Speaker
the builds provided by your Linux distribution, for example. But in that case, some distributions don't build it properly, and some package it in such a way that it's not always obvious how to use it. So any

Data Handling and Performance in Neanderthal

00:23:20
Speaker
competent Linux user wouldn't have a problem
00:23:24
Speaker
building it, because it's, like, a make script with all the usual tools. Basically, it's not something really complicated, but most Clojure users don't have experience with that.
00:23:39
Speaker
So they have kind of... I can't say it's fear, but they're not really confident trying it out. So one thing is, I realized it's really a problem. It's not a problem for me, but it's a problem for a lot of users, and of course I like to help them. But another thing that is also quite important is that
00:24:07
Speaker
Intel MKL is even more performant than ATLAS.
00:24:15
Speaker
And what is even more important is that it supports some really important stuff that ATLAS doesn't have, for example sparse matrices. And the point with sparse matrices is that the only really good open source implementation is GPL, so it doesn't fit into the Clojure ecosystem.
00:24:39
Speaker
So basically, the switch to MKL enables two things: easier installation and distribution later, and some features that are not possible with ATLAS.
00:24:55
Speaker
Maybe just to help us out a little bit here. I know that AMD conform to a lot of Intel standards. Is this something specific to Intel, or does it also run on AMD chips? No, it runs on AMD processors also. But Intel say it's optimized for Intel processors.
00:25:25
Speaker
Probably it runs better on Intel processors than on AMD's, but I did some searching on the internet, and people claim that even on AMD processors, MKL is the most performant library. So I think the only downside is that it's not open source. It is free to distribute, but it is not free software, and it's closed source.
00:25:51
Speaker
But you can distribute it with your software for free, and that's been the case since last year. Before that, it required a license for such things, but now it's free to distribute.
00:26:09
Speaker
So basically you've got a Clojure library with this Intel core that you can use for these, let's say, classic linear algebra problems. So in terms of use cases, have you worked up any use cases, like travelling-salesman-type use cases or optimization-type things? Well, the thing is,
00:26:33
Speaker
I use it for Bayesian computations. That's my use case, since it's quite demanding, and I use it a lot. So it's helped me tremendously. But maybe to wrap it up: how can it be useful for Clojure programmers? Because I think most people would be interested, not in the details so much, but in: how can it help me, or what can I do with it? So basically,
00:27:03
Speaker
there are a few things. First, it gives you access to all the hardware resources for numerical computations on your machine, so CPU and GPU at full speed, from Clojure. That's the first thing. Another thing is that it gives you an easy connection to the world outside the JVM.
00:27:30
Speaker
So I made it in such a way that there is practically no copying cost to transfer your data outside of the JVM. So for example, if you have some C library, maybe that library doesn't have anything to do with linear algebra, it just needs some vectorized data to be sent to it. So do you mean to say that this, for example, can be used in tandem with something like TensorFlow?
00:28:00
Speaker
To the extent that TensorFlow can receive primitive arrays. But I think TensorFlow is C++ software, so it's a more complicated story. Well, how about things like computer vision, that kind of stuff?
00:28:18
Speaker
I suppose, yeah. The point is it doesn't care. There are two kinds of this high-performance software that you'll find: C++ stuff and C stuff. C stuff is usually written in a way that it can be accessed from whatever environment you have. You just need to provide raw pointers to raw arrays
00:28:48
Speaker
that conform to that software's view of how it will use that array.
00:28:54
Speaker
That's how MKL works, and ATLAS, and such BLAS stuff. They say: okay, give me a pointer to the array, and give me some numbers, some strides or dimensions or so, and give me the pointer to the data. But C++ software often requires data structures that are specific to that C++ software, so it's not really that easy to connect to from Java.
00:29:22
Speaker
So Neanderthal can give you free access to C software, but not necessarily to C++; it depends on how it's structured.
00:29:35
Speaker
But for most Clojure programmers, I think, that's not so important, because they're usually not interested in writing low-level C stuff and compiling for multiple platforms. They usually, I think, would like to say: okay, I have my REPL. Can I write high-performance software

Neanderthal's Role in Hardware Acceleration

00:30:00
Speaker
from the REPL, from the Clojure environment?
00:30:04
Speaker
So the idea that I want to enable with Neanderthal is that you don't need TensorFlow, you don't need, I don't know, OpenCV. I mean, obviously you need them because there is so much functionality there. But the ideal case would be: okay, my team has this wonderful idea. We have the algorithm.
00:30:30
Speaker
We know that algorithm can compute some useful stuff. Now we have to write it. Can we write it in Clojure, and not worry about multi-platform builds and C++ and C and everything else? Can we write it in Clojure and have a Clojure program that does it at full speed? So that's the main idea of Neanderthal. So I suppose if you need to use TensorFlow, you would need TensorFlow's
00:31:00
Speaker
I don't know, Java interop libraries or so. But in Neanderthal, are the primitives the Clojure data structures themselves? Or do you introduce a different kind of primitive? Not even primitives, the data structures that you use. If you say vectors, are they Clojure vectors, Clojure persistent vectors? No. Okay. Because that doesn't make any sense.
00:31:24
Speaker
Okay. Why doesn't it? For example, if you need high-performance code and you use Clojure vectors: Clojure vectors are, like,
00:31:41
Speaker
a bunch of references in the Java Virtual Machine. So each number is not a primitive number; it's an object that wraps a primitive number, and it's scattered all over memory. So you don't have a guarantee that cache memory will be used in an optimal way. So what does that mean? It means that
00:32:08
Speaker
if you have data about a thousand workers, and their schedules and their managers and what they do during the day, and you have to write software that provides, like, information management, Clojure is quite fast for that kind of stuff.
00:32:32
Speaker
But most of machine learning and linear algebra, these numerical algorithms, their complexity is not linear. It's usually quadratic complexity, or O of n to the power of 3, or even more.
00:32:49
Speaker
So basically, you need much more speed. You need to use cache memory in an optimal way. You need to use all these hardware internals: pipelining, AVX instructions, or whatever. So you cannot use Clojure vectors and Java linked lists and Java array lists and everything else;
00:33:13
Speaker
they are really a bad tool for that kind of task. So what Neanderthal does is it practically maintains its own primitive memory, but it enables you to look at it as a Clojure object.
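The boxing issue can be seen with plain Clojure (a toy comparison; Neanderthal's internal layout is more sophisticated than a bare array):

```clojure
;; A persistent vector stores boxed number objects,
;; scattered across the heap behind references.
(def boxed (vec (range 1000)))

;; A primitive array is one contiguous, cache-friendly block of
;; unboxed doubles, the kind of memory Neanderthal manages for you.
(def prims (double-array (range 1000)))

(reduce + boxed)                                 ; walks boxed objects
(areduce prims i acc 0.0 (+ acc (aget prims i))) ; tight primitive loop
```

Both compute the same sum, but only the second keeps the data unboxed and sequential in memory, which is what cache-friendly numerical kernels need.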
00:33:34
Speaker
Let's say, okay, so you don't have to worry about low-level stuff that much. Do the functions that work on Clojure vectors, for example, work on Neanderthal vectors as well? Excuse me, I didn't understand that.
00:33:50
Speaker
So, because Clojure's abstraction is the seq abstraction, right? The seq thing. So all the functions that we have that work on Clojure data structures, for example Clojure vectors, do they work on Neanderthal vectors as well? I mean, I'm thinking about the user of this library: what should I be aware of? So I suppose you mean, like, map. Yeah, yeah, yeah.
00:34:20
Speaker
Especially lazy sequences are a bad abstraction for high-performance numerical stuff. So that's the point: forget about it. Okay. So we cannot use the same kind of things. But map as a concept and reduce as a concept are a good abstraction, even for numerical code.
00:34:44
Speaker
But unfortunately, Clojure's map and reduce are built in such a way that they're not quite a good match for that. So basically, if you want to write code with maps and reduces, you would use Fluokitten. It's one of my libraries. That's what I was getting at, slowly. Oh, yeah. OK. So you would use Fluokitten, which offers alternatives to map and reduce that are
00:35:13
Speaker
not only Neanderthal stuff; it's really a library that I had written before Neanderthal, like four years ago. It's a category-theory-inspired library that gives you these kinds of abstractions, like fmap, fold,
00:35:39
Speaker
and so on, that are polymorphic. So if you call them on Clojure vectors, they will use the optimal algorithm for Clojure vectors. If you call them on a Neanderthal data structure, they will call the appropriate optimized algorithm.
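For illustration, Fluokitten's polymorphic functions look roughly like this on plain Clojure collections (the same calls dispatch to optimized implementations on Neanderthal structures; behavior recalled from Fluokitten's documentation, so treat the outputs as indicative):

```clojure
(require '[uncomplicate.fluokitten.core :refer [fmap fold]])

;; fmap is like map, but returns the same kind of structure
;; it was given, dispatching on its type.
(fmap inc [1 2 3])     ; a vector in, a vector out
(fmap + [1 2] [10 20]) ; element-wise, still a vector

;; fold reduces a structure with its elements' monoid operation
;; (numbers fold with +).
(fold [1 2 3])
```

On a Neanderthal vector, the same `fmap` and `fold` run over the primitive memory directly instead of producing lazy seqs.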
00:35:59
Speaker
So that's one of the combinations you can use. But most often, I guess, my preferred combination is to use Neanderthal and matrix operations if they are appropriate for the algorithm, and if not, to write my own with ClojureCL or ClojureCUDA. That's the best. I think some... yeah, right.
00:36:23
Speaker
Yeah, a quick question. I mean, maybe it's just not possible, but have you thought about how these kinds of things could fit into the transducer framework, the transducer model?
00:36:39
Speaker
The thing is,

Learning Resources for Machine Learning

00:36:40
Speaker
I'm not sure whether it's even needed. Could you give me an example? What would you do with a transducer and a GPU function, or a Neanderthal vectorized function?
00:37:01
Speaker
Well, let's say you wanted to put some data onto a channel or something like that, and then you want to pull it off, or put something onto a channel, but you want the results computed on the GPU, or a filter or a map done on the GPU, before you put it onto a channel for further consumption downstream. Then it's quite nice to have
00:37:28
Speaker
that in a transducer, because then you could use it on a channel, or on some other type of input. That's the theory, anyway. Maybe I'm just in bullshit land here. No, no, you're not. But what would be the shape of that data? Like, what's the size of that data? When you say data, is it 10 numbers, or 10 million numbers, or 10 billion numbers?
00:37:58
Speaker
Well, I think the idea is that it could be whatever you want. But then this map or reduce operation that you're talking about would do its work before it goes onto the channel. The idea about transducers, to me anyway, is that you provide the algorithms, the functions that do the operations on the data, in a way that is orthogonal to the output channel.
00:38:23
Speaker
Mm-hmm. Well, in that way, maybe, but maybe I'm also in bullshit land now. So the point is that it may be orthogonal to Neanderthal. The point is, you would typically...
00:38:43
Speaker
So maybe now it's time to just take a short break, because I think we've got into quicksand now. Not with the channels, but I think our conversation has stretched a bit too far into details, so people would be bored. Maybe we could cut out this first half an hour. So maybe we should speed this up. Let's hope not. So the point is,
00:39:13
Speaker
let's say we'll get back to this, but maybe we should look at it from a different perspective. Let's say you're a Clojure programmer, and I'm a Clojure programmer, so you're not a statistician. You don't have a PhD in deep learning or whatever. You know your programming.
00:39:34
Speaker
Perhaps you had some math courses in college that you've forgotten, more or less. Maybe you remember some linear algebra, or you'll remember some linear algebra to the extent that there are such things as matrices, but you've forgotten even what an eigenvalue is, or such things.
00:39:57
Speaker
Eigenvector or yeah, or so

Introduction to Bayadera for Bayesian Computations

00:40:00
Speaker
You recognize the stuff, but you've forgotten the details. And yeah, so you don't really have an idea how even
00:40:10
Speaker
to use this to any benefit. But machine learning is such a popular topic, so you recognize that you would love to learn at least some methods. You feel that you could solve some problems with it. Maybe your company has been gathering data for the last five years,
00:40:34
Speaker
hoping that the data will answer some useful questions, but the answer just isn't coming. So all of you are creating user groups and trying to learn those machine learning techniques. And you start reading blogs, looking at demos, trying some software, and
00:41:04
Speaker
You understand what those demos do, but you still don't understand how they do this.
00:41:10
Speaker
Okay, they have these pictures and they teach their software to recognize, like, cows and hot dogs and not-hot-dogs, that kind of stuff. But now my company has data about Visa transactions, and, like, vegetarian food sales in Amsterdam or so, right?
00:41:36
Speaker
So how can we use this perfect data? You realize that you need to dig deeper into machine learning theory, let's say. So you browse for books, and then you pick some popular textbooks, right?
00:41:59
Speaker
So, are you following me? Are you in the same situation or not? Definitely, definitely. We haven't picked the books yet, but go for it. But you're on Amazon trying to build a wish list. I'm looking right now, yeah. Yeah, okay. I've just started ISLR now, so I'm at the next step.
00:42:19
Speaker
Yeah. So basically you recognize that it's important, but still, well, it's an alien land. And when you open these textbooks, you won't find any loops, any functions, any transducers or reducers or Java arrays or anything. What you'll find there is a bunch of
00:42:47
Speaker
mathematical symbols, and most of it is matrix notation. So they will say: okay, you have this vector, and then you multiply this vector with the transpose of that other vector and you get a matrix, and then you solve this matrix and find some other matrix, and then you plug this matrix into some other computation that we describe only in matrix notation.
00:43:16
Speaker
And, by the way, we don't get into implementation details,

Future Plans for Clojure Libraries

00:43:20
Speaker
and then, magically, this is the solution. So basically they give you the nuts and bolts and they give you the details, but only on the level of linear algebra. So if you want to implement it, there are two possibilities.
00:43:37
Speaker
Either you understand this stuff that they talk about really perfectly, and you're a programming wizard, and you write your own loops, and you basically re-implement this matrix implementation from scratch. Of course, in reality, you won't get very far with it. Or you learn some matrix software library,
00:44:03
Speaker
that will at least enable you to do some percentage of this stuff directly. And the rest you will have to write yourself or to use a library that some other people wrote. So that's the point, for example, of Neanderthal. You would love to get into the area, but all the literature is really
00:44:33
Speaker
either geared towards pure theory, or applied theory, but applied down to the level of matrix computations. So you need to know how to run and use matrix computations. And Neanderthal enables you to recognize, when you see some
00:44:53
Speaker
combination of matrices in a formula in the book: this is matrix multiplication, and this is linear algebra, like solving a system of linear equations, and this is finding eigenvalues, and this is finding eigenvectors. And then these eigenvectors are plugged into this algorithm, or so.
00:45:17
Speaker
You will still have to write something yourself, but only some part of it. And the point is, even for those parts that you have to write yourself, you need speed. And you will get this speed through ClojureCUDA and ClojureCL.
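To make the idea concrete, here is a minimal sketch of how a textbook formula like "multiply A by the transpose of A, then solve the system A z = b" reads in Neanderthal. The function names (dge, dv, mm, mv, dot, trans, sv!) are taken from Neanderthal's core, native, and linalg namespaces as I understand them; treat exact signatures as assumptions and check the current documentation.

```clojure
(require '[uncomplicate.neanderthal.core :refer [mm mv dot trans]]
         '[uncomplicate.neanderthal.native :refer [dge dv]]
         '[uncomplicate.neanderthal.linalg :refer [sv!]])

;; A 2x2 double matrix (entries given column by column) and two vectors.
(def a (dge 2 2 [3 1 1 2]))   ; columns [3 1] and [1 2]
(def x (dv 1 2))
(def y (dv 3 4))

(dot x y)        ; the textbook's x^T y, a scalar
(mv a x)         ; matrix-vector product A x
(mm a (trans a)) ; matrix-matrix product A A^T

;; Solve the linear system A z = b (sv! destructively overwrites its arguments,
;; so fresh copies are created here).
(sv! (dge 2 2 [3 1 1 2]) (dge 2 1 [9 8]))
```

The same operations run on the GPU by creating the matrices with a CUDA or OpenCL backend factory instead of the native one, which is exactly where ClojureCUDA and ClojureCL come in.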
00:45:35
Speaker
most of the time. That's the point. You also have a statistics library, right? The Bayadera? Yeah, Bayadera. And if you're curious about the name, why it's called Bayadera, some people in Germany recognize it because they went to the Adriatic Sea. In ex-Yugoslavia, there was a cookie, really good cookie.
00:46:03
Speaker
a really popular nougat cookie called Bayadera. So it was nice, and it also sounds a bit like Bayes. So Bayadera is quite interesting to me, and I think it's quite interesting to the wider audience, because
00:46:29
Speaker
Once, just the source repository on GitHub without any documentation got into the front page of Hacker News and it

Wrapping Up and Euroclosure Discussion

00:46:40
Speaker
got some hundreds of stars even without anything else. So what's good about Bayadera?
00:46:47
Speaker
It's much faster than state-of-the-art tools. The state of the art is basically Stan. It's a C++ Bayesian inference engine that has an R-like front end. So basically, you write a model in Stan's own modeling language and call it from R to do some Bayesian inference for you.
00:47:18
Speaker
Bayadera enables you to do similar things from Clojure, but your computation runs on the GPU, and instead of, let's say, five minutes, it will run in 300 milliseconds. And a year ago I translated most of the examples from the best textbook in this area, Doing Bayesian Data Analysis,
00:47:48
Speaker
which is really a great resource for learning Bayesian and probabilistic stuff. It's written by a biologist with real examples, not some hello world-ish stuff, but real examples from real scientific hypothesis testing or so. So really a great book if you want to go into this area. I suppose for most Clojure programmers it wouldn't be
00:48:15
Speaker
that interesting, because they haven't decided to go Bayesian yet, I suppose. But if you want to go into machine learning, statistics and probability and the probabilistic way of thinking are something you have to learn. And so you will come to Bayadera sooner or later if you want to create your own stuff.
00:48:41
Speaker
So maybe you could write an algorithm for the probability of a Clojure programmer coming to Bayadera. Definitely, definitely. So that could be a good example. It's an easy problem. It's an easy problem. Awesome. You probably don't need a GPU for that.
00:49:00
Speaker
Yeah, we don't even need a GPU. But the point is, Bayadera is a good example of how to use Neanderthal and ClojureCL and ClojureCUDA. So if you, for example, want to start and go through some tutorials, and then wonder, okay, but
00:49:19
Speaker
how do I do this exotic thing? How do I transfer this data? And then, how do I call some complicated kernel on the GPU? You can browse Bayadera's source and see how I did it with Bayadera, and get some idea of how you would do it for your own algorithm.
00:49:36
Speaker
So for the people who want to get into this GPU stuff, if I understand correctly, the first point of entry would be Neanderthal. Well, if you want to learn only GPU programming as such, like low-level GPU programming, the point of entry would be ClojureCL or ClojureCUDA. And I place

Cryptocurrency Banter and Conclusion

00:50:00
Speaker
a great importance on helping people learn it. That's why, for all these libraries I provide, I usually pick a textbook.
00:50:12
Speaker
And I choose a good one, not the simplest one, and not one only for PhDs, but a good textbook that would get you from beginner to intermediate, let's say. And then I write examples, usually in the form of tests.
00:50:36
Speaker
So I write the examples of how to use this library, and usually these examples are translations of the book examples. So basically, for example, if you want to learn ClojureCL and do some GPU computing, you would pick up the OpenCL in Action book from Manning.
00:50:56
Speaker
It's really good, by Matthew Scarpino, I think, and it's a really great book in the tradition of all the Manning books: a really good balance between getting you started and giving you real "in action" stuff, not just some hypothetical examples.
00:51:16
Speaker
And Matt gives a good batch of examples that get you from the simplest hello-world ones to some more complex ones. And most of these examples, at least those that make sense with ClojureCL, are in ClojureCL's test folder. So I haven't written blog posts with narration,
00:51:42
Speaker
but you can follow the book and look at the examples in Clojure and see: aha, okay, this is how it's done in the book; okay, how can I do this in Clojure? So you will get the explanation from the book, and then you will get the code from the tests. Okay, so what are your future plans for these things? You have CUDA, which is for NVIDIA things, and you obviously have support for AMD GPUs as well. So where are you taking these libraries?
00:52:13
Speaker
Let's say like this: the first steps now will be to add some more functionality to Neanderthal. I don't need it immediately, but it is important to show that the library is more or less featureful and complete, and that people can rely on it.
00:52:37
Speaker
So during July and maybe August, I will implement lots of new matrix structures, like banded matrices and symmetric matrices. Some of it I have already implemented but haven't checked into GitHub yet;
00:52:57
Speaker
it's on my machine. I probably will not add proper sparse matrices yet, that's for the next phase, but I will try to cover what BLAS and LAPACK cover. That's the first part. The next step would be to bring Bayadera to a level where it can be used by, let's say, average Clojure programmers, because now
00:53:25
Speaker
I wrote Bayadera with ClojureCL and optimized it for AMD GPUs. And most people don't have those. Most people usually have CUDA GPUs.
00:53:39
Speaker
So I will probably also port whatever is there to CUDA. I will try to polish it a bit more, write documentation, and start writing some more tutorials. I'll probably try to keep a pace of about maybe
00:54:01
Speaker
one or two tutorials a month, sometimes even three or four, but let's say two a month, and try to help Clojure programmers get up on their feet with this stuff. So I hope this basic linear algebra stuff
00:54:22
Speaker
was received quite well. I think it had something like 20,000 readers in the last month, thanks to getting to the front page of Hacker News, of course. But it was received quite well, so I can see that people recognize that this is useful for them. So I'll probably keep writing that stuff,
00:54:49
Speaker
and I'll keep adding more functionality to it. And maybe I will introduce more bugs, because I recognize that most people were quiet before, because
00:55:05
Speaker
there are many people that use these libraries, and when I ask them, okay, I haven't heard from you, they say: no, it's working great. So I probably need to introduce more bugs to get people to complain more, just to show that there is activity.
00:55:28
Speaker
Maybe what you can do is every time they do a matrix computation, it's going to post a question to Stack Overflow automatically and then it answers. Maybe I'll try to convince Vijay to try these libraries so he can complain.
00:55:51
Speaker
Yes, I'll start doing that. But by the way, are you going to be at EuroClojure? Excuse me, I have some connectivity problems. Are you going to be at EuroClojure? Oh, unfortunately not. Oh, no. I mean, my talk wasn't accepted. But fortunately, two talks will say some stuff about Neanderthal and ClojureCL.
00:56:18
Speaker
So maybe they picked other talks, and I think it's better, because instead of me just saying how great Neanderthal is, other people will say how great Neanderthal is. So I think it's better for visibility, and for convincing people to try this stuff. Basically, it's not rocket science. You can use it. Yeah.
00:56:47
Speaker
If you're a Clojure developer, you are ready to use Neanderthal. So I think we are almost at one hour, so I think we can wrap it up. Of course, we are going to be at EuroClojure: Ray is going to be there, and I'll be there as well.
00:57:07
Speaker
And first, I'd like to give a quick shout-out to Cognitect, because they gave us one free ticket to attend. So thanks to Cognitect, guys, and Lynn and Alex Miller, of course. And I think now we can roll the credits. Dragan, thanks a lot for joining us and giving us insight into what you're building and the high-performance side of Clojure. And hopefully we'll meet you at EuroClojure. You'll be attending, right?
00:57:35
Speaker
No, not even attending. As I said, I don't have a talk, so basically I'm interested in listening, but my funding can only cover it if I have a talk. You know how at university you have to have a talk to be able to go?
00:58:00
Speaker
Exactly. So, unfortunately, I won't be able to attend. But we'll go there and then spread the word. Yeah, thank you very much. I can see I put you to sleep. No, no, no, no. Not at all, no. It's definitely some challenging stuff, though. I think, like you say, the
00:58:23
Speaker
the ability for us to get Clojure developers onto the linear algebra path is still coming. But I think you're doing great work in this area. Without this kind of stuff, we've got no chance. Because if we just rely upon the JVM libraries, or integration with Python or R, then
00:58:49
Speaker
people won't do interesting things in the Clojure world. And I think this is the real key thing that you're bringing here. Yeah. The key thing is to enable people to create stuff in Clojure that is not available on other platforms. So basically, not to just follow what is out there, but to try to create some killer app. Yeah. Excellent stuff. Thanks a lot. It's time to roll the credits. Yeah, thank you for inviting me.
00:59:19
Speaker
Yeah, thank you for joining us. And as I said, we'll try to spread the word around at EuroClojure. And hopefully we'll see you again. And I'll certainly give it a try to see what I can do, especially on the CUDA thing now, because, as you know, on my Mac I have NVIDIA stuff. So I'll give it a try, and I'll poke you. And maybe I'll be the guy posting stupid questions on Stack Overflow. Yeah, just complain as much as you like. I will try to help you.
00:59:50
Speaker
Thank you. Just before you go: we've got links to your GitHub and all that kind of stuff, but you said you can recommend some books as well, for people like me who are just getting into this stuff, good beginning material. So maybe I can take a few months over the summer and then come back in the autumn with some questions. What would you like to learn?
01:00:16
Speaker
Well, you know, you mentioned you have some books about the machine learning and Bayesian stuff. Okay. So for Bayesian stuff, I would recommend Doing Bayesian Data Analysis, Second Edition, by John Kruschke. Also, for an old-school machine learning overview, there are two good books. One is
01:00:43
Speaker
Pattern Recognition and Machine Learning by Christopher Bishop, who is, I think, a researcher at Microsoft now, or something like that; he's also a professor. And there is also, I think by Kevin Murphy, Machine Learning: A Probabilistic Perspective, from MIT Press. These are three books that cover you for data analysis and also the machine learning stuff.
01:01:10
Speaker
But before that, I would definitely recommend some linear algebra refreshment. And I recommended this book called, I think, Linear Algebra something-something; you'll find it on my blog. It was a really good choice, because the most important thing when you pick
01:01:37
Speaker
a linear algebra book is to pick one for engineers, not one for mathematicians. So this one is for engineers, and it was like $1 second-hand on Amazon. But now that I've recommended it, it's like $40 or something like that.
01:01:58
Speaker
So maybe I can choose another one. Can you recommend a cryptocurrency as well? But we'll hold this back until we release this program. Just tell us, is it Ethereum? Come on, tell us. No, I could recommend one, but I'm not a believer in cryptocurrencies and I don't know anything about that. What he's going to recommend is Dragan coins. I recommend Dogecoin.
01:02:29
Speaker
I thought you're going to come up with your own cryptocurrency. That is going to be mined by Neanderthal secretly. Every time somebody uses the library, you get a coin. Yeah, like veg cryptocurrency.
01:02:45
Speaker
We call it a veg coin. But it's not only a coin, it's like a veg ether or something like that, for some buzzwordy computation on the internet. We call it Vegium.
01:03:05
Speaker
Anyway, so on that bombshell. You heard it here first, folks. And instead of Satoshi Nakamoto, you will get Ray and Vijay. Exactly. And then we'll have to go into hiding pretty quickly. We are practically retired because of this podcast, by the way.
01:03:32
Speaker
We can't do anything else, that's what I mean. It's not that we can retract. Yeah, well, just mention globalization, Trump, Putin and the usual buzzwords, and I think listeners will skyrocket; the audience will be in the millions.
01:03:52
Speaker
I don't think people would appreciate that much. But anyway, I think that's pretty much it for this week. And hopefully we will release this before EuroClojure. I don't know about the people who are listening to this one.
01:04:08
Speaker
Most probably you folks will be listening to it before EuroClojure, and we will be there with some defn stickers. Ray and I will be there as well. So please stop by and say hi, and if you want to punch us, please punch Ray.
01:04:26
Speaker
Yeah, I'm hopefully going to be doing a little unsession there as well, with functional programming jokes. Yeah. But I've got to get a bit of love on the GitHub page, so I'll put a link to that in the show notes as well. So yeah, all the more reason to come and punch Ray. So please, Ray, please tell at least one joke about Neanderthals. Right, so: evolution is programming. Yeah.
01:04:57
Speaker
So that's it for the show, and let's roll the credits. First of all, I would like to thank Dragan again. Thank you for joining the show and taking the time on a Sunday evening, spending the time with us, and also going out and buying headphones especially for the podcast. That's a lot of commitment. That's a first. Exactly.
01:05:22
Speaker
And our sound engineer, who fixes all our voices, Mr. Walter Dallart. I don't know if I pronounced that properly; he said he's going to replace it with the proper pronunciation. And our music is done by Pizzeri.
01:05:41
Speaker
So please go and listen to that music. The music tracks are on SoundCloud; the links will be in our doobly-doo thingy, or the show notes, somewhere at defn.audio. So that's it from us for this week. Ray, any closing thoughts?
01:06:00
Speaker
No, I think it was just really good. Thanks again, Dragan. It was really an amazing little dive, even though it was in warm waters into the pool of linear algebra and matrix maths and all this kind of stuff. So thanks again for doing all the good work. And hopefully we can bring a few more people into the library because of this podcast. And I'll definitely try and make a few jokes about it. Yeah. Thanks very much.
01:06:29
Speaker
Yeah, thanks for calling me and inviting me. Bye bye. Bye.
01:07:06
Speaker
How do you stop this? It's a red button.