
GPUs, from Simulation to Encryption (with Agnès Leroy)

Developer Voices

This week we take a look at what you can do with a GPU when you get away from just using it to draw polygons. Agnès Leroy has spent most of her career programming, optimizing and converting programs to run on that oh-so-curious piece of specialised processing hardware, and we go through all the places that journey has taken her: from simulating the flow of fluids in hydroelectric power stations, to figuring out how to make a new approach to encryption run fast enough to be practical.

Become a Developer Voices supporter! https://patreon.com/DeveloperVoices

A Fully Homomorphic Encryption Scheme (pdf): https://crypto.stanford.edu/craig/craig-thesis.pdf

CUDA platform: https://developer.nvidia.com/cuda-zone

Rust-CUDA: https://github.com/Rust-GPU/Rust-CUDA

And in case anyone was wondering, A List of Hydroelectric Power Stations in France: https://en.wikipedia.org/wiki/Category:Hydroelectric_power_stations_in_France

Kris on Mastodon: http://mastodon.social/@krisajenkins

Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

Kris on Twitter: https://twitter.com/krisajenkins

Transcript

Introduction to GPU Programming

00:00:00
Speaker
GPU programming, it's that strange cousin to the more familiar CPU. It can't quite do as much, it doesn't feel quite as general, but what a GPU does, it does with massive parallelism and tremendous speed. And if you're good at programming it, if you know the constraints and the tricks, if you're not afraid to think about things like optimising register allocations, it's a world full of possibility.
00:00:26
Speaker
And it's a world I've not given enough credit to in the past. I have mostly associated GPU programming with graphics programming for obvious reasons.

Exploring Parallel Programming with GPUs

00:00:35
Speaker
But that's selling it short. The GPU is just a playground for all kinds of parallel programming. And my guest this week is here to show us quite how varied and novel a career close to the GPU can be.

Agnès Leroy on GPU-Optimized Encryption Algorithms

00:00:48
Speaker
I'm joined this week by Agnès Leroy, and she's spent most of her career programming physical simulations for everything from dentist drill bits spinning in your mouth to hydroelectric power stations. And there are plenty of juicy details we get into just going through all that, as we work our way towards her current GPU job, which is optimizing encryption algorithms.
00:01:14
Speaker
And if you think you already know plenty about encryption, here's a question for you: is it possible for me to give you an encrypted file and have you process the contents without decrypting it? Processing without seeing inside; that shouldn't be possible, should it?
00:01:31
Speaker
But it is, it actually is: secure processing of encrypted data. And about 30 minutes into this podcast, you're going to hear me trying to wrap my head around that idea in real time. It is a mind-bending idea. But once it sinks in, it speaks to some terrific possibilities for things like keeping control of your privacy in a cloud-based world.
00:01:55
Speaker
To get there, it's all going to depend on someone making that processing fast enough. So, we've got a lot to get through. We've got Agnès' entire career to get through, so we'd best get started. I'm your host, Kris

Meet the Hosts: Kris and Agnès

00:02:07
Speaker
Jenkins. This is Developer Voices, and today's voice is Agnès Leroy.
00:02:24
Speaker
Joining me today is Agnès Leroy. Agnès, how are you? Very good, thank you very much for having me today. Oh, pleasure. We've both just got back from a holiday, so we should be very relaxed and hopefully still technically sharp.
00:02:39
Speaker
Yes, definitely, I hope. I've got to admit, I'm going to do this on hard mode, because not only am I asking you to talk tech when you're freshly back from holiday, I want to start right back at the start of your career.

Agnès’ Early Career and Education

00:02:53
Speaker
So you're going to have to go into the mental archives. Yeah, I think when we talked about how we wanted to do this, we both realised there was so much to cover that it was maybe easier to go chronologically. Yeah, going on the timeline. So actually, I started out as a student in mechanics at a French engineering school, where there was a lot of math already, and a lot of numerical simulation. I was doing this, and at some point I got the opportunity to go to Brazil for two years to study. And I went there,
00:03:28
Speaker
and there I was in a department working on metallurgy. So yeah, studying different types of metal, and especially one hyperelastic metal that also had some strange properties, which dentists would use to clean root canals. And so, yeah,
00:03:53
Speaker
this was my first professional experience. I worked there as a researcher, trying to model these instruments that would, unfortunately, sometimes break inside the root canal.
00:04:11
Speaker
I'm going to have to put a warning on this episode. Yeah, but actually it kind of reconciled me with the dentist. After this, I was much more at ease going to the dentist, and much less scared. So, yeah, I was building a mathematical model to describe the motion of those instruments and how the material would behave
00:04:39
Speaker
while inside the root canal. So this is like, you've got specialized tooling made of specialized metals that's spinning at a thousand revolutions per minute. Yeah. Drilling into, let's call it calcium, and keep it detached from the physical experience. Yeah. And how do you do that? What's the programming that goes into that?
00:05:03
Speaker
So at that stage, the only programming I was doing was using something like Mathematica to make integral calculations. It was a lot of integral calculations: I had to solve this math problem with lots of integrals and lots of variables, and it was super complicated. I could not solve it by hand, so I was using a tool; it was named Maple at the time,
00:05:29
Speaker
to do that. So my really first experience with programming was really using something that does the math for you. Okay, tell me more about that, because that's a programming language I've never touched, never used, and I can't quite imagine something dedicated to solving integrals.
00:05:49
Speaker
What's it like? It's not dedicated to solving integrals. It's more like generic math software where you can write some equation and it solves it for you. Say you want to find the roots of a polynomial: you write your polynomial, you ask it to find the roots, and it will do it for you. But you can also give it an integral and try to simplify it, to compute the result of the integral, and so on and so forth. So there are many things you can do with this kind of tool.
00:06:17
Speaker
Does it also do theorem proving? Is it that kind of area? I don't think it can do theorem proving, but it was a long time ago, so maybe things changed. I think there are like two branches of maths: we have numbers, or we don't have numbers. Yeah, exactly, it's more the theoretical part of the math. Okay. Okay, so from dentistry,
00:06:44
Speaker
I know eventually I want to get to this point where you start delving straight into the GPU. What happens from there? Yeah, so actually, classical mechanics, it's all the same. The way you describe a solid, a metal, a plastic, water, air: the equations are the same. It's just that the way we describe forces, the way pressure gets distributed and so on and so forth,
00:07:13
Speaker
changes depending on the material. Okay. And also, you can sometimes simplify problems by restricting the kinematics, the way things move. Sometimes things can't move in every direction, so you can restrict that. But overall, the theory behind it is all the same for water or metal or rock or anything.

PhD and Work in Fluid Dynamics

00:07:35
Speaker
This is like when I was at school and the physics teacher said, pretend this is an ideal solid, this kind of thing.
00:07:43
Speaker
Yeah, so when you say pretend it's an ideal solid, it means: write your forces in a certain way, and just make these specific assumptions to simplify your equations. But the theory behind it is the same. Or even for magnetohydrodynamics, this kind of complicated thing, it's still the same kind of theory behind it. So when I was working on solids for the dentists, I got this opportunity to start a PhD in the Paris area, working with my previous teacher from the fluid dynamics course, on simulating water flows with a technique that's called smoothed particle hydrodynamics.
00:08:34
Speaker
Okay. And this technique is, well, the same kind of equations. It's just, as I said, the way you describe the behavior of the material changes, and this actually changes a lot in terms of math and what you have to do later in the simulation. In the PhD I was working on, we would describe the motion of the fluid with particles that move.
00:09:01
Speaker
And for me, there was a strong link, because I knew my PhD supervisor already. I knew he was extremely good; I really wanted to work with him. So I accepted to start a PhD. It was a French thing where you can do a PhD in a company, so I was working for the main electricity company of France, called EDF. Right. And yeah, I worked there for three years on simulating water flows.
00:09:31
Speaker
And as I started my PhD, we could do a simulation of 100,000 particles in something like half a day, or maybe a full day.
00:09:44
Speaker
Right. Okay. And it was a Fortran code that they had. And I realized this was not going to work out, because just making small experiments would take days. So in my team, there were already some people looking at GPUs. This was in 2012.

Using GPUs in Hydrodynamic Simulations

00:10:04
Speaker
So pretty early on, because CUDA... I don't remember exactly when it started, but in 2012 it was still pretty early
00:10:13
Speaker
to use it in another context than video games, you know. And so the people in that team started looking at how they could use those GPUs to accelerate what we were doing. And there was already one team in the world doing this; they had developed an open-source GPU code doing this kind of thing. So they took over that code and modified it to adapt it to the specifics of the technique they had been developing in the lab.
00:10:49
Speaker
And so I started working on it with them. We had these big parties every Tuesday evening, where we would go and try to set up the GPU. We would build a desktop, install everything on it. It was a lot of fun.
00:11:08
Speaker
So here's where we get into... let's start with my ignorance of how GPUs work, and you can educate me, right? Because I think of GPUs as: okay, they're great for drawing lots of polygons, that's where they came from, and they're really great at doing lots and lots of simple but parallel maths. Exactly. Right. Now you're giving me this mental image of fluid dynamics as, I'm picturing it, a big bucket of billiard balls, all of which are trying to jostle around.
00:11:45
Speaker
Yeah, totally. And for every ball, at every time frame during your simulation, you have to compute its interactions with its neighbors. That's the thing I was about to say, because that's where the 100,000 billiard balls moving sounds easy on a GPU. But colliding with each other? That sounds like the whole parallelization thing has just completely broken down.
00:12:12
Speaker
Yeah, you're right, so that's why it's not so easy. In 3D, each particle would interact with something like 300 or 500 neighbors. And for each of those neighbors, you have to compute quite complex interactions, let's say, depending on your algorithm: depending on the math you chose to describe the simulation, it may be more or less parallel. But yes, you have to have some kind of synchronization there, so it's not ideal in terms of acceleration on the GPU, but we were getting much better performance than on CPUs.
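The neighbor loop being described can be sketched in plain, serial C++. The smoothing kernel and particle layout below are illustrative stand-ins, not the lab's actual SPH code; real codes use a cubic spline or Wendland kernel plus a spatial neighbor search instead of brute force.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal SPH-style density sum: each particle's density is a smoothing-
// kernel-weighted sum over its neighbors.
struct Particle { double x, y, z, mass; };

double weight(double r, double h) {
    // Hypothetical smoothing kernel: falls to zero beyond radius h.
    if (r >= h) return 0.0;
    double q = 1.0 - r / h;
    return q * q;  // not normalized; a real kernel carries a 1/h^3 factor
}

std::vector<double> densities(const std::vector<Particle>& ps, double h) {
    std::vector<double> rho(ps.size(), 0.0);
    // On a GPU, each iteration of this outer loop becomes one thread.
    for (size_t i = 0; i < ps.size(); ++i) {
        for (size_t j = 0; j < ps.size(); ++j) {
            double dx = ps[i].x - ps[j].x;
            double dy = ps[i].y - ps[j].y;
            double dz = ps[i].z - ps[j].z;
            double r = std::sqrt(dx * dx + dy * dy + dz * dz);
            rho[i] += ps[j].mass * weight(r, h);
        }
    }
    return rho;
}
```

With 300 to 500 neighbors per particle in 3D, that inner loop is where all the time goes, and where the GPU's parallelism pays off.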
00:12:51
Speaker
Day and night. So, when I started using this at the beginning of my PhD... I think after one year I implemented matrix inversion on the GPU. In my PhD, the technique I was trying to develop had to solve a matrix inversion to compute the interactions between neighbors. So for each particle, at each time frame, I would solve a matrix inversion. And I actually programmed it myself, because it was not just any matrix: I knew the exact shape of the matrix, so I could have a dedicated algorithm for it. So the first thing I ever implemented on the GPU is this matrix inversion, using an iterative solver. It's pretty classical, actually, in terms of software engineering.
00:13:38
Speaker
And I think it's something that comes back in my career: in the end, I work on very different topics, but it can all be broken down into kind of classical computer science problems, you know, ones many people have been working on, and so on. Okay, take me another level down, because I'm trying to picture how this would actually work.
00:14:03
Speaker
My knowledge of collision detection drops out at 1980s video games. In this case, they would not collide the particles: the equations that you solve to describe the motion of the fluid would prevent them from colliding.
00:14:22
Speaker
Except it was not totally working, so you had to have some extra terms there to really prevent them from getting too close, because this could make your simulation blow up: if the particles got too close together, at some point you would try to divide by zero or something like that. So it would not work out. But you would not try to detect collisions between particles; in this type of method, you really describe the motion of the particles with some equations, and you make it so that they don't collide. It's the beauty of the math behind it, but it's also very hard to achieve in practice, because of some numerical effects
00:15:09
Speaker
that mix with the physics when you do your simulation. That's what makes the thing really complicated: how using floating point, for example, even in double precision, introduces some errors into your simulation, introduces a kind of drift, and how you can mitigate that, and so on. So yeah. Oh, this gets into the whole chaos theory, butterfly-flapping-its-wings-in-your-fluid thing.
00:15:38
Speaker
Yeah, exactly. So there is a famous example. I'm not so good at names, so I don't remember the name of the person, I'm sorry. But that person was trying to make a fluid simulation with another method. I don't remember the date either, but it was pretty long ago. He was trying to do a simulation of the fluid inside a pipe, and the pipe would extend.
00:16:01
Speaker
In the configuration he was simulating, the fluid would never go the same way: he would run it once and the main flow would go downwards; he would run it again and it would go upwards. And he was like, okay, I have a bug in my program, you know. And after some time, he realized it was actually a physical effect:
00:16:24
Speaker
this specific configuration he was trying to simulate was not stable, because of this butterfly effect you're talking about, and the very tiny differences in floating-point arithmetic would make the flow go up or down, and it was normal. You iterate that much and these infinitesimally small changes start to add up. Yeah, yeah. And it has effects on the whole dynamics of your flow. It's not just a tiny effect; it's a major change in the shape of the flow. So yeah, fluid dynamics is complicated because of this, because of the nature of what you're trying to simulate.
00:17:06
Speaker
So what can you actually simulate? I mean, I'm thinking of, like, weather simulations: they become pretty much useless five days out, something like that. Yes. In your situation, what can you genuinely simulate and predict?

Energy Companies and Fluid Dynamics Simulations

00:17:22
Speaker
Yeah, so fortunately, the viscosity of water is much higher than that of air, so the characteristic time for this chaotic problem is much longer. And usually, in my lab, we would simulate things that occur within minutes or hours. So we had some models that we would use to describe, in a macroscopic way, in a high-level way, the effects of turbulence. Turbulence is this phenomenon where
00:17:54
Speaker
you have lots of bubbling in your flow. And we would have some models to describe it, because this bubbling occurs at large scales and at very small scales at the same time, and we could not afford such a high-precision description, with so many particles, for the whole fluid. So we had some models to describe this effect of the turbulence. Maybe I should step back a second.
00:18:26
Speaker
You said you were doing this for EDF, the French power company. Why is an electricity company doing fluid simulations? Every energy company is doing fluid simulations. Why? Because to produce energy, you have several choices. You can build a nuclear power plant, for example, in France, and this is a big boiler.
00:18:50
Speaker
Okay, yeah, a giant boiler, basically. And it's the same with coal or anything: you have this giant boiler, and you have circuits of water cooling down the core of the boiler, and in those circuits everything is flowing. You even have a very complex interaction between the nuclear physics inside the core and the water; it's extremely complicated. I was not working on that, but there is a whole research department at EDF working on it. But you also have interactions with the environment, and this is where I was working. So you have, potentially, your
00:19:28
Speaker
plant by the sea, and you have storms coming in. I was doing some tsunami simulation, flood simulation in rivers, because the plants are usually close to a water source. So whether it be a river or the sea, you have to know what's going on around the plants. And if you choose to use wind energy, it's still an air flow, and simulating air and water, it's the same equations as well. And yeah, I think maybe solar energy is the only one where you don't need it as much. Otherwise... I was also working a lot with hydraulic engineering, so dams,
00:20:21
Speaker
how do you call this in English? I don't remember... using dams and turbines to produce energy. Hydroelectricity. Yeah, exactly. What's the famous one in America? The Hoover Dam, right? That's the one I always think of when people mention hydroelectric power. Yeah. I have lots of French dams in mind, but nobody knows them. So yeah, I was also doing turbine simulation, producing power, with different types of turbines,
00:20:53
Speaker
in rivers or waterfalls. And actually we were also trying to do waterfall simulations, which is extremely complex. Yeah, you have dams where, when the flood comes, the water flows over the dam and falls, you know; it's a huge waterfall, basically. And we were trying to study the effect of that waterfall on the ground just below the dam. And to do that, you have to really understand the physics of the waterfall itself. So we also had lots of people doing experiments, not just numerical simulation,
00:21:29
Speaker
but lots of experiments on environmental flows, including waterfalls. Are you visualizing this at the moment? Is it all just matrices being chewed through, or at some point is there a video graphic simulation so you can check your work? Yes, we were also working on this. We had some kind of ray-tracing software to rebuild the surface of the water and the reflection of the light on it, so that we had a kind of real-life visualization of what was happening.
00:22:06
Speaker
It was very useful. For the waterfall simulation, it was actually to compare with high-speed camera photography, to really be sure we were reproducing the shapes of the substructures of the waterfall, which is complex.
00:22:25
Speaker
Does it work? I mean, there's that thing where you go from the microscopic up to the macroscopic and it never quite matches, because the world is so different. How accurate were your simulations? Yeah, so actually, this is the beauty of these equations that we use to simulate the fluid. I'm not sure I can define exactly the degree of precision for these equations, but it's really amazingly accurate. When you look at it, it really looks like reality. But what we were trying to achieve was not only making it look real, but also having the right levels of pressure and the right velocities, because this is actually what's important for designing any kind of protection, or
00:23:16
Speaker
understanding the fatigue of a structure, or anything like this. You have to have the physics right, not just make it look real, because it's pretty easy to make it look real. To have it actually right, the right pressures, the right velocities, is much harder. And yes, for this we were getting really good results as well.
00:23:33
Speaker
OK, so let's go back into the programming a bit, then. I'm trying to imagine what your day was like. Did someone come with a bunch of measurements for a dam and you had to write a new program, or were you writing a generic simulator? It was a generic program, yes, a generic simulator. So when we had a new thing to simulate, we would first make a numerical model of it. Maybe we had some scans, for example, and we had to clean the scans to have a
00:24:06
Speaker
certain distribution of points on the surfaces, and then we had to fill it with particles representing the fluid. So we had these pre-processing stages, and then the software we were developing would run the simulation. We had some way to set initial conditions and so on. And then we would post-process the results: we used this tool named ParaView, where you can do some frames, visualize some variables in your simulation, and so on. Basically, that's how we would proceed. Okay. And what are you writing this in? Is it still mathematical software? No, no, no. So the pre-processing tool, I think it was
00:24:55
Speaker
I don't remember, actually, it was a pretty long time ago, but I think it was written in Fortran at the start. But the main software for the simulation was written in C++ and CUDA.

CUDA Programming and Optimization

00:25:06
Speaker
So CUDA is this language that you can use to program NVIDIA GPUs.
00:25:11
Speaker
You have some alternatives, and now you have more and more, but at the time it was really the most efficient way to program a GPU. I believe today that's kind of still the case, even though you have other options coming in, but I'm still working with CUDA today. Okay, and is it like... excuse me... I mean, the only languages I know I can imagine GPU code in are web shader languages, right?
00:25:43
Speaker
Are you converting your mathematical model into something that looks like a GPU shader? No, CUDA is general. Yeah, it's a general-purpose language, so basically you write C++ code. You just have to change the way you write it, because you are writing what happens within each thread of the GPU.
00:26:05
Speaker
And you have to organize how those threads cooperate within blocks of threads. So the logic of how you write a program is different, but it's C++ with some additions that make it possible to configure how you launch things on the GPU.
00:26:22
Speaker
So, how many threads you're going to launch, what the size of the thread blocks is going to be, and so on and so forth. And there are some things you can use on the GPU to make things more efficient. For example, you can use something named shared memory, where threads have access to a shared memory pool, let's say. Basically, it means you will read from some memory that's faster to access than the global memory of the GPU.
00:26:51
Speaker
So there are lots of low-level optimizations to be made there to really achieve this performance on the GPU, and that's what this language allows you to do. So today, for example, they introduced, I think in 2017, these Tensor Cores in the GPUs, and you also have some types of instructions to use those Tensor Cores to perform matrix multiplication.
00:27:18
Speaker
At the time, we were using things like texture memory that were very video-game specific. Now I don't use this anymore in my job, but at the time we were using this texture memory, things like this, to make things more efficient. So is it then that you're
00:27:40
Speaker
I'm trying to get a feel for this. Is it that you're writing code for the GPU which could look like classical linear code? It looks like C++ code. Okay. But then you're trying to be aware of all the hardware-level tricks where you can optimize and parallelize? Yes. And also, the logic of how you write things is slightly different. Like, you don't write a for loop.
00:28:07
Speaker
Instead of writing a for loop to do something, you will write something in the kernel. In the kernel, this function that's named the kernel, you describe what happens per thread, what each thread does, and then you configure that kernel to use a certain number of threads, a certain number of thread blocks, and a certain amount of shared memory. And then you can have some more complex things, where you can also streamline work on the GPU; you also have this type of choice. But basically, GPU programming comes down to this. And inside the kernel, you can do the low-level optimizations of really how you want things to happen on the GPU.
00:28:48
Speaker
Okay. Does this mean that a large part of your programming is understanding the specific GPU card you're using, or is it more general than that? Yeah, definitely that's a big part of my work, and also understanding how the NVIDIA compiler behaves when I write some things. Usually, I don't understand, so I have to just try it out and see what happens. That sounds like very familiar programming. Yeah.
00:29:20
Speaker
And then every generation of GPU comes with some changes, so we also have to adapt to this. There's also this dimension where you can use multiple GPUs to perform computations; already during my PhD, I was doing multi-GPU computations.
00:29:38
Speaker
So then you have to coordinate what each GPU does when, and how it collaborates with the others. You have synchronization at the kernel level on the GPU, synchronization between kernels, synchronization between GPUs. So yeah, it's a lot of parallel programming. Asynchronous parallel programming, yeah. Does it end up looking something like pipeline-based programming? It can; part of it is something like that, yes. But you also have the lower-level stuff. It's the same on a CPU: you can also write some lower-level things, using specific instructions to accelerate specific parts of your code. But you don't have access to... like, on the GPU, you can really decide what area of the cache you're using, for example.
00:30:33
Speaker
And you have to really be sure you don't use too many registers. So yeah, it's a lot of small optimizations to make before you reach the best performance for a specific function. Yeah, yeah. Diving into all those things we tried to abstract away from in the 60s and 70s. Yeah, exactly.
00:30:57
Speaker
I'm an old-timer, actually. We're going backwards through time if you're now worrying about registers. Yeah, but on GPUs it's really something you want to monitor, because you define what happens on a thread on your GPU, and you define how many threads you want to launch, and then the compiler schedules everything for you. You're not writing the scheduling of how each thread will be executed; the compiler does that for you. And depending on how you wrote your kernel, you achieve different occupancies on your GPU, meaning you can be using 100% of the cores all the time, or you can be using 10% of the cores all the time. And you can already have an intuition that if you're only using 10% of your cores, your performance is affected. So the occupancy of your GPU, how many cores are actually being used at a given time,
00:31:58
Speaker
is determined by how many registers you use, how much shared memory you use, how many threads you're launching, and so on and so forth. So that's really something we try to monitor closely, and a big part of my work is to try to find an optimal value for that occupancy. Because 100% occupancy is not always good either: it usually means you're just running too many small tasks in parallel, and there's probably a better way to do it.
00:32:23
Speaker
Oh, I see. Yeah, yeah. So how much of your job is writing new code, and how much of it is digging in and optimizing existing code?

Debugging GPU Code

00:32:32
Speaker
Oh, good question. I think in my job I mostly dig into the code and optimize. Writing new code, I did it... I don't know, it's hard to tell. It depends on the moment, actually, because sometimes you're writing something for the first time, and that's pretty straightforward, let's say: you don't try to optimize everything the first time you write something. You want it to be correct first. And then it takes a really long time to make it optimal, and to debug, actually, because debugging is a big part of my job. I spend a lot of time debugging.
00:33:13
Speaker
Yeah, I can totally believe it. Maybe we should go into how you did debugging. Actually, let's do that: how do you debug these pipelines? What kind of problems come up? What tools have you got? Yeah, so we have a debugger, the CUDA debugger, so you can know exactly what happens in your kernel, on which thread and at what time and so on. But it's complicated to debug, because many things happen asynchronously. If it's a synchronization bug, it's super hard to detect where it comes from. So how I debug is also mostly printing stuff. Sorry! But I do this a lot: I print stuff in my program and I try to understand why this variable is changing now, where the change comes from, and so on.
00:33:57
Speaker
Many times we also have memory bugs, where we're trying to access memory in a place in an array that's not correct. Maybe it's allocated, but it's not the right place. So this is extremely hard to detect, because it's not something the debugger will detect: the memory is valid, it's just not the right value you're fetching.
00:34:18
Speaker
And so this is actually very problematic. I think maybe the worst bugs are those ones: the synchronization bugs, and those accesses to places in arrays that are not the correct ones. In the CFD world, so in my fluid dynamics work, it was already very hard to debug because there are so many particles doing things altogether. When you have a small bug that's only affecting one or two particles, you don't really see an effect on your full simulation.
00:34:49
Speaker
Yeah, at least not for a very long time. Yeah, it's very long. It's hard to see it among the rest of the noise. Yeah. And after I worked on this fluid dynamics project, I worked at EDF for some time. After that I switched, because I was working in a research department and I really wanted to be working with software developers.

Transition to RTE and Power Grid Simulations

00:35:12
Speaker
I wanted to be in a team of software developers. And so that's when I joined the power grid operator in France, named RTE, working on Java software, no more GPUs, that was doing power grid simulation. Power grid simulation is the same physics, just a lot simplified for most things. We don't simulate all the physical effects. And to me, it looked more like building a huge
00:35:47
Speaker
LEGO software, because you have all these pieces of equipment in the whole country, in the whole of Europe. We were focusing on simulations for France and for Europe. So you have all this equipment with different characteristics, different physical behaviors, and you had to represent them in your software and make sure everything conformed with the actual equipment on the ground, and so on. So it was like assembling lots of LEGO pieces to me. So what kind of things are you... What level of detail are you simulating there? Are you simulating the kind of, I don't know, amperage that a particular cable stretching across the country can take?
00:36:26
Speaker
Yeah, so it's the high-tension lines that we were simulating. It's like the highways of electricity. For all of this, yes, we monitor the amperage, the voltage, everything that would be relevant to making decisions: making real-time decisions for operators of the electrical system, and also making market decisions, like
00:36:56
Speaker
forecasts of how the grid would be operated within three years, one year, one week, one day, one hour, you know, so making those kinds of forecasts.
00:37:09
Speaker
And so for me, it was a lot less math. There were no more GPUs. It was a lot less about optimizing things at the low level. And I think I was probably missing that. But then I got this opportunity to work for a company based in Paris, named

Shift to Cryptography at Zama

00:37:25
Speaker
Zama, that was building this cryptography technique. Yeah, so you're heading back to the GPU, but this time for crypto. Yes. Well, not for crypto. I almost said crypto. That's a very different thing.
00:37:37
Speaker
Yeah, so for cryptography, yes, let's be clear on that. And so I completely moved away from physics. I'm not working with physics anymore. But I'm still doing the same kind of thing: I'm still programming GPUs to accelerate complex algorithms. And that's really something I like to do in my job, so I'm pretty happy to be doing this today.
00:38:04
Speaker
So maybe I can just give an idea of what this cryptography technique is and what it means to make it faster. Yeah, because it's got a cracking name,

Understanding Fully Homomorphic Encryption

00:38:15
Speaker
hasn't it? Fully homomorphic encryption. Yes, right. Explain that. Yeah. So basically,
00:38:25
Speaker
everything we do when we interact with the internet is encrypted in transit. So when you want to perform some computation on data today, you encrypt it, you send it encrypted to a server, and the server will decrypt it and run computations on it.
00:38:50
Speaker
What this technology, fully homomorphic encryption, I will call it FHE, allows is to encrypt your data and send it encrypted, and the server will perform the computation on the encrypted data without decrypting it. It will send you back the encrypted result, and only you can decrypt it. Now, on the surface of it, that sounds impossible. Yes. What makes this possible is writing a cryptographic scheme that has certain mathematical properties. Under the hood, it's really the mathematical properties of your cryptographic scheme that make it possible.
00:39:42
Speaker
So depending on your scheme, you have different properties. In many cryptographic schemes, you can add ciphertexts, and the result of this addition is the encryption of the addition. So I'm going to have to go slowly on this to make sure I'm following. I've got one document I encrypt, I've got another document I encrypt, and I can concatenate them and still get the meaningful concatenation of the two encrypted documents back out. Exactly. Exactly.
00:40:10
Speaker
Okay, I don't know how that's done, but I can imagine a mathematician figuring that out. Yeah, because what you do when you encrypt something is make your data look random. Basically, that's the core of cryptography: everything has to look random, and nobody can tell the difference between your data and an actual random number generated by a computer. Yeah, okay. And there are lots of ways you could imagine to make it look random, you know. But depending on the algorithm you choose to do it, you get certain properties on the scheme
00:40:54
Speaker
itself, on the ciphertexts. You can perform additions. Sometimes you can also perform multiplications. Sometimes you can perform only a limited number of additions or multiplications. And today, with the technology we're developing at Zama, we can perform any number of operations, and any operation. So that's really powerful. It's something people have been thinking about since the 70s, and it only became possible in 2009 with a paper by Craig Gentry, who thought of a technique to... okay, I have to explain some lower-level things to explain this, but basically, in the technique we use, we hide the message with something that looks random.

Managing Noise in Encrypted Computations

00:41:46
Speaker
And we also add some noise to it. Okay. And the problem when you add two ciphertexts is that this noise grows. When you multiply a ciphertext by a clear value, your noise grows. And if your noise grows too much, you corrupt your message. And so in 2009, Craig Gentry came up with a technique that allows you to reset the noise in your ciphertext to a nominal value. And that was a game changer, because now you can perform as many operations as your computation requires. I still can't see beyond the veil to how you perform these, because you're saying this is all happening without the person doing the addition, let's say, decrypting the message. Yes. So I've got two big numbers that have been encrypted.
00:42:42
Speaker
I am now going to be able to multiply the encrypted versions together without knowing what the numbers are. Yes. It's the mathematical property of the scheme you use to encrypt. With the addition, it's simpler to visualize. It's complicated to explain in just a few words, but in the technique we're using, we are basically hiding the message with something that looks random.
00:43:12
Speaker
And this thing, when you add two of them together, they add up. So when you decrypt, you get the result of the addition.
00:43:25
Speaker
Are you saying that, in a way, I'm adding the two numbers and adding the two sets of noise? And when someone goes to decrypt it, they can remove the summed noise and get the summed value back? Yes, basically, that's it.
00:43:42
Speaker
Okay. And if you do lots of these operations, the noise gets larger and larger, until it obscures the message. And this guy, Craig Gentry, figured out how to reset the noise without decrypting. Yes, exactly. Okay. This is complicated. Yeah, but basically that's it. And the thing is, when he had the idea, it was impractical, because it was way too slow to do that.
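To make the "message plus something random plus noise" idea concrete, here is a toy, deliberately insecure Python sketch of an additively homomorphic scheme in the LWE style that schemes like this build on. All parameter names and sizes here are invented for illustration; real schemes use far larger dimensions and carefully chosen noise distributions:

```python
import random

q = 2**32          # ciphertext modulus
delta = 2**24      # scaling factor: message lives in the high bits, noise in the low bits
n = 8              # toy secret size (real schemes use hundreds of coefficients)

secret = [random.randrange(2) for _ in range(n)]

def encrypt(m, noise_bound=2**6):
    a = [random.randrange(q) for _ in range(n)]           # public random mask
    e = random.randrange(-noise_bound, noise_bound + 1)   # small noise
    b = (sum(ai * si for ai, si in zip(a, secret)) + delta * m + e) % q
    return a, b

def decrypt(ct):
    a, b = ct
    noisy = (b - sum(ai * si for ai, si in zip(a, secret))) % q
    # Rounding removes the noise, as long as |noise| < delta / 2.
    return round(noisy / delta) % (q // delta)

def add(ct1, ct2):
    # Homomorphic addition: the masks, the messages AND the noises all sum.
    a1, b1 = ct1
    a2, b2 = ct2
    return [(x + y) % q for x, y in zip(a1, a2)], (b1 + b2) % q

c = add(encrypt(3), encrypt(4))
print(decrypt(c))  # 7 -- the adder never saw 3 or 4
```

Each addition roughly doubles the noise budget spent, which is exactly why Gentry's noise-resetting bootstrap was the breakthrough: without it, the rounding step eventually lands on the wrong message.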
00:44:11
Speaker
And so when Zama, my company, started, it was still very impractical to use these kinds of schemes. But today... I think since I started, it has gotten

Accelerating Cryptographic Computations

00:44:24
Speaker
100 times faster. So now we are really getting to a stage where we can use this technology in practice, and that's very exciting. Give me some idea of the times we're talking about. Define slow, and 100 times faster than that.
00:44:38
Speaker
Yeah, so depending on your scheme, computing on encrypted data can be a thousand times slower than computing on the clear data. Okay, so you've lost three orders of magnitude on your addition operation. Yeah. That's painful. I think maybe it can be more, or it can be less, depending on the
00:45:03
Speaker
nature of the computations. But yeah, it's about that. And we still need a lot of acceleration for this technology. But it's something so new that, at first, we are targeting use cases where speed is not so much of an issue, where we can afford some latency in the computations. So that's why the first clients for Zama are in the blockchain field.
00:45:33
Speaker
Surprising! Yeah. And so, because there we can afford some latency, it's much easier to have businesses built around this technology than, for example, in AI, where it's already very computationally intensive. So it's very hard to make this technology practical and deliver real business value there. We are working on it too; I think we're still missing some acceleration there to make it actually practical.
00:46:04
Speaker
But okay, speed aside for the moment, what's the use case for being able to do operations on a large language model that's encrypted, without decrypting it? Why is that good? Well, usually when you perform this type of computation, you send your data to a server, and you're not the person who controls the server. So.
00:46:28
Speaker
You have to trust that server, and also you have to trust that the server is protected against attacks, you know? So having everything encrypted end to end means you are intrinsically protected against this. You could have, for example, personalized adverts without Google ever seeing your data. Okay.
00:46:53
Speaker
Or, I'm thinking I'm running a legal firm and I'd like to be able to search all my case histories. So I want someone to build a large language model around my case history, but I don't want to send my clients' legal documents to you. Exactly. It's the same in the healthcare industry, the finance industry; basically every field that holds sensitive data is affected.
00:47:17
Speaker
And I can see a reason why you want someone else to build the large language model for you. Yes. You don't want to have thousands of GPUs in your office. It's not feasible. Okay. Yeah, I can start to see that. Okay, I see the use of it. So this is why they've hired Agnès Leroy: because there's a use case for it. But I can see why that would be slow.
00:47:47
Speaker
Yeah. Tell me how your job has gone optimizing that. So basically, when I started out, we were focusing on optimizing one specific operation, this bootstrapping that reduces the noise in the ciphertexts: implementing it on the GPU and making it as fast as we could.
00:48:09
Speaker
And at first it was not clear what our target was in terms of optimization, because we didn't know if we wanted to focus on throughput, being able to perform thousands of these bootstraps in parallel as fast as possible, or if we wanted the latency of one of those operations to be the smallest possible. It became clearer with the growth of the company: once we had our first clients, we knew what our targets were, and my job became simpler because I could really focus on those objectives. So, this function is extremely hard to accelerate on a GPU. It's a little bit like the fluid dynamics thing; it's not a natural fit for a GPU. So at first, we didn't get any speedup. It was terrible.
00:49:04
Speaker
But that's expected. Also, the CPU code we were comparing against was getting a lot more optimized as time went by. So we were comparing with something that was always getting better, which was good. Meanwhile, you're losing time in the race because you're converting it over to run on the GPU for starters. Yeah, right. But that's also expected. This is something I had already experienced in my fluid dynamics work.
00:49:34
Speaker
And so when we actually started focusing on implementing specific integer operations on the GPU, then we started to get much better speedups, because then we were really able to fine-tune the implementations for
00:49:50
Speaker
these specific numbers of bootstraps running in parallel, and so on. So that really changed a lot. And now we are able to get three times better performance on the GPU than the highly optimized CPU implementation running on very large CPUs. So we're really comparing on hardware that's top of the game. And also, Nvidia released the Hopper GPU, maybe a year and a half or two years ago, I don't remember. But this GPU is really much bigger than the previous ones, so that also helped a lot in terms of performance.
00:50:35
Speaker
And we also see this trend continuing in the future: Nvidia is going to release bigger and more powerful GPUs, so that's something that will help a lot for us. But these new GPUs also come with new features, so we try to adapt our software to use those features as best we can. It's very interesting; we are really up to date with the newest things that come up

Optimizing Fast Fourier Transform on GPUs

00:51:06
Speaker
in the GPU world. So that's very interesting. Give me an example of that. What counts as an exciting new GPU feature? So at first we thought we could use tensor cores. But maybe one thing first: you know, I said that in fluid dynamics, everything I was doing came down to computing matrix inversions. Here in the cryptography world, everything comes down to computing polynomial multiplications.
00:51:31
Speaker
Okay, lots of polynomial multiplications. So, a very classical algorithm again: to do this, we use the fast Fourier transform. We are trying to accelerate the execution of thousands of fast Fourier transforms, in sequence and in parallel, on the GPU. That's basically what we're doing, also using multiple GPUs, and so on.
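As an illustration of that core primitive, here is a minimal pure-Python radix-2 FFT used to multiply two integer polynomials. This is a sketch of the textbook algorithm, not Zama's implementation, which runs thousands of double-precision transforms in parallel on the GPU:

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)   # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def polymul(p1, p2):
    """Multiply two integer polynomials via pointwise products in the Fourier domain."""
    size = 1
    while size < len(p1) + len(p2) - 1:
        size *= 2
    fa = fft([complex(c) for c in p1] + [0j] * (size - len(p1)))
    fb = fft([complex(c) for c in p2] + [0j] * (size - len(p2)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The inverse transform needs a final division by size; round back to integers.
    return [round((v / size).real) for v in prod[:len(p1) + len(p2) - 1]]

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(polymul([1, 2], [3, 4]))  # [3, 10, 8]
```

The point of the FFT is that it turns an O(n²) convolution into O(n log n) pointwise products, and because the polynomial sizes in a scheme like this are fixed in advance, every stage of the transform can be fine-tuned for those exact sizes on the GPU.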
00:51:56
Speaker
So, now I forget why I was saying this... We were talking about exciting new hardware features. Yes: at some point we thought maybe we could use the new tensor cores on the Hopper GPUs to perform those fast Fourier transforms. We needed a tensor core that could deal with the data type we are using, which is double-precision floating point.
00:52:22
Speaker
And this was released in the Hopper GPU. But in practice, it proved not so easy, because you have to load memory onto those cores and retrieve memory from them, and you have to load them fully if you want it to be very efficient. So for now, we are not digging into that much further. But what we did start using was the distributed shared memory feature of Hopper.
00:52:48
Speaker
So, you know, I told you we can use shared memory within a thread block to have threads access memory faster than if it were in global memory. And with this feature, you can have different blocks accessing the same shared memory without needing so much synchronization. So that's really interesting for us: you can have more cooperation at a higher level on the GPU than before. That sounds like you're going to run into even more race conditions and memory errors.
00:53:18
Speaker
I think race conditions are something we're really used to. Bugs come up with every line of code. It's like, every ten lines of code, you make one bug; I think I read something like that online once. It's crazy the amount of errors and mistakes we make. Fortunately, we have lots of testing tools to make sure we're not breaking anything. And actually, a big part of my work was to build a very strong CI infrastructure, so continuous integration, and very strong tests. The whole team has spent a lot of energy on this to make sure that we are testing everything, because what we compute is very hard to reason about.
00:54:03
Speaker
So yeah, this noise makes it hard to actually track what's going on, and everything is hidden and looks random. So when you debug, you can imagine how painful this can be, because everything looks like a very large number, close to the maximum of the u64 representation.
00:54:21
Speaker
So it's not so easy. And there are lots of huge numbers. So yeah, that's funny. And there's a definite parallel there to the fluid dynamics simulation. Yeah, totally. Very small problems add up. Yeah, exactly. Very small effects can corrupt your message in the end if you don't detect them. But wait, because it sounds like this encryption technique could be something that's lossy.
00:54:52
Speaker
Yeah, so no, depending on how you do it, it's not. That's the thing: with the technique we're using, we can ensure that the result is correct. There is a certain probability of failure, but we set that probability so low that failure does not happen in practice. That's how we do it.
00:55:11
Speaker
And to achieve that, we have to make sure the cryptographic parameters are chosen correctly. You know, I said we hide the message with something that looks random; to build that random-looking thing, we have to use lots of integers, and we can choose how many we use. We can choose the size of the ciphertext, basically, and the variance of the noise we use in the ciphertext. The real trick in this technique is how to choose all those things so that you achieve a certain probability of failure, you achieve a certain level of security, and you don't compromise too much on performance.
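As a back-of-the-envelope sketch of that trade-off, the snippet below assumes the accumulated noise is Gaussian and that decryption rounds correctly whenever the noise stays below half the scaling factor `delta`; both the model and the numbers are illustrative, not Zama's actual parameters:

```python
import math

def failure_probability(delta, sigma):
    """P(|noise| > delta / 2) for zero-mean Gaussian noise with std deviation sigma."""
    return math.erfc(delta / (2 * sigma * math.sqrt(2)))

# Larger noise makes the ciphertext harder to attack, but raises the chance
# that rounding lands on the wrong message; parameter choice balances the two.
delta = 2**40
for sigma in (2**37, 2**36, 2**35):
    print(f"sigma = 2^{round(math.log2(sigma))}: "
          f"p_fail ~ {failure_probability(delta, sigma):.2e}")
```

Shrinking the noise by one bit squares down the failure probability dramatically, which is why the probability can be pushed low enough to "never happen in practice" while still keeping the noise large enough for security.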
00:55:54
Speaker
That's really the core of what we've been doing at Zama. A big part of it is how to choose those parameters in a way that makes the technique practical to use, and secure. So you've got the juggling act of those parameters, the juggling act of GPU parameters, and the juggling act of low-level parallel programming.
00:56:21
Speaker
Yeah. Okay, you've got plenty of problems. That's good. Yeah, it's very exciting. But the vision for the company right now is that in the short and middle term, CPU and then GPU acceleration will make the technique more performant, and in the longer term, the goal would be to have dedicated hardware
00:56:42
Speaker
for this technology. Because in cryptography, this has happened already: today everybody is using encryption, and the HTTPS protocol is only practical because you have dedicated hardware in every laptop that does it. The AES algorithm is actually implemented in hardware. And so for this technology to really take off, the vision is also to have dedicated hardware to accelerate it.
00:57:17
Speaker
One thing that could be possible is to have some cores in the GPUs. For example, even without speaking of a dedicated ASIC, a dedicated chip, we could imagine GPUs that have some specific multiplication cores. Like for AI today, you have matrix multiplication cores dedicated to it. So you could imagine a GPU that has these cores. If we had this, everything would be much faster on GPU.
00:57:45
Speaker
So is there this thing of you figuring out how to make it run fast on GPUs to figure out what you might go and ask Nvidia to implement? Yeah, totally. Nvidia or AMD or whoever, but definitely it's very important to know exactly what we need, because it's not just any polynomials we are multiplying. They have a specific size, you know. When you try to implement a very generic polynomial multiplication, it's complicated, because you can have any size of polynomials. But when you know your polynomials are a certain size, you can really fine-tune everything to fit that size, and that's really very different. So the tensor cores today, they were made to be able to handle any size of matrices.
00:58:34
Speaker
But I really think Nvidia could implement something more specialized. Right now, I think they can multiply 4-by-4 matrices, it may change depending on your data types, but something like that, and then they combine them to multiply very large matrices. So yeah, experimenting with different fast Fourier transform algorithms
00:58:59
Speaker
is super important in our job. We try to see what happens in terms of performance when we switch algorithms, how it behaves on the GPU, and we try to imagine how this could be embedded in the hardware as well. Right, yeah, because I've seen this thing where sometimes people choose a seemingly slightly worse algorithm because it maps to the hardware much better. Yeah, exactly.
00:59:29
Speaker
Okay, I think I have to ask you one final question, which may be slightly flammable. But I've got to ask, right? It seems to me that a large part of your programming job is dealing with things like memory errors, register allocation, memory safety, parallel programming, all that stuff. Yeah.
00:59:51
Speaker
Meanwhile, I look at the history of the CPU-based world and the moves away from very low-level languages. I mean, it's easy to pick on, say, the drive to go from C to Rust, to get the compiler to take care of these safety issues for you. Yeah, that's where things have to go.

Desire for Rust Equivalent to CUDA

01:00:15
Speaker
Yeah. Do you think that's, is that something you would like to see happen in your world? Yeah. yeah So he would happily get away from two plus plus. So actually half of my work is also in Rust because we integrate our CUDA library in the Rust project that we have that's for the CPU execution. And so I actually, I worked a lot with Rust since I joined the XAML.
01:00:41
Speaker
And I can 100% say I would love to have a Rust equivalent of CUDA, because that would totally change my life. This is my dream. And I know there is a project trying to do this, so I hope that soon this will not be a dream anymore and I will be able to switch, and have some compiler safety and ownership rules.
01:01:05
Speaker
Save some of your debugging time. Oh yes, totally. Okay. Because, you know, Rust is so powerful. For me, it's something every developer should look at. If you have the choice between writing a program in C++ or writing it in Rust,
01:01:20
Speaker
for your own peace of mind, please choose Rust. There are some drawbacks, and especially, it's a bit hard to learn Rust, I guess, or at least that's what I seem to see with people. They tend to have a hard time getting started with it.
01:01:40
Speaker
But you really have to talk to the compiler. The compiler talks to you; you have to read what it says and understand it. And once you've done that, things get much clearer, the pain of using Rust goes away, and you only see the advantages. For me, it's 100% in favor of Rust. If I could, I would have written the project in Rust from the start. But to configure the kernels and to write the kernels, we have to write them in C++.
01:02:10
Speaker
For now, we are still tied to that. Okay, it sounds like a future episode of this podcast needs to be with the people developing a Rust-based CUDA. Yes, that would be amazing. That would be really amazing. Groovy. In the meantime, I should leave you to go and optimize what you can in C++. Yeah. Thank you. And yes, thanks very much for joining me. That was fascinating.
01:02:36
Speaker
Thank you very much. Thank you very much, Kris. It was a pleasure. Cheers. Thank you, Agnès. I have to wonder, after all that, are more people going to be disturbed by the image of stress-testing dentist tools for root canals, or by the critique of C++ versus Rust? Tough to say. Hopefully neither. But you know, if you've been troubled by any of the issues raised in this week's podcast, our contact details are in the show notes below.
01:03:04
Speaker
But you should probably just have a cup of tea. That's probably for the best. I'm going to go and have a cup of tea. Before I do, I shall just say: if you've enjoyed this episode, please leave a like. If you want to support this podcast and future episodes, go find us on Patreon, and make sure you're subscribed for next week's episode. Until then, I've been your host, Kris Jenkins. This has been Developer Voices with Agnès Leroy. Thanks for listening.