
PyO3: From Python to Rust and Back Again (with David Hewitt)

Developer Voices
3k Plays · 4 months ago

There’s huge pressure on Python at the moment to get faster, ideally without changing at all. One increasingly popular way of achieving that impossible task is to push the performance-critical code down into C, C++, or Rust. And this week we’re focussing on the Rust route, as we take a look at PyO3.

David Hewitt’s the principal committer to PyO3, and he joins us to go through the easy parts, the hard parts, and the works in progress, giving us an insight into how Python and Rust work under the hood, and quite how much work it takes to make them work as one.

PyO3 User Guide: https://pyo3.rs/v0.22.0/

PyO3 on Github: https://github.com/PyO3/pyo3

Polars: https://pola.rs/

Tokio: https://tokio.rs/

Trio: https://trio.readthedocs.io/

Robyn: https://github.com/sparckles/Robyn

Faster CPython: https://github.com/faster-cpython

Maturin: https://www.maturin.rs/

David on Mastodon: https://fosstodon.org/@davidhewitt

David on Twitter: https://x.com/davidhewittdev

Kris on Mastodon: http://mastodon.social/@krisajenkins

Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

Kris on Twitter: https://x.com/krisajenkins

Transcript

Clojure and Java Interop

00:00:00
Speaker
Getting different programming languages to talk to each other can be a really tricky thing. I remember a few years back, I had this job writing Clojure, which is the Lisp that runs on the JVM. And I got into this language interop discussion with my colleague, because he was saying that Clojure's interop with Java is awful. And that really surprised me, because I think it's great. It's lightweight, it's easy to use, it works out of the box. What more can you ask?
00:00:28
Speaker
And so we could have had this lovely old argument about it saying, I'm right, you're wrong. But we made the classic mistake of listening to each other and found out that we actually basically agreed. Clojure's interop with Java is terrific, but that only helps you so much when you're connecting such fundamentally different languages. Java explicitly builds up the world from mutable objects, and Clojure explicitly says that's the whole problem.
00:00:56
Speaker
So the challenge isn't getting two languages to communicate with each other. The challenge is that fundamentally they speak different languages.

Python and Rust Interop Challenges with David Hewitt

00:01:05
Speaker
And I had that in mind when I sat down to talk to David Hewitt. He's the principal committer to PyO3, which is a project that aims to make it easy to interop between Python and Rust.
00:01:18
Speaker
And I can instantly see how that must be possible, right? Python can call C, C can call Rust, and vice versa. But the fact that it's technically possible doesn't tell you how well it's going to work. Python and Rust are quite different languages. They have different ideas about memory management, threading, mutation, errors. How do you make them work together well? How do you make it seamless?
00:01:44
Speaker
That's David's aim with PyO3. And as we discuss it, David has some really lovely juicy details about how Python's internals work, how Rust's internals work, and all the design work they're doing to make them marry together neatly. There is an absolute ton to learn from this discussion, not least that with the right design, it might well be possible for two quite different languages to coexist happily.
00:02:12
Speaker
I'm your host, Kris Jenkins. This is Developer Voices, and today's voice is David Hewitt. I'm joined today by David Hewitt. David, how are you? I'm very well, thank you. Thanks for having me on the podcast today. I'm really excited to be here.
00:02:38
Speaker
Oh, it's a pleasure. I feel like I'm going to learn a lot from you today. You've got some really thorny details to talk about. Let's hope I can deliver on that promise. Yeah, we can definitely deliver on the thorny part, I'm sure of that. Rust and Python. Tell me how you got started in this, because you didn't start the Rust-Python bridge project, did you?
00:02:58
Speaker
No. So it had been going on for a few years before I joined, and maybe there's like two different things that we can pick out here really. One is how did I find my way to working on the project, and how did the project itself start? So I guess I'll start with my own pathway in, because I think
00:03:14
Speaker
it explains a little bit about the motivations for it, which is I've been a Python dev primarily, I guess, as a history, for maybe 12 years at this point. And in all of those jobs, to some extent, Python's bottleneck is inevitably performance. And so you end up tapping down to, whether it's libraries like NumPy or other native compiled code, to get the full juice out of Python, if you like.
00:03:42
Speaker
And then in one of those roles, I ended up in a mixed Python C++ code base where we were doing this ourselves. And so I had a lot of experience actually with both Python 2 and Python 3. It was that long ago, you know, before we all managed to get away from the Python 2 ecosystem more generally in production.
00:04:01
Speaker
I helped migrate this mixed C++/Python world over to Python 3 and so got a lot of experience of how Python was tapping into C++.
00:04:14
Speaker
and basically rewrote those bindings for Python 3 and really got down into the weeds of what's going on in that binding layer. And it was actually while I was working in that job, I also discovered Rust. This was around the time of Rust 1.0. I think I might have found Rust 1.0.3, perhaps, very, very early. I think it was 2015 still then. I somewhat lost track of time over the years.
00:04:36
Speaker
But I fell in love with Rust. As a hobbyist, I was just programming on it at home and very much enjoying. Early things I did weren't actually in the mixed Rust and Python space. I was excited in just trying out Rust for web frameworks, because there was some hype around Rust and WebAssembly that maybe has come across in other podcast episodes of yours. And also, I was playing around with building Android apps in Rust, which is a mad experiment that's still not very common.
00:05:05
Speaker
I have a lot of fun every so often with my own personal app tinkering around with that. Fast forward a few years. Basically, I kept finding myself in this role being like, I'm mixing Python and C++. C++ is

Why Rust? David's Perspective

00:05:19
Speaker
a very, very powerful and good language for many, many use cases, but it's also not super ergonomic. Rust is really ergonomic and I find it very fun to work with.
00:05:29
Speaker
I just kept being like, if only I could use Rust for this instead of C++. Fast forward a little bit longer, once I'd actually left that role and was going on and doing other things, I had a bit more time for myself again, and then started contributing to PyO3. At first, it was a bit of a hobby, a bit of fun if you like, to go do that.
00:05:50
Speaker
And over time I just kept contributing and kept contributing, and before you know it, I'm the person that's been contributing the most, for the longest, and sort of championing the story of it. That's a nice way to get into it, because it's not like you took on a role; it's just that you happened to be that enthusiastic.
00:06:09
Speaker
Yeah, maybe I've got a selection bias effect here, but I feel like that's true for a lot of the Rust ecosystem. It's grown up over the last decade, really. Primarily, it's been driven by people who've come along, and maybe they've had prior experience that's somewhat relevant to whatever they're building, but they've sort of had the passion to go forward and create it. There's other Rust libraries, like one called Diesel, which is known for being an ORM
00:06:34
Speaker
In Rust it's not the main one that's used today, maybe, there are other choices, but for a long time this was built by someone who I think came from Ruby on Rails. They just had the experience, went and built the Rust ORM, and did a lot for it in that same way. Okay, yeah, I can see that. There's a temptation, when you learn a new language, to bring your favourite designs along for the ride. Yeah, and you know, ultimately the Rust ecosystem did need all those projects as well, so it's very, very helpful to have all this.
00:07:01
Speaker
Yeah. So that makes a lot of sense if you've got all that experience with wrangling C and C++. We need to get into what makes Python to Rust interesting, right? And I was thinking about this because I have a vague notion of how Python to C works. Tell me if I'm wrong. Python's objects under the hood, Python's types, are basically C structs.
00:07:31
Speaker
Yes. So the Python interpreter is written in C, and so this means that, if you've done C and C++ programming, you define your types in a header file, right? Which basically gives you the layout of what the data that you're going to be passing around your program is in memory. And it defines other things like the functions you can do on that data. It's kind of like defining your public API in these header files. And the Python interpreter,
00:08:01
Speaker
because it's built in this way, it's almost open for extension. They call these native modules extension modules. And it's very much a part of the Python interpreter's design that you're supposed to be able to go compile all these things yourself and interoperate with those C APIs that are defined. And so, I guess talking back to that particular point,
00:08:30
Speaker
The idea is that you've got these structures in C, and a Python object is just one of those things. It's got a very specific layout. It's got a few fields in it which really, really matter. Things like a Python object has a type. A Python object is kind of a dictionary with a collection of keys and values. And it's got a reference count. And that's kind of what a Python object is. And that is found in one of these header files as a definition; PyObject is just the structure name.
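
For readers who want to see that header layout concretely, here is a rough sketch of the core object header mirrored into Rust with a C-compatible layout, in the spirit of what pyo3-ffi does. The field names (ob_refcnt, ob_type, ob_size, tp_name) come from CPython's own headers; the real definitions carry more version- and build-specific fields than shown here.

```rust
use std::os::raw::c_char;

// Every Python object starts with this header: a reference count and a
// pointer to its type. `#[repr(C)]` tells Rust to lay it out exactly as a
// C compiler would.
#[repr(C)]
pub struct PyObject {
    pub ob_refcnt: isize,           // reference count
    pub ob_type: *mut PyTypeObject, // the object's type
}

// Variable-sized objects (lists, tuples, ...) share the same header and add
// a length field; the interpreter can still treat them as a plain PyObject.
#[repr(C)]
pub struct PyVarObject {
    pub ob_base: PyObject,
    pub ob_size: isize,
}

// Heavily trimmed stand-in for the (much larger) type-object structure.
#[repr(C)]
pub struct PyTypeObject {
    pub ob_base: PyVarObject,
    pub tp_name: *const c_char,
    // ...many more slots in the real definition
}
```
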
00:08:59
Speaker
And then if you've got more custom Python objects like lists, for example, they all have that same structure at the start of that layout in memory, if you like, and then they might carry extra data further on beside them. But the interpreter can work with all of them in the same way, as just a PyObject, because their first chunk of memory, if you like, is that PyObject value. Right. So if I were naively trying to bridge from Python to Rust,
00:09:28
Speaker
My instincts would be to bridge from Python to C and then C to Rust. And that feels possible, but kind of ugly. So your

PyO3: Bridging Python and Rust

00:09:39
Speaker
instinct is right, but it's not as bad as you fear. Because if we go back to the idea of like these header files define the public API of the Python interpreter, if you like, and in so far as like, actually, if you go to the Python main documentation, there's a lot of documentation about what they call this C API.
00:09:58
Speaker
And Rust speaks C, if you like. Rust has all sorts of higher-level language features, despite being a compiled language, but it's also designed to be interoperable with C. And not on the level where you can feed the Rust compiler a C header, and it will know what to do with it. But you can say to the Rust compiler, hey, I have some data here, and this data needs to be laid out in the same way that C would lay out this data.
00:10:26
Speaker
And also, hey, I have a function. I know that this function is actually defined over in C. So when you call this function, you need to feed the arguments onto the stack and registers in the same way as a C program would. So it can speak what we call the C ABI, right? The application binary interface.
00:10:46
Speaker
And so in this way, that's how we do it from PyO3. In Rust terminology, you call it a module, or it's called a crate sometimes. It's like a package, would be the best approximation in another language. And so we have a crate called pyo3-ffi, where we basically have to define all of the public API of the Python interpreter as Rust would think about it, as a C external API.
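
As a minimal illustration of what "declaring the C API from Rust" looks like, the sketch below declares a few real CPython C-API functions as `extern "C"` so the Rust compiler will call them with the C calling convention. This is only a toy fragment; pyo3-ffi declares hundreds of such bindings, and calling any of them requires an `unsafe` block.

```rust
use std::os::raw::c_long;

// Opaque stand-in for the C PyObject struct (see the layout sketch above);
// here we only ever handle it through raw pointers.
#[repr(C)]
pub struct PyObject {
    _private: [u8; 0],
}

// "These functions exist in the Python shared library and use the C ABI."
// The names and signatures below match the real CPython C API.
extern "C" {
    pub fn Py_IncRef(obj: *mut PyObject);
    pub fn Py_DecRef(obj: *mut PyObject);
    pub fn PyLong_FromLong(value: c_long) -> *mut PyObject;
}
```
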
00:11:14
Speaker
so that gives us the power for Rust programs to then compile against that definition and everything just hangs together. That last sentence seemed like it was doing a lot of work. Everything just hangs together. So, I mean, there's a lot of nuance in that, right? Like the C ABI is to some extent dependent on the actual operating system that you're running on and the hardware
00:11:40
Speaker
and all these kinds of things and so to all intents and purposes if you use like simple very standard things then the Rust compiler understands it well and it's very easy to get this interoperability to work nicely. There are more advanced things in the C language like bit fields where you can pack data into specific bits within your structure
00:12:02
Speaker
For example, I don't really want to go into the more detail, but the Rust compiler doesn't understand those. And there's like different platform specific nuance that's not so well defined there. And so it's almost like not an easy problem to solve. But thankfully,
00:12:17
Speaker
Rust and Python is now very much, I think, an expected use case as far as the Python core developers are thinking. So they're aware of things like that we need to consume the APIs that they're defining. And so they're trying to avoid using things like bit fields, which can be problematic for us. Okay. This is an aside, but I want to know about that. It surprises me if Rust is on Python's radar.
00:12:44
Speaker
So there's possibly different ways that you might have come across this interaction already. Some of the most well-known examples where Rust and Python are interoperating
00:12:57
Speaker
There is my own work that I do, a company called Pydantic, which is a data validation library for Python. And since middle of last year, it launched its V2, which had the core written in Rust. And that was done for performance, and it was a notable point in the ecosystem. But there's other Rust projects out there, such as Polars, which is, if you've done data processing in Python, you very likely come across pandas and NumPy.
00:13:22
Speaker
And Polars is a new take on it, which is written in Rust. I think Pandas might be a mixture of Cython and C and C++ internally. I haven't looked at that code base myself. But Polars basically is taking advantage of some of Rust's ergonomics in combination with more modern thinking to do things, in some cases, a lot, lot faster than Pandas.
00:13:44
Speaker
And so it's kind of like the new kid on the block that's causing a lot of excitement. At this point, this new kid might be a bit unfair. I think it's approaching three or four years old, but it's growing very quickly and data engineers are doing a lot with it because it can process things faster and with lower memory than pandas can.
00:14:02
Speaker
And then a final big thing that's causing waves across the Python ecosystem is in the static analysis tooling. So, Python's had linters for a long, long time, very well-known ones like Flake8, and then Black for formatting. And there's a new company called Astral, which have built a tool called Ruff,
00:14:23
Speaker
which can do linting and formatting significantly faster than anything that came before it, and again is implemented using Rust. And so in this way there's a lot of adoption of these new powerful tools, which are very fast, which is causing a lot of excitement in the Python ecosystem. Okay, and in a way that isn't going to try and steal attention from Python.
00:14:43
Speaker
Yeah, exactly. I think that's something that I try and say quite prominently when I'm talking about this interop. My motivation for bringing Rust to Python isn't to take Python away from the Python programmers. I myself count myself as a Python programmer. It's more about the fact that, the terminology I use is, Rust offers power and precision where you need it and where Python can't quite bring you that itself.
00:15:10
Speaker
Things like getting control: Python's a garbage-collected language, it's interpreted, there are different performance characteristics which you can't necessarily control.

Rust's Role in Python Projects

00:15:19
Speaker
Whereas with Rust, you can compile those all the way. You can get access to specific hardware in different ways than you would be able to do in Python. Things like interoperability with C libraries in particular, to some extent C++, though Rust still needs work there. And you can do these things from Python, but it's maybe not so great. So you get a lot more control.
00:15:42
Speaker
And so I guess going back to that point in the ecosystem then, all of these things are beginning to move the needle on the Python world, if you like. And so if you actually count in terms of how much activity is going on here, what you end up with is that if you look at the Python package index, PyPI, and how many of these native extension modules are being uploaded to it every month,
00:16:08
Speaker
So typically C and C++ has been the thing that's dominated this for years and years. Basically, it was primarily the only option. And I was looking, for a talk I gave last month actually, at just the adoption rate of Rust. And obviously, back in 2015, it didn't exist. And where we get to, I looked at April for the last five years. And what I found is that every April, the number of different projects that's uploaded Rust
00:16:37
Speaker
to PyPI has doubled. And so with that doubling going on, we're now a quarter of the size of C and C++. And then in two years, we're about the same scale if that trend holds up. That's a big if. OK, we've now defined Hewitt's Law to go with Moore's Law. Well, let's see whether that holds up for anything more than another couple of years. You're not legally bound by that.
00:17:05
Speaker
But certainly, yeah, so it's kind of on that point now where a lot of people in the Python ecosystem are talking about Rust as a relevant factor. And a lot of that is powered by PyO3, which I work on, to do this bridge. Astral, building static analysis tools like Ruff, they aren't actually dependent on PyO3 to build a code analyzer, but they still use some of our packaging tricks to basically ship that Rust code up onto PyPI.
00:17:35
Speaker
OK, we're going to need to talk about that. But OK, so getting back to the heart of this, then, now that I understand the ecosystem context, from what you've described, I can start to see how I would sit in Rust and call Python, because I could just treat it like a C object and I can call the interpreter and Python's functions from Rust. Yeah, that's exactly right.
00:18:02
Speaker
I still don't understand how I'm going to sit in Python land and call some Rust code. So, this trick works in the same way that any Python program has been doing for years, really, which is, suppose that you type import numpy, for example, and NumPy, we've already mentioned, is a big blob of compiled code that goes off and talks, I think, even to Fortran, for example, for battle-tested scientific routines, right?
00:18:29
Speaker
But there's some C in there to marshal it all around and other compiled bits. And so what really happens, Python interpreter, when it's importing stuff, normally you think of it as going and importing from a Python file. But if you have a native compiled library with the right name and the Python interpreter is looking in the right directory and sees this file,
00:18:52
Speaker
it can try and import from it. And what does that mean? We talked about the C ABI and how there's a specific memory layout and structure to how these APIs are supposed to behave. And the Python interpreter actually goes and looks for a specifically named function within that library. So if you're importing NumPy, it's called something like PyInit_numpy. It will go look inside that native module for it.
00:19:19
Speaker
And then, so having done that, it will get this function and it will just go, cool, go run that. And what happens is that this PyInit_numpy function, say, will go and use the other way around that we already talked about, calling back into the Python interpreter to manipulate things. In this case, creating a module object is the final result of running this function.
00:19:46
Speaker
And so that module object from the result of that function will contain a bunch of functions, a bunch of classes, which have all been created using this C API to build it up as we run this PyInit_numpy function. And so the net result, you get a module back, and it looks a lot like you just imported a Python file. You get a module at the end of importing a Python file. But really, that was done by native code building up this module procedurally using the Python interpreter's API.
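
To make this concrete on the PyO3 side: the sketch below is roughly what that handshake looks like with the 0.22-era API linked in the show notes. The #[pymodule] macro generates the exported `PyInit_my_module` symbol the interpreter goes looking for, and the function body builds the module procedurally, just like the hand-written C described above. The module and function names here are illustrative.

```rust
use pyo3::prelude::*;

/// A plain Rust function exposed to Python.
#[pyfunction]
fn double(x: i64) -> i64 {
    x * 2
}

/// #[pymodule] generates an `extern "C"` function named `PyInit_my_module`
/// behind the scenes; that is the symbol looked up on `import my_module`.
#[pymodule]
fn my_module(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(double, m)?)?;
    Ok(())
}
```

From Python, once the compiled library is on the path, `import my_module; my_module.double(21)` would return 42.
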
00:20:15
Speaker
Right, right. Yeah, so it's manipulating the, to check I've got this right, it's manipulating the interpreter to insert functions that look like Python shaped C but are actually Rust.
00:20:30
Speaker
Yes, so that's exactly it. Once you've kind of got the idea of how something like NumPy does it, Rust does it in exactly the same way. The Python structure, as I say, is designed for this use case. And so these

Compile-Time Metaprogramming in Rust

00:20:43
Speaker
APIs are designed to support putting function pointers into them. And Rust can put function pointers in that are shaped like C function pointers, even though they came from Rust, because Rust can speak C's ABI.
00:20:55
Speaker
Right. This suddenly brings into crystal clear focus why there needs to be a library for it, because it's all about making sure that all the specs line up. Yeah. So that's exactly what I kind of view the role of PyO3, the library that I work on, as being, right? You want to somehow define all of these functions and classes that you want to give a Python programmer when they import your library.
00:21:22
Speaker
And that is a whole bunch of, in Rust's terminology, unsafe C operations. The Rust compiler can't reason about what's going on over in the interpreter. And so there's a lot of invariants that you've got to uphold, you've got to lay out all of the information into these structures correctly as you're building up your module.
00:21:44
Speaker
And so the goal is that you as a Rust programmer can just think about your logic and not about how the nuts and bolts are supposed to fit together at this very low level layer. Yeah. So take me through some of that because the whole, one of the big motivations for using Rust is certain guarantees about memory safety and ownership. And that all gets thrown out the window when you're calling Python shaped C, right? So what do you do? Do you just,
00:22:13
Speaker
It's got to be something more sophisticated than we just disable the borrow checker. Yeah, so disabling the borrow checker, in inverted commas. In Rust's notion, what you would do if you were to try and do that is you basically go to pointers and you forget the idea of Rust's references and lifetimes, and you would write everything in so-called unsafe Rust. And so, yeah, that's not the goal.
00:22:43
Speaker
Instead, I guess the first step is to think about how this interface fits together. Rust has functions and Rust has structs. PyO3 approximately maps these into Python functions and Python classes. It's not every Rust function and every Rust struct, but in Rust terminology, we have what we call procedural macros. They can basically generate functionality for you.
00:23:09
Speaker
It's maybe a technical detail we can go into a bit later if we like. But for all intents and purposes, you can think of them like Python decorators, maybe. So in Python, you can use a decorator to put a little, like you can add my decorator on. What's a nice example? Building maybe...
00:23:33
Speaker
in Python, context managers, right? There's a contextmanager decorator that you can use to build some advanced functionality from Python there. Or dataclass is a really nice one, actually, that maybe is a bit closer to what PyO3 is doing. So the Python standard library has data classes. You can write that @dataclass,
00:23:50
Speaker
and then class MyClass or whatever. And that dataclass decorator can go and generate things like your Python constructor, your Python ordering or equality operators and stuff like that. So it's just conveniences. And so you can think of PyO3 as working in the same way. You have your Rust struct, and you mark it with what we call the pyclass procedural macro.
00:24:10
Speaker
And by doing that, PyO3 goes away and generates you enough code to also define a Python object which has got enough space in it to hold your Rust data, and the Python type object that you will give to the Python interpreter for people to reason about this type from Python.
00:24:31
Speaker
So, you know, literally if you write class Foo or whatever in Python, you get a type object there, Foo, which you can do things with. And so with pyclass on some struct Foo in Rust, we create a Foo type object for you that is what's going to go into your module that eventually goes back over to, you know, the Python interpreter after importing. And so we can go then, yeah, in this procedural macro, we go and do all the plumbing for you that makes this all fit together so that you have that type.
00:24:58
Speaker
And in the same way, you can mark functions and we'll go and code-gen for those the same sort of thing, to create a function object that you can hand over to Python in the same way. Right. So lots of compile-time metaprogramming to make my Rust-shaped functions and objects actually into Python-shaped ones.
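
Here is a small sketch of the decorator-like macros being described, again assuming the 0.22-era API; the Counter type is an invented example, not anything from the conversation.

```rust
use pyo3::prelude::*;

/// #[pyclass] generates the Python type object and the glue that lets this
/// Rust struct live inside a Python object's memory layout.
#[pyclass]
struct Counter {
    value: u64,
}

#[pymethods]
impl Counter {
    /// Exposed to Python as Counter(start).
    #[new]
    fn new(start: u64) -> Self {
        Counter { value: start }
    }

    /// Exposed as an ordinary Python method.
    fn increment(&mut self) -> u64 {
        self.value += 1;
        self.value
    }
}
```

Once the class is added to a module, Python code can simply do `c = Counter(0); c.increment()` without knowing any Rust is involved.
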
00:25:18
Speaker
Yes. And so then the goal is that you can use these from both worlds. So you've written what looks like a normal Rust function, and it takes maybe normal Rust types. It could take types that represent Python objects. PyO3's API has types that represent Python objects, but you could also just take, I don't know, numbers or strings or whatever and not think about Python at all, except for the fact you marked it with pyfunction on the top. And then inside that function body, you can do whatever Rust stuff you want,
00:25:49
Speaker
And then at the end, you return some data and PyO3 will deal with mapping that data back into Python data inside the generated code for you. And so in that way, circling back a bit, you don't have to turn off Rust's borrow checker or any of these kinds of notions, because what's going on inside that block is just ordinary Rust code. It can call off into any other piece of the Rust ecosystem that you want to use,
00:26:14
Speaker
and you get all of Rust's niceties inside of that code. And we've kind of dealt with that messy mapping layer, if you like. Okay. Yeah. That makes sense to me up until the point that in my Rust code, I call some Python native function and it returns an object that could presumably have other Python things referencing it.
00:26:39
Speaker
So now I've got an un-memory-managed Python thing in my hand, which feels like a bomb. Right. Because specifically talking about memory management, because,
00:26:53
Speaker
Basically, Python's just handed a pointer over; in sort of the detail of what's going on, Python's just given over a pointer to this memory, right? That's what this PyObject is. And it's just like, here you go. You can work with this. Yes. And so there's a few different things that we can do there to make that into what we call a safe API on the Rust side. And so first off, what's the actual responsibilities of what we have to do with this pointer? Well, we have
00:27:21
Speaker
to do reference counting on it to behave properly. And more importantly, the Python interpreter also has a global interpreter lock at the moment. And you're only allowed to do reference counting when you hold the global interpreter lock.
00:27:38
Speaker
And so in PyO3's role, we have to somehow build a type, which is reference counted, which is completely doable. Like Rust's standard library has some reference counted primitives. So, you know, this is a recognized pattern.
00:27:56
Speaker
There's one for single-threaded programming called Rc, just reference-counted, and then there's actually an atomic reference count, or Arc, which you can use for multi-threaded programming. And so in PyO3's world, we actually come up with two reference-counted pointers in the same way. We have one that's just called Py, which is kind of like an obvious one. It's a Py reference count. And then we have one that's called Bound, which is just a Py that's also bound to the global interpreter lock's lifetime.
00:28:24
Speaker
Right. And that's sort of like a fundamental trick that we can do in PyO3, because Rust has this idea of lifetimes and we can give a lifetime that describes how long we know we're holding the global interpreter lock. And that allows us to just assume that we can do a lot of stuff safely because we're holding that lock. Okay. Yeah. Using the same, possibly not great for the long-term, trick that Python uses. Just stop the world and I own everything.
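
As a rough sketch of the two smart pointers and the GIL token just described (PyO3 0.22-era names; constructor names like `new_bound` shift between releases, and the function itself is invented for illustration):

```rust
use pyo3::prelude::*;
use pyo3::types::PyList;

fn demo(stored: Py<PyAny>) -> PyResult<()> {
    // `Py<T>` owns a reference count but is not tied to the GIL, so it can be
    // stashed in ordinary Rust structs or sent across threads.
    Python::with_gil(|py| {
        // `py: Python<'py>` is the GIL token; borrowing from it produces
        // `Bound<'py, T>` values whose lifetime proves the lock is held.
        let bound: Bound<'_, PyAny> = stored.bind(py).clone();

        // Bound values can call into the interpreter directly.
        let list: Bound<'_, PyList> = PyList::new_bound(py, [1, 2, 3]);
        list.append(bound)?; // e.g. push the stored object onto the new list
        println!("list now has {} elements", list.len());
        Ok(())
    })
}
```
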
00:28:53
Speaker
Yeah, well, we have to do that because that's how Python wants us to think about it. And so, yeah, they're talking about removal of the global interpreter lock. And then there's still invariants, actually, that we have to uphold when we're calling Python's reference counting APIs. They're a little different, but there's still, for example, in the first
00:29:10
Speaker
case when we remove this lock, the garbage collector, we are still not allowed to do reference counting operations when Python's garbage collector's running. So there's still a stop-the-world point there. So PyO3's APIs actually don't need to change too much for that. We can talk more about that later, maybe.
00:29:26
Speaker
But yeah, so rewinding a little bit, what we end up with is Python gave us this pointer, which was, you know, what else do you do with this? You say it's like a bomb waiting to go off. And the reality is that what PyO3's job is, is to go, okay, cool. So we know that we hold the global interpreter lock right now, because we just called a Python API, so pretty much that guarantees that's true. And yeah, we thread this through the Rust code using a lifetime.
00:29:51
Speaker
And then we can wrap it up in something that's got reference counting on it so that you're not going to leak memory or use memory after free.
00:30:01
Speaker
And

Memory Management in PyO3

00:30:02
Speaker
then finally, because this is Rust and we have methods, like normal kind of object-oriented kind of feeling behavior, we can then give you the opportunity. So with this currently probably object of unknown type, you can do things like Python type checks on it to cast it to like a Python list object, not physically changing it, but go work with Rust type system to go, OK, I want to check that this is a list. Cool. Now I know it's a list. Cool. Now I can start doing listy things to it, like indexing it or pushing to it.
00:30:31
Speaker
or whatever. We have all those APIs built into PyO3 for you to do these kinds of operations. I think I'm getting that. This is almost a footnote, but just to make sure we all understand.
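
The "check it's a list, then do listy things" flow David describes looks roughly like this in PyO3 0.22-era code (the function and its behaviour are an invented example):

```rust
use pyo3::prelude::*;
use pyo3::types::PyList;

/// Take some Python object of unknown type and, if it is a list, append to it.
fn push_if_list(obj: &Bound<'_, PyAny>) -> PyResult<()> {
    // `downcast` performs the Python type check; it doesn't change the object,
    // it just gives the Rust type system a `Bound<'_, PyList>` view of it.
    let list = obj.downcast::<PyList>()?;
    list.append(42)?;
    println!("the list now has {} elements", list.len());
    Ok(())
}
```
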
00:30:46
Speaker
Remind me about Python's lifetime. Sorry, not Python's lifetime, Rust's lifetimes. Okay. Yeah. So this is, borrow checker and lifetimes, I guess, is a big thing that Rust is known for. Some people call it the most painful thing about Rust, and in some ways it's the thing that's most different about Rust compared to a lot of languages. Conceptually,
00:31:10
Speaker
a lifetime, as I think of it, boils down to this. It's like you have something in your program that you know needs to be true for some length of time.
00:31:26
Speaker
is the most abstract we could possibly go. And then let's try and boil that down a little bit. So a really classic example of this is allocating memory or referencing memory. If you're a C programmer, you might call new to put some memory in your computer or allocate to your memory in your computer somewhere. You know how big that memory is. And you can safely go and access that memory until you also need to call, I think it's delete if I recall my C days correctly. Is it free?
00:31:54
Speaker
free, maybe. Yes, that sounds more correct. You need to go free that memory later. After that point, you can't go safely access that memory anymore. It's now your use after free if you did that, and that's potentially a security vulnerability depending on what else is writing to that location.
00:32:15
Speaker
Another way of thinking about a Rust lifetime is it is the span of code pretty much between that new and that free; you can describe that with a lifetime. So it's very commonly ascribed a syntactic meaning like that. And you can get more fine-grained than this, right? So typically a lifetime is involved in borrowing some resource from something else. So if you've got,
00:32:43
Speaker
of say, a Rust vec, like a Python list, and you're reading the fifth element from it. So for that duration, you're reading the fifth element, you might have a, there's a function called get on a Rust vec, you pass it an index, and that borrows the vec, and it will give you a reference back to that fifth element. And you have a Rust lifetime there, which started when you called get, and will end somewhere later when you're not reading that fifth element anymore.
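
Here is that borrow, written out in plain Rust (nothing PyO3-specific), to show where the lifetime starts and ends:

```rust
fn main() {
    let v: Vec<i32> = vec![10, 20, 30, 40, 50];

    // `get` borrows `v`; the returned reference's lifetime is tied to that borrow.
    let fifth: Option<&i32> = v.get(4);

    if let Some(value) = fifth {
        // While `value` is alive we may read from `v`, but we could not, say,
        // push to it: the borrow checker forbids mutation while a shared
        // reference into the vector still exists.
        println!("fifth element = {value}");
    }
    // The borrow ends here; `v` can be mutated or dropped from this point on.
}
```
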
00:33:14
Speaker
Right. And then there's syntax to describe this. It's like a single quote and then a name of the lifetime. And so the signature for get on a vector will have this basically borrowed lifetime on the return value to say, hey, I'm tied to self, which is the this argument, the value of the method. And so
00:33:40
Speaker
In this way, lifetimes allow you to reason about memory and other invariants correctly. So in our case, we hold a lock. We can say we hold a lock for this duration of code. And from that point where we start the lock, we can borrow from the lock. We create a lifetime that represents the global interpreter lock's lifetime.
00:34:01
Speaker
And so we call that a quote-py, or I think the language designers use the phrase tick. So tick-py would be how you'd read this. And you see it a lot in PyO3 code, this 'py lifetime creeping about the place.
00:34:13
Speaker
And so I think because it's a syntax that no one's ever seen from any other language, combined with the fact that Rust brings you this idea of you can have either sharing or exclusive usability, but not both at the same time. So if you've got any readers that aren't you, you're not allowed to be a writer to that data. And that forces you to rethink about programs.
00:34:38
Speaker
and it forces you to then immediately confront the borrow checker in these lifetimes and that idea. It kind of creates this effect where new syntax and new concept, and it kind of goes, people go, ooh. And there's a common phrasing, fighting the borrow checker going through this phase. But once you've gotten through it, I find it's actually a really effective tool to reason about how programs are built.
00:35:01
Speaker
So maybe it's Stockholm Syndrome at this point when I'm too far gone. But I'm quite comfortable with the idea of a lifetime now. Yeah, it makes sense that the whole aim here is to do something like saying, I think I've got exclusive access to Python's global interpreter lock. If I teach the compiler that I think that, the compiler can check when I'm wrong. Yes.
00:35:29
Speaker
Yes, let's see if we can find a nice way to describe that. So what we do in PyO3's world is we have
00:35:46
Speaker
Like I said already, we have two smart pointers, one called Py, which just basically holds a Python object, but it doesn't have this 'py lifetime on it to say I'm affiliated with the global interpreter lock. Like, this is just an object and I'm holding it and, I don't know, I'm maybe not allowed to do anything with it. But then also, for the global interpreter lock, we have a token, which we call Python, and that has the 'py lifetime on it. And by either using that token explicitly,
00:36:15
Speaker
or by combining it with a Py smart pointer to make one of these Bound ones, which we already talked about or mentioned earlier, then the Bound smart pointer does have a 'py lifetime on it. In Rust's type system, it just knows that you can't have constructed that thing unless you also had the global interpreter lock Python token around. By construction, you can't actually do anything invalid in that sense.
00:36:42
Speaker
And so it forces you to be a bit more rigorous about how you're thinking about your program, because you have to go, right, am I attached to the Python global interpreter lock or not? How do I prove it? But I guess the reason I'm poking on this is because you use the phrasing like the compiler can check that you're not. And it's not like a runtime check where the compiler is inserting, like, is the global interpreter lock held or not? It's pretty much by the way that you've arranged your program,
00:37:08
Speaker
It's guaranteed and so it costs you nothing actually at runtime. Yeah, all the checking is happening at compile time. Yeah, exactly.
00:37:23
Speaker
This is something that I really like. It's reminded me of Haskell, the idea that there's lots that we can check at compile time usefully. And it initially feels like a pain because you're fighting the compiler until one day it all drops into place and you realize what you're actually doing is getting all your problems up front. Yes. And that's a common feature of what Rust programmers will tell you, that the compiler typically makes you work for building a Rust program. But when you do,
00:37:52
Speaker
almost inevitably, not entirely true, but a lot of the time you will end up just having the result you wanted with a very minimal debugging experience, because it tends to be that you've actually been made, by the compiler and also library design choices, to think about things like error cases and how your program is structured up front. It tends to just work quite well. Yeah. You're going to get the errors either way. Do you want them sooner or later? Yeah, precisely. Yeah.
00:38:20
Speaker
Okay. So my head is spinning a little bit, but nevertheless, I want to go deeper. Okay. So we've talked about some of the ways in which Python and Rust don't quite match, like memory management is one you've got to work at. But

Error Handling: Rust vs Python

00:38:38
Speaker
I know there are some others that come up. So do we want to talk about, what do we want to talk about first?
00:38:45
Speaker
Errors, error handling, does that get thorny? Do I call my, what I think is a Python library and get Rust errors thrown at my face?
00:38:53
Speaker
Yeah, so maybe that's a nice way to also dial back the technical complexity just for a moment at the same time. So that is something that I really love about Rust actually is the error handling. It kind of brings you back to that same conversation just a second ago about how the compiler makes you work for it. It tends to work well. One of the reasons why is the Rust's take on error handling
00:39:21
Speaker
And so if people have done any Go programming, they might have come across this idea that Go programs typically, I think, return errors.
00:39:31
Speaker
in the function return value. In Go syntax, I think you typically get a pair or a tuple of both the success value and an error, and you have to check that the error is not nil and all of this. Rust is leaning towards that same kind of strategy of error handling. If your function can fail, it will return a result.
00:39:53
Speaker
And then that result might contain an Ok, which contains a success result, a successful value from your function call, or it might contain an error. And how this is working is built on a whole feature of Rust's standard library called sum types, or enums, and match pattern matching, which is
00:40:16
Speaker
a really, really nice feature of Rust that I just haven't seen reproduced to the same level in any other language. Python has pattern matching. It's nowhere near as amazing. What you end up with in Rust is when you're doing this error handling, you basically are confronted all the time with you can't actually access
00:40:36
Speaker
the value from a successful computation in this result until you decide what to do with the result in general. And the simplest thing you can do is Rust has a question mark operator. And what that does basically is if you're in a function that is returning a result as its own error or return value, then you can also put the question mark onto other results. And if they're errors, it will immediately just pass the error straight out and you don't even have to think.
00:41:05
Speaker
So then you end up with something that's a little bit more like what people might see as exceptions from other languages where errors just fly up your call stacks and you don't even see them. But you've had to deliberately mark them in already with a question mark.
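
A minimal plain-Rust example of the Result-and-question-mark pattern being described (the parse_port function is invented for illustration):

```rust
use std::num::ParseIntError;

// A fallible function: the caller cannot touch the u16 without first
// deciding what to do about the possible error.
fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    let n: u16 = raw.parse()?; // `?` propagates the error to our caller
    Ok(n)
}

fn main() {
    match parse_port("8080") {
        Ok(port) => println!("listening on {port}"),
        Err(e) => eprintln!("bad port: {e}"),
    }
}
```
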
00:41:19
Speaker
And sometimes you're calling functions where you know what you'd like to do if there's an error here. And so then you can start using either the match keyword and unpicking the result and doing more specific things, or you might transform the error using those different functions on result to basically do all sorts of operations with these errors. And in some way, you'll either decide to recover from the error perhaps and just keep going anyway. Maybe you ignore the error because you don't really care.
00:41:47
Speaker
But Rust is making you make that decision right up front. Right. And so in that way, yeah, you end up with this nice sort of position where when you finally compile your code, you've had to think about what the error conditions were. And that's a bit different to say something like Python, where typically the first time you write your code, you don't think about any of the possible error conditions. And then you write your unit tests and you realize that there are conditions all over the place you have to handle.
00:42:13
Speaker
How does this marry up with Python's story? Python has exceptions as the primary mechanism for error handling.
00:42:25
Speaker
What we really need to think about is two different things. One is what happens when we call into Python, and Python gives us an exception back. And then also what happens when Python calls into a PyO3 mapped function, if you like, and then that function wants to fail. So let's start with a second. So you're a Rust programmer who's written a function that could fail.
00:42:49
Speaker
So in the same way, your return value will be not just like an integer, but it'll be a result of an integer and presumably some error like structure that I've annotated with make this a Python exception.
00:43:04
Speaker
So there's different choices there. So Rust has pretty good, in Rust's terminology, we call these traits. They're kind of behavior that you attach to things. Different analogies in other languages would be Python has protocols or Java has interfaces, which kind of feel the same sort of idea. A trait describes behavior or something. And there's a trait for conversion called into.
00:43:30
Speaker
And so if your error type that could be in your result is a type which has a conversion to PyErr, which is PyO3's definition of a Python exception, then you can have whatever custom type you want in there.
00:43:48
Speaker
And as long as that conversion holds, when an error comes up through the PyO3 layer, we'll turn that into a Python exception object for you. And it could, of course, just be an error. You could have had a PyErr from the beginning. So you might have already been carrying a Python exception object, and then we don't need to do any conversion. It just goes up into Python.
00:44:08
Speaker
And there's Python APIs that we call inside that generated layer, right, to tell the interpreter, hey, an exception is now set, you need to raise this and continue unwinding like a normal exception would. Right, and in the Rust layer? Yeah, it's just an object which is in the error variant of your result, and you can pass it around in Rust's ways.
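
A short sketch of that conversion, under the assumption of the 0.22-era API: the custom error type and function below are invented, but `PyErr` and `PyValueError::new_err` are the real PyO3 names, and implementing `From<MyError> for PyErr` is the documented way to get the `Into<PyErr>` conversion described here.

```rust
use pyo3::exceptions::PyValueError;
use pyo3::prelude::*;

/// A custom Rust error type for this module.
struct NotPositive;

/// Teach PyO3 how to turn our error into a Python exception.
impl From<NotPositive> for PyErr {
    fn from(_: NotPositive) -> PyErr {
        PyValueError::new_err("value must be positive")
    }
}

/// From Python, this raises ValueError when given a non-positive number.
#[pyfunction]
fn square_root(x: f64) -> Result<f64, NotPositive> {
    if x <= 0.0 {
        return Err(NotPositive);
    }
    Ok(x.sqrt())
}
```
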
00:44:29
Speaker
Okay, so to check I've understood that: as a Rust programmer, I'm writing a function that returns a result as normal, and I just make sure the error type implements this turn-me-into-a-Python-error interface. Yes, exactly. Gotcha. And then that helps us loop back round to what happens the other way around, if you call a Python function from Rust and that raises an exception.
00:44:53
Speaker
Well, PyO3 again translates this, so that if you get a Python exception, when it gets to you in the Rust code we take that Python exception object and we stick it into an error variant of the result.
00:45:06
Speaker
So then it's suddenly into Rust's error handling world and you get the choice on whether you want to match it or you want to use question mark to just throw it back up to the next person to deal with. So in that way, it can cross through Rust code in Rust error handling ways and then might pop out the top back into Python as an exception even.
00:45:26
Speaker
depending on what your module is doing. Right, right. OK, that makes sense. I feel I should know the answer to this, but you're turning out to be a really good Rust teacher too, so I'm going to ask anyway. Does Rust force you to check the error somehow, either with a question mark or with pattern matching?
00:45:45
Speaker
Yes, so I think there's probably, if you wanted to... so there's different functions on results. Result is a type and it's distinct; it's a generic type and it contains, say, a success value. A Result of i32 might be a result containing a 32-bit integer. And you can't actually access the i32 value because it's a different type.
00:46:10
Speaker
So you have to do something, and the most simple thing you could do... We've already talked about the question mark operator, but that requires you to yourself be in an error handling
00:46:22
Speaker
mode, like your function has to also have a result return value. So you've already had to do a little bit of thinking about error handling. You can also just do a function called dot unwrap, which will take your result. And if it's a successful result, it will give you the underlying value. And if it's an error, and this is actually maybe an interesting extra piece of rust error handling, it will do what's called a panic.
00:46:47
Speaker
And basically it stops your program. This now, a panic feels a lot more like an exception in other languages. It kind of goes out of Rust's normal control flow and starts exiting your functions one by one, cleaning up anything in them. And a panic is kind of intended to be a graceful shutdown.
00:47:09
Speaker
The use for it in Rust is somehow developer got logic wrong, probably not safe to proceed, let's shut down. It's not meant for bad data. It's like a more aggressive kind of assertion failed sort of situation. Crashes gracefully as you can. Yeah. And so Rust has a compile time switch even to upgrade a panic to an abort, if you like. So it's really, it's meant to be this like terminal thing and it's just about how cleanly can you terminate.
00:47:39
Speaker
Right. And so I guess just closing off the story there.

Handling Rust's Panics in Python

00:47:45
Speaker
There's two things here. Let's first finish talking about the result, and then let's go back into talking about how panic interacts with Python, because that's quite fun too. And so you've got your result, and you have these different choices on it. You can use a question mark to get the value up. You can just call dot unwrap. And then, yeah, you have like a bomb waiting to happen. It's generally not a good thing in library code to unwrap. But in your application, maybe you've got a good justification for you know that the error will never happen there.
00:48:15
Speaker
or you can use the match statement which is a very powerful syntax where you can have you can explicitly handle the okay case and do what you like with it or you can handle the error case and you can transform the error or choose to panic yourself or do anything you like ignore the error.
00:48:34
Speaker
I guess you could even, Rust has on result one more method, unwrap unchecked, which is unsafe. Then you can really, if you want to just have no safety at all, just pretend that nothing went wrong. Of course, in that case, if the result is an error, you're actually reading an invalid memory. That's an unsafe function. This is a good example of how Rust lets you take off all the safety valves if you want,
00:48:59
Speaker
But generally, you don't use that method. That's only in extreme situations where the performance matters so much that you don't want to check your return value. And you're basically certain that it's not going to be an error. Because if it is an error there, you've hit basically a problem. Yeah, yeah, OK. So it has mechanisms for you to shoot yourself in the foot as much as you like.
00:49:21
Speaker
Yeah, unsafe Rust is in some sense, you can think of it as like C. You haven't really got the same level of guarantees at all. There's still a lot of ergonomics that it wins you, but you're turning off a lot of those basic soundness checks.
00:49:36
Speaker
And so it's something that you use, I guess, with good justification. And I think generally I would even argue that performance isn't good justification unless, you know, that one checking whether the error is like an actually an error value or not is going to somehow break your program. And I think in this case, like it's very unlikely that your performance overhead is going to be so critical.
00:50:00
Speaker
that checking for that being a success is going to be a problem. So I would say unwrap_unchecked is in the category of never use this function unless you really, really, really, really, really have to, rather than I'd like my program to go a little bit faster. I don't think that counts. And as always with performance, measure it first, right?
00:50:19
Speaker
Yeah, exactly. Come up with justifications rather than just freely going, I'm going to upgrade every single function I have. You know, you could do the same thing for a Rust vector. You can go from get to get_unchecked and just pretend that the value is within bounds of your list, if you like. But, you know, if you have only five elements in there and you get_unchecked the 10th, again, you're in a bad space. So don't do that. Yeah. And it comes back to that idea that we were talking about. Are you fundamentally the kind of person that wants your errors upfront?
00:50:48
Speaker
Yeah. And I think in general, the answer is yes. Rust wants you to think in that way. And it makes it, like I say, really easy and ergonomic to do it with this result type. And there's also a corresponding one option for where stuff might be present, but might not be, but there isn't really a good error to describe it. So get on a list, for example, returns an option actually, because it's telling you if you're out of bounds, it would just give you nothing. Yeah, that makes sense.
00:51:13
Speaker
So yeah, and then, so I wanted to talk about Rust's panics, because that's kind of closing off this chapter of error handling. And they're sort of meant to be a graceful shutdown. And so in some sense, when you panic, you can capture the panic; there is a Rust function for this in the standard library, but you don't really want to do this because you're kind of interfering with the graceful shutdown.
00:51:38
Speaker
Now, in the case of going through Python's interpreter, we kind of have to catch a panic, because the panic is a Rust construct that the Python interpreter can't reason about. And if we let the machine continue to be going and unwinding function stacks as a panic would, then when it crosses over into the Python interpreter, we've basically got no guarantees about how that should even behave. That's just undefined behavior in the worst sense.
00:52:07
Speaker
Okay. I can think of a place in code I'm using at the moment where this might apply. So I'm using Python's with syntax to get a database handle and make sure that when that block exits, the database handle is released correctly, which is important to me. And this is what we're talking about. If I called a Rust function in that block and it blew up, I want to make sure that that with is closed off properly. Right.
00:52:32
Speaker
Yes, as part of your shutdown. It's not quite just as simple as making sure your block is closed off properly because I think the C interpreter is not really meant to handle this panic at all. It's more like on a technical level, we don't want to get there. But then, yes, what we can do is we can turn it into a base exception when it gets to Python's layer.
00:53:00
Speaker
And if you've dealt with Python's error handling, basically a base exception is a very special type of exception, which is not used for like normal logical flow. So you typically don't catch them. Like if you're doing stuff like a control C keyboard interrupt, so like shut down a program, that is itself a base exception rather than normal exception.
00:53:24
Speaker
And so it will unwind your Python program in the same way, but it will actually like basically get to the top of your program and quit. And so we turn a panic into the same sort of idea and it gives your Python program a chance to shut down, but we kind of force you to quit.
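
Roughly, the boundary behaviour David describes can be pictured like the sketch below: catch the unwind before it reaches the C interpreter and convert it into PyO3's PanicException (a real type, `pyo3::panic::PanicException`, which subclasses BaseException). This is only an illustration of the idea; PyO3's generated code handles it internally and differs in detail.

```rust
use pyo3::panic::PanicException;
use pyo3::prelude::*;
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Run some user-supplied Rust code; if it panics, convert the panic into a
/// BaseException subclass rather than letting the unwind cross into the
/// C interpreter.
fn call_guarded(f: impl FnOnce() -> PyResult<i64>) -> PyResult<i64> {
    match catch_unwind(AssertUnwindSafe(f)) {
        Ok(result) => result,
        Err(payload) => {
            // Panic payloads from string literals are `&str`; fall back to a
            // generic message otherwise.
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .unwrap_or_else(|| "Rust panic".to_string());
            Err(PanicException::new_err(msg))
        }
    }
}
```
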
00:53:39
Speaker
Right, yep. Okay, what if we're weaving back and forth though? Because if I throw a panic in Rust, which goes across to Python, but comes back across to Rust, which must happen? Are you also converting base exceptions back into panic? Yeah, we have to, right? Because otherwise,
00:54:02
Speaker
I spoke before about how we don't really want to catch panics and recover from them; like, you can, but it's not really what you want to do. Whereas if we did what we were talking about before with error handling, where Rust gets the Python exception object and goes, ah, cool, I'll now just put this into a result and let the caller decide what to do with it,
00:54:22
Speaker
we don't really want to give them that chance with a panic. We want to go, OK, it's a panic. And then there is another function in the Rust standard library to actually resume unwinding from a panic. So we can actually go back the other way. And so, yeah, pyo3 at that layer has to first go, oh, we're going to panic. Let's stop here. We'll turn that into a Python base exception over to Python. If that should cross back into Rust again,
00:54:44
Speaker
It's a particular type of base exception which we subclass, called a panic exception, so we can see that and we go, oh yeah, we're panicking here, we'd better keep panicking. And then we allow it to continue unwinding. And so, yes, you can have a bit of a struggle between the two languages as it goes back and forth,
00:55:03
Speaker
and panics and then switches to an exception, then panics a bit more and then switches back to an exception. Eventually, you'll get to the top, probably in Python, not necessarily. You can have a Rust starting point for your program too, and then we'll shut down.
00:55:18
Speaker
OK, I can see how that works. A bit of a dance, but it's only really a dance with two steps. A dance with two steps. And it's sort of like I say, panicking in Rust is meant to be like it's a developer error and something is logically wrong. So it's kind of a last resort. You know, if people are seeing panics, we say go report it to the library that panicked because it shouldn't be panicking there. Probably
00:55:43
Speaker
Either it was a reasonable thing for a user to do, and the library should be giving a proper error out, not panicking. Or the library actually has a genuine logical bug and needs a bit of rethinking internally. One of the two. But either way, it should be classed as a bug on the library that panicked. Right. Yeah. OK. That makes perfect sense to me. There's one big thorny topic we're going to get into. But before we get there, you've just made me think.
00:56:10
Speaker
PyO3 is doing things like checking, when this function comes back, did it return a panic exception? It makes me wonder about performance. How much is PyO3, as a binding layer, slowing things down while it does its own ceremony? Yeah, so that's an interesting one to come onto. And I think, um,
00:56:35
Speaker
ultimately, PyO3 and Rust fall into the same category as C and C++ in terms of performance. They are compiled languages that come down to what we could term zero cost abstractions. Rust has quite nice higher level interfaces, which in theory can compile away to cost you nothing at runtime, which is very, very cool. And the only thing that makes Rust a little bit different from C and C++ is it has these extra safety guarantees on top of it.
00:57:04
Speaker
And so when it comes to the PyO3 side, my ethos on how performance should be is that, for any reasonable operation, we should be able to get pretty much the same sort of performance characteristics and behave more or less equivalently to C and C++. And it's only really
00:57:26
Speaker
in these extra cases where we need a little bit of extra safety that, I think, it's correct to sacrifice a little bit of performance to keep the safety guarantees. In some sense, a C or C++ program should be doing that anyway, but probably just wasn't thinking about it. Granted, Rust, again, with things like the borrow checker, is maybe a little bit stricter than C and C++ ever would be.
00:57:50
Speaker
But it's a price that's necessary to pay. So how does this shake out? We've been working over the last few PyO3 releases to optimize some of these pieces away. If you'd looked at PyO3 maybe a year ago and compared calling a Python function versus calling a PyO3 function, there were these steps on the boundary that created overheads
00:58:16
Speaker
that meant we ended up a bit slower than calling a normal Python function. But then once you've crossed over into the Rust code, that Rust code can be highly performant. So supposing your function is complicated enough, you still have a net win.
00:58:28
Speaker
But with more recent PyO3 versions, we've been working to take more of these places where we were doing safety checks and either work out whether they can be proven at compile time or by construction, and therefore remove the need for them, or otherwise apply other optimizations that the C and C++ ecosystem has had a long time to develop. For example,
00:58:55
Speaker
Cython is a Python project that gives you a C-like syntax and generates native modules for you. It does optimizations all over the place that assume all sorts of details about the Python interpreter. And that's both a function of age and also a function of the fact that it reaches into the interpreter maybe more than Rust would ever want to.
00:59:15
Speaker
But we're slowly understanding which optimizations in that category we can bring across to PyO3, and what we can add. So on the function boundary overhead, I would say we're now more competitive than we were. I think Cython is probably still the fastest. In the next PyO3 release, we will be faster than a pure Python function, which is an understandable place to be. But again, while we have these overheads,
00:59:42
Speaker
the point is that once you've crossed into the Rust code, you then get that control. You have to sacrifice a little bit on the boundary so that you can have the power later. It's a design choice: it's not a free lunch, but it's not like these operations will absolutely kill you straight away.
01:00:02
Speaker
Right. Yeah. But presumably a lot of the time when you're using this, you would be diving into Rust to do something which is quite expensive in Python. Yes. A lot of the time it's going to be a small number of calls to a very fast loop of some kind, something like that.
01:00:22
Speaker
Yeah, so that's exactly the kind of way it would make a lot of sense to build a PyO3 Rust package. If we talk about the Polars example again, that's a really good example of this, right? You're dealing with data frames of data: two-dimensional, big blocks of stuff.
01:00:39
Speaker
And the Polars API allows you to describe, hey, I want to take this column, I want to group by it or sum it or whatever. I want to filter based on this other column or create a new column of data by applying some function to something else. It's like higher level decisions that control what's going to happen. And then you basically hit go and it enters Rust and does a whole bunch of logic and computing.
01:01:04
Speaker
which is so fast that it more than pays for the cost you spent doing the Python-to-Rust calls at the beginning to describe what you want.
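A toy sketch of that shape, nothing like Polars' actual internals and with invented names, might look like this: one Python-to-Rust call hands over a whole batch, and the tight loop runs entirely in compiled code.

```rust
use pyo3::prelude::*;

/// Sum of squares over a whole batch of numbers: the boundary is crossed
/// once, then the loop runs as ordinary compiled Rust.
#[pyfunction]
fn sum_of_squares(values: Vec<f64>) -> f64 {
    values.iter().map(|v| v * v).sum()
}

#[pymodule]
fn fastmath(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(sum_of_squares, m)?)?;
    Ok(())
}
```

Calling fastmath.sum_of_squares(data) once from Python amortises the call overhead over the whole list; calling into Rust once per element would spend most of its time on the boundary instead.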
01:01:13
Speaker
Yeah. Yeah. That makes sense. And also it also makes me think of like AI type of matrix multiplication. Yes. Very similar. Yeah. Yes. Okay. Okay. With performance thought about, I mean, I'm assuming this is going to be the thorniest topic in this podcast.

Async Programming: Rust and Python

01:01:31
Speaker
We need to talk about, um, asynchronicity and threading. Okay. Um.
01:01:37
Speaker
Because both Rust and Python have ideas about parallelism, asynchronous code, and they feel quite different.
01:01:52
Speaker
There are lots and lots of interesting details here. And so we'll start from the premise of, again, where I want PyO3 to be, which is that it's meant to be this library that takes analogous concepts between the two languages and lets you speak them coherently and conveniently, and ideally with as little performance overhead as possible.
01:02:17
Speaker
Async is the confluence of how hard can you possibly make this premise. Let's start by talking a little bit about how async works in each of the two languages, because I think that helps.
01:02:32
Speaker
give a sense of what the problem is and how complex it is. And also, I'll just set the scene by saying that in PyO3 there are different approaches to async being explored. We have one inside PyO3 itself that we've marked as experimental, and then we've also got an external library called pyo3-asyncio, which does some of this interoperability too.
01:02:55
Speaker
And so it is an area still being explored. I wouldn't say our support is perfect yet; a lot of it can still be improved. Why is it hard? Rust has come at the async story from the typical Rust mantra, which is we want to do everything with zero cost. And so we build an abstraction which defines
01:03:22
Speaker
how to think about asynchronous programming. And there is a trait called Future, which describes some computation that's going to happen later. It has an output value and all of this kind of machinery. And then the standard library is deliberately super lean; it doesn't even define an async runtime. So at the moment in Rust's async world, the standard library basically has the async fn syntax for defining an asynchronous function. And that's about it.
01:03:52
Speaker
It has a notion of what a future is, but leaves it to the library ecosystem to decide what to do with that. There are lots of different ideas. One of the biggest ones that has taken hold is a project called Tokio. You may have heard of it. It's a Rust library that allows you to have an asynchronous runtime
01:04:17
Speaker
which uses all your CPU cores and does what's called work stealing to pass work between CPU cores or threads or workers as and when whichever is busy and whichever is idle. So the theory being that you can maximally use your computer for
01:04:35
Speaker
basically an event loop running on everything at once, trying to get as much performance out as possible. Tokio is very, very popular in the Rust ecosystem, and it has a lot of the higher-level constructs that you would think about, like tasks or how to deal with file systems or networking; it deals with all those kinds of things. And then web servers are typically built on top of Tokio to allow you to get to the next step.
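To picture the Rust side of that, here is a tiny Tokio sketch (a hypothetical example, not anything PyO3-specific): tokio::spawn hands a future to the multi-threaded, work-stealing runtime, which is why that future and everything it captures must be Send.

```rust
// Assumes `tokio = { version = "1", features = ["full"] }` in Cargo.toml.
use std::time::Duration;

async fn double(x: u64) -> u64 {
    // Pretend this awaits some I/O before producing a result.
    tokio::time::sleep(Duration::from_millis(10)).await;
    x * 2
}

#[tokio::main]
async fn main() {
    // spawn() may move this task to any worker thread, so the future
    // (and everything it captures) must be Send.
    let handle = tokio::spawn(double(21));
    let answer = handle.await.expect("task panicked");
    println!("{answer}");
}
```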
01:05:00
Speaker
And so the key point there is that Rust is allowing you to write these futures, and it has the basic syntax as a bit of a building block. Everything else is deferred to the ecosystem, and the ecosystem has worked out that we can do the most complicated thing we can possibly do and just push data onto all the CPU cores. And the cool thing about Rust is that, because the standard library has not just the Future trait but also one called Send, which allows you to describe when you can pass data between threads safely,
01:05:28
Speaker
you kind of end up with this ecosystem that works. Python's model is very, very different, right? So in Python, first we need to talk about Python's dynamism. Python objects can basically implement any behavior they like by implementing these magic double underscore methods, dunder methods we call them. So you're probably familiar with basic ones like double underscore str, which controls how a Python object is rendered when you print it.
01:05:58
Speaker
But the asynchronous behavior in Python is guided the same kind of way. You have double underscore await, I think it is, which in some sense says this needs to behave as a future that you can await in Python syntax.
01:06:20
Speaker
I forget the exact mechanics of it because it's super-duper complicated, but basically the double underscore await method returns another Python object, which I think is iterable, if I recall correctly. And the idea is that every time you iterate it, you're kind of driving your Python async stuff forward, whatever that is, until eventually the iteration finishes and then you've got the result of your future, if you like.
01:06:47
Speaker
Right. And so you've got all these Python objects getting created to describe your asynchronous calculation. And there's a lot of complexity there. And the Python interpreter has smart stuff that it knows about which async call stack are you on and all sorts of... There's a lot of stuff that's very complicated, just at the runtime level, not even at the
01:07:11
Speaker
Because that's like partly the type system, but then actually influences runtime a lot. Like you get these iterable objects and you have to do something with them. It's very different to Rust's like zero cost abstraction idea. And then similarly, the Python ecosystem has grown up around the global interpreter lock where threading doesn't really matter.
01:07:33
Speaker
And so if you're thinking about async in the Python world, the other big difference to Rust is that typically your async runtime runs on one thread. That's how the Python standard library's asyncio works: it has primitives for doing file system operations and networking in asyncio's event loop, which runs on a single thread.
01:08:00
Speaker
Maybe that sort of like begins to give a flavor of what the complexity is here. Somehow we've got to marry the idea that Rust is kind of happy for asynchronous work to cross CPU cores. And Python is very much not. But it actually goes further. Python has all this complexity around how asynchronous objects behave at runtime.
01:08:26
Speaker
And on top of all that, there are two or possibly more asynchronous runtimes in competition, right? Python has asyncio, and you can have alternative runtimes; there's one called Trio, and you basically choose, but they work very, very similarly. And if you're putting tasks onto your asyncio runtime, doing file system operations or whatever, it's asyncio which knows how to resume your asynchronous work
01:08:55
Speaker
later if something's ready. And the same is true on the Rust side: if you're running Tokio and doing file system or networking work, it's Tokio which knows how to continue when your task is ready. So not only that, but your runtimes then have to signal to each other somehow about
01:09:14
Speaker
the tasks which are running in the other runtime. Yeah, it's like trying to run an operating system with two separate schedulers, right? Yes, exactly. That's very much a good analogy for the problem. And so, yeah, this is the problem space that we sit in, and that's been like a 10-minute summary of it, which gives you an idea of how painful it can be. So what's PyO3 doing to try and make this more tractable?
01:09:41
Speaker
Well, first off, we have the Rust async fn syntax. And in the same way, when you decorate it with the pyfunction annotation, we want to go and generate you a Python function which should behave like a Python asynchronous function. So under the hood we have to create all of this runtime behavior for an object that, you know, can be iterated to eventually produce the result.
01:10:14
Speaker
And in itself, that needs to be performant, despite being a runtime overhead.
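As a sketch of where that currently sits: recent PyO3 releases can export an async fn directly when the experimental-async feature is enabled, generating the awaitable machinery for you. Names here are invented and the details may lag the current API, so treat it as an outline rather than a recipe.

```rust
use pyo3::prelude::*;

/// Exported as a Python awaitable; PyO3 builds the object that Python's
/// `await` drives to completion. Requires the experimental-async cargo
/// feature, and the behaviour is still evolving.
#[pyfunction]
async fn add_later(a: i64, b: i64) -> i64 {
    a + b
}

#[pymodule]
fn demo_async(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(add_later, m)?)?;
    Ok(())
}
```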
01:10:21
Speaker
And then we've also got the question of what that object contains. Ultimately, it might contain a Python object or Python task, which needs to somehow interact with the Python runtime. Or it could contain a Rust task, which needs to interact with a Rust runtime. And that's what PyO3 in the main core hasn't really dealt with at all yet. And then pyo3-asyncio
01:10:47
Speaker
is a project which allows you to basically have a Tokio runtime and a Python runtime. And when you await a task on the Tokio runtime, you can create a Python task which knows how to be woken up when the Rust task is woken up, for example.
01:11:05
Speaker
And so it deals with that bridging for you. So then as a programmer, you kind of have to acknowledge, I have a Tokio task here and I need to wrap it for Python, but then it will know that it can be run on the Python asyncio event loop. Okay.
01:11:20
Speaker
And when it's done, you'll get an asyncio result or whatever, but ultimately asyncio had to go and ask Tokio's event loop for some work to be driven there for you to get the result. And so pyo3-asyncio has dealt with a lot of that complexity, and PyO3 maybe one day will deal with that internally instead.
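A rough sketch of that bridging with the pyo3-asyncio crate; the function name here is invented, future_into_py is the crate's real entry point, but pyo3-asyncio tracks specific PyO3 versions, so the older GIL-ref style signature below may not match whatever release you're on.

```rust
use pyo3::prelude::*;
use std::time::Duration;

/// Returns a Python awaitable backed by a Tokio future. When asyncio
/// awaits it, pyo3-asyncio arranges for the Python side to be woken up
/// once the Rust future completes.
#[pyfunction]
fn sleep_then_answer(py: Python<'_>) -> PyResult<&PyAny> {
    pyo3_asyncio::tokio::future_into_py(py, async {
        tokio::time::sleep(Duration::from_millis(100)).await;
        Ok(42)
    })
}
```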
01:11:41
Speaker
There's a long way to go. But yeah, if you're staying within just Python futures, there are different tricks that we can play in PyO3 that we haven't done yet. If you await an asyncio task, we kind of want to just give you that result straight back out without having to create extra wrappers, to avoid the performance overheads and stuff like that. And so there are lots of optimizations yet to come, as well as understanding what the most ergonomic APIs are. So yeah, that's
01:12:11
Speaker
a topic that people are actively working on. You know, there is at least one Python web server, one called Robyn, and I believe there are others, which are built on top of PyO3 and Rust and use, I think, Tokio internally for various bits and pieces. And so, you know, it's very, very possible; I would just say it's still early days on this. Okay. Are we talking kind of alpha state, or is it production ready but hard to use or non-ideal?
01:12:39
Speaker
I think Robyn is basically production ready. My instinct is that Robyn's internals are probably a really hard-fought battle between what PyO3 gives Robyn versus what Robyn's had to figure out themselves. And so probably
01:12:58
Speaker
there are places where performance has had to be sacrificed a little bit because PyO3 has made their life hard, because we haven't solved those fundamentals quite yet. Maybe, or maybe not. I suspect there's more to be won from web servers like Robyn in the future, but you could probably build a perfectly functional application on top of them. Unless you were trying to be a hyperscaler, which needed every last ounce of performance, it would probably be completely fine for you.
01:13:25
Speaker
Okay. I think that's probably as thorny as we want to go. I definitely see the problem and how it begins to be solved. There's a lot still to do, and there are about five PRs, I think, currently up on the PyO3 repository bringing in bits of these ideas to make it a little bit better, but there's still a long way to go. On top of this, you have the work, because... do we want to talk about the removal of the GIL?
01:13:54
Speaker
Yeah, we can. I'm very excited about it. We can keep it, again, quite high level, if you like. Rust has these primitives, like the Send trait we mentioned very briefly, for allowing you to send work between CPU cores. There's also one called Sync, which allows you to describe when you're allowed to access data which might also be being accessed by other CPU cores, if you like.
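A plain standard-library illustration of those two traits: std::thread::spawn only accepts closures whose captured data is Send, and sharing data for access from several threads at once typically means a Sync-friendly wrapper such as Arc<Mutex<T>>, all checked by the compiler rather than at runtime.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc<Mutex<...>> is Send + Sync, so it may be moved to and shared
    // between threads; the compiler enforces this at compile time.
    let shared = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let shared = Arc::clone(&shared);
            // The closure is only accepted because everything it captures
            // (the Arc clone and `i`) is Send.
            thread::spawn(move || {
                shared.lock().unwrap().push(i * i);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("{:?}", shared.lock().unwrap());
}
```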
01:14:22
Speaker
And the point is that Rust actually lets you reason quite a lot about concurrent programming using these building blocks. And so it makes Rust very good at multithreaded programming. And historically, Python has not been. Historically, Python hasn't done it, right? Well, I guess, to be honest, yes. You do write multithreaded programs in Python, and you do actually have to think about threading guarantees already today.
01:14:51
Speaker
But it's just you're less likely to encounter them because of the fact that Python programming is naturally single threaded. And so you don't tend to try and write your code to make use of multiple threads because why would you?
01:15:05
Speaker
So I'm actually quite excited that Rust is able to give quite a lot to, you know, the Python ecosystem in this change, because it should be possible to write Rust modules which perform really well under conditions where the global interpreter lock is removed, because in PyO3 we should be able to help Rust programmers reason about that concurrency using basically the Rust standard library primitives. And so, um,
01:15:32
Speaker
PyO3's APIs will have to change a bit, though I don't think too drastically: basically we change the places where we ask you to guarantee that something is Send or something is Sync, because we now have to deal with the possibility that work can potentially cross between threads. But otherwise, I don't think it's a huge fundamental shift for the Rust ecosystem built with PyO3.
01:16:01
Speaker
And it's more about refinement and leaning into what Rust is already really good at. I have to ask, what's your motivation here these days? Because, I mean, these problems are really thorny and could be seen as a real drag by some people. Is it that you like the problem solving? Is it that you like bringing some of Rust's power to the Python world? What excites you about it?
01:16:32
Speaker
I think it's a mixture of different things. So first off, it was, I guess, a notion of it was a fun problem to be working on. And also, like I say, I was very passionate about Rust as a language. So the opportunity to use more Rust in that way as a hobbyist that became something more serious was really, really exciting. There's also kind of like
01:16:56
Speaker
I think now at this point, it's a building block that other people are working upon. It's so satisfying to see that be something that's smooth and effective for other people to work with.
01:17:12
Speaker
It looks like, because people are enjoying working with Rust, that this is going to be a building block that will be fairly major to the Python ecosystem. We spoke about the scale of the numbers earlier and how within a couple of years it will perhaps be as commonly used as the C and C++ options already are.
01:17:33
Speaker
And so to some extent, there's work that needs to happen there because the whole Python ecosystem is somewhat dependent on it. And I think it is, in my mind, a nicer option than using C or C++. I'm biased, but I prefer to work in Rust rather than in those languages. And so by working on this tool, I'm hoping that it makes Python programmers more, I guess, productive.
01:18:02
Speaker
And it's not really about taking, again, back to earlier, taking Python away from the Python programmers. I am one. I write a lot of Python code, typically. My day job, it's either Python or Rust, depending on what fits the need best.
01:18:17
Speaker
And there are a lot of teams out there, say, which I think are in the same boat, where there are a lot of Python developers that don't necessarily want to or need to know Rust. But there is some notion that their code needs to be performant. And there are other members of the team who might be working on more fundamental building blocks in the same code base, where it makes sense for that to be built in a lower-level language.
01:18:40
Speaker
I've seen that in lots of workplaces that I've been at, not just the Python and C++ place that I described earlier in the podcast, but other jobs. Typically, there have been a few people working on really the nuts and bolts, who need the control they need, and then other people building more business-level logic where they don't need the same performance, but it's really helpful that all those building blocks are fast.
01:19:03
Speaker
And so I think it's quite a common pattern and it makes sense in my mind to make these tools as good as possible for these teams to function effectively.

Future of Rust and Python Interop

01:19:12
Speaker
Okay. So you are the great enabler in this field. I mean, I guess to some extent, yeah. Personally, I definitely really wanted it to be. And then I think increasingly, you know, there are people who need this, and it's important to make it happen.
01:19:27
Speaker
So one thing that, I guess you could say, keeps me up a little bit more at night is the interaction between Python's sub-interpreters and Rust. On the surface you'd think, well, I've just talked about how Python is removing the global interpreter lock, allowing Python to be much more parallel, and Rust is good at this. And sub-interpreters are a very similar approach that Python's taking. I didn't know Python had sub-interpreters.
01:19:57
Speaker
OK, so let's rewind just a touch there. So basically, Python in recent years has acknowledged that basically performance and parallelism are very, very important.
01:20:12
Speaker
to the future of the language. And so there are lots of projects actively going on at the moment to basically make this happen. Microsoft has been funding a team called Faster CPython, who are busy building things like a JIT compiler into the Python interpreter to make, you know, pure Python code run as fast as pure Python code can. But the other axis, or dimension, to go on is parallelism.
01:20:38
Speaker
There are basically two different strategies that the Python ecosystem is going through. Maybe we should have summarized this a little bit before we went into the technical global interpreter lock idea. At the moment, Python is constrained by the global interpreter lock, which means a single thread is the only thing that can be running Python code for the whole process.
01:21:02
Speaker
And there are two different ways that you can get around this, right? You can either go, okay, cool, I'm going to have multiple global interpreter locks. Each then can be run in isolation and can get on and do work. Or I get rid of the lock and I allow stuff to run in parallel that way. And so that's where you get these two different sort of methodologies which are both being built. And so sub interpreters is leaning into that first idea. Then each sub interpreter
01:21:32
Speaker
can have its own set of Python objects and be doing its own work. It still has a global interpreter lock, but that sub-interpreter can then be running an asyncio runtime. And that's a single-threaded thing, so if you have four of those, you're using four CPU cores very, very effectively.
01:21:52
Speaker
How do they communicate with each other then? That's the catch, right? So they have to be isolated, in the sense that you can't reference Python objects in another sub-interpreter, because the synchronization doesn't work. The global interpreter lock protects you against breaking your own objects, not other people's objects. So they basically have to rely on message passing at the most basic level.
01:22:20
Speaker
There are
01:22:23
Speaker
more advanced techniques being researched by the people working on subinterpreters to basically allow sharing subsets of Python objects that can't be modified, or more advanced combinations like this. It's still very early days for those kinds of advanced techniques. So the way I think of it is message passing with a few possible extra tricks. But it's a very, very strong constraint: you cannot have memory from one subinterpreter basically be modified by another subinterpreter.
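The shape of that constraint is easier to picture with an ordinary Rust analogy, plain standard-library channels rather than any sub-interpreter API: each worker owns its data outright, and the only thing that ever crosses between them is a value moved or copied through a message.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    // The worker owns its state; nothing outside can touch it directly.
    let worker = thread::spawn(move || {
        let local_state = vec![1, 2, 3];
        // Only a computed value is sent out, never a reference into the
        // worker's own memory.
        tx.send(local_state.iter().sum::<i32>()).unwrap();
    });

    println!("got {}", rx.recv().unwrap());
    worker.join().unwrap();
}
```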
01:22:56
Speaker
That's where it seems quite hard from a Rust perspective to reason about, because that's very different from how Rust currently thinks. Rust is good at the
01:23:07
Speaker
It's like the other approach, where we take away the global interpreter lock, you let everything run freely, and you have a few guarantees about ordering and synchronization; for the most part, that's the same way that Rust thinks. Whereas with sub-interpreters, it's: I have this piece of data and it's never allowed to cross,
01:23:24
Speaker
not necessarily into another thread because a subinterpreter isn't really a thread, but to this other section of my program, other unit of execution. And Rust doesn't have an abstraction that describes a Python subinterpreter naturally. And so somehow
01:23:41
Speaker
If we want PyO3 and Rust modules to be compatible with subinterpreters, which I think is very desirable because we don't want to get in the way of their adoption across the ecosystem, we need to understand: how do you stop people from passing data from one subinterpreter to another, given it's an unsafe operation to do this? And what are the implications of that for the whole PyO3 API? And that's something which, I guess,
01:24:09
Speaker
motivationally, I feel like we need to deliver eventually, but technically I find that one much more daunting than the removal of the global interpreter lock. Yeah. I can see that. I mean, there's the temptation in Rust, in writing a Rust program, to design in such a way that you can model the guarantees in the compiler's language.
01:24:36
Speaker
Yeah, I would say more than, what was the word you used, desire, it's sort of like that's how the Rust ecosystem wants you to think: the promise should be that if you're writing what's termed safe Rust code and you're not using the unsafe keyword, you shouldn't be able to do stuff which is fundamentally unsound.
01:24:58
Speaker
Yeah. I wonder if in trying to do the sub-interpreters approach, you'll hit some of the limits of what Rust can currently model. Yeah, it's a great question. I think exactly that, yes.
01:25:20
Speaker
I feel like I'm going to have to talk to Rust language designers and be like, hey, have you got any ideas how this could work? Because the ultimate goal, really, for Rust programs is that you have all this safety, but you don't pay for it; we're talking about the zero cost abstractions. And now there's an obvious way that you can make everything safe, which is every time you do an operation in PyO3
01:25:49
Speaker
where you don't know if an object might have crossed between sub-interpreters, you do know which sub-interpreter an object belongs to, because objects can be linked back to an interpreter ID eventually; there is a chain of data structures that you can follow to get there. So you could just introduce runtime checks everywhere that objects haven't crossed to the wrong interpreter, but you're going to pay a huge slowdown to do that.
01:26:20
Speaker
That seems like the nuclear option rather than the option that you actually want to follow. Yeah, it would work, but it's not very rusty. Yes. And so the game instead becomes how can Rust's type system be leveraged to basically, probably you would want to have that check somewhere, but remove that check from as many places as possible would be the goal.
01:26:43
Speaker
Yeah, that feels like one of those places where it's quite possible there is no current good answer and someone might get a PhD in compiler design out of it. Yes, that and then hopefully useful conversations from both the Rust and Python language designers really about, you know, if there are
01:27:08
Speaker
strong assertions from the Python side as well about when our objects are allowed to cross sub-interpreters or not, then that can help us potentially simplify the problem. The more advanced techniques they come up with for sharing these objects, the harder that problem becomes, but hopefully there's a relatively formal way of describing it at the end of the day.

Writing Fast Rust for Python: Tips and Tools

01:27:31
Speaker
I hope so, especially if you're getting in while the design is still being firmed up. Yes. Okay. Okay. Dialing it all the way back to something much simpler: if I just want to write some fast Rust functions and call them from Python, how do I get started? Okay. Yeah.
01:27:50
Speaker
This links again back to my experiences of having tried with C and C++ several times. I've used it. I've done it a lot in production. I also started as a hobbyist. I had a Python function. I wanted to write a faster implementation for it. I think I tried to use C, and the project setup was super painful and non-obvious to me.
01:28:11
Speaker
This was a long time ago, when I was a much more inexperienced Python developer, but I just couldn't get started very easily on that. One of the driving things I care quite a lot about, again, is that when people want to start doing this, they can, even though they haven't necessarily used Rust before. They're probably a Python dev who's got a practical problem that they need to solve. Somehow, they know that Rust can offer them this performance boost,
01:28:39
Speaker
and so they might be tempted to try PyO3 that way. I think that's quite a common pattern. And so how do we make that easy? Fortunately, Python packaging has come a long way since the times back when I was trying to do this in C, years and years ago, as a junior Python dev. We have a command line tool called Maturin.
01:29:02
Speaker
And this allows you to set up a new project with the right structure. So we're talking about a Python project, which typically these days has a pyproject.toml file that describes things like its name and its dependencies. It will create a Rust Cargo.toml file at the same time, which contains your Rust project dependencies and your Rust project name. And then it will set up a source file for you where you've got a bit of a PyO3 module already started, with a simple function in it, so that you can literally just start editing.
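For reference, a freshly generated project contains something close to this in src/lib.rs; the module and function names depend on what you asked Maturin for, and the exact signatures shift a little between PyO3 releases.

```rust
use pyo3::prelude::*;

/// Formats the sum of two numbers as a string.
#[pyfunction]
fn sum_as_string(a: usize, b: usize) -> PyResult<String> {
    Ok((a + b).to_string())
}

/// The Python module; its name must match the library name in Cargo.toml.
#[pymodule]
fn my_project(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(sum_as_string, m)?)?;
    Ok(())
}
```

Once it's built and installed into your environment, `import my_project; my_project.sum_as_string(1, 2)` should give you `"3"`.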
01:29:32
Speaker
And then you can also run maturin develop, which will just compile and install that straight into your virtual environment for you. But because of what Maturin set up, you can also just pip install that same module and it will just work. It's designed to just be: here's the code, go. And we've got a getting started guide on our website which is trying, again, to help you: if you follow these steps, you should be up and compiling and running your Rust function from Python pretty quickly. Okay. So as long as you already know some Rust, it shouldn't be too painful to get started.
01:30:03
Speaker
Well, you say you should already know some Rust. Again, I kind of want it to go a little bit further than that, where if you really wanted to, and you're not a Rust developer but you're prepared to just give it a go, it should be that it just works. And so, again, you know, our user guide is kind of intended to be.
01:30:21
Speaker
Yeah, I mean, we have to get into the complex bits of Rust quite quickly, but we want it to be readable to a Python developer who just wants to, maybe in a bit of a Stack Overflow sense, copy some examples and start piecing things together, get their code working, and learn by doing. I want that to be possible with the mindset that it might be someone who's coming to Rust for the very first time. OK, OK.
01:30:46
Speaker
And I think for the most part we do a reasonably good job of that. On the developer experience, the hardest step that I see people tripping up on these days... I mean, when I first came to the PyO3 project, Maturin was much, much less advanced, and it was about, again, project setup, and I had a bit of wrangling to figure that out. These days, people tend to have a really nice and smooth experience and can just get going. And a classic trap then is that, as a Python developer, you're not used to having to compile your code.
01:31:15
Speaker
And so you edit your Rust source, and then you rerun your tests, and they're still broken, but you can clearly see that you changed the logic in your Rust source file. And there are a few different things that we can do to make that better. You can add what's called an import hook to the Python import loader, so that we can be a bit smarter: if you try to import your Rust module and it's not been recompiled, we can immediately recompile it for you, so you can't get that wrong.
01:31:42
Speaker
So we have one, it's called the maturin import hook, to achieve that. I think it's not super-duper refined; last time I tried to use it, it was a little bit clunky. But that sort of thing, in theory, we could make super seamless.

Community Support and Resources

01:31:56
Speaker
And so that edit-and-test loop should, in theory, feel no different to you as a Python dev when you're getting started.
01:32:07
Speaker
That's quite impressive, if you've gone to that level of integration to try and make people's lives easier. It's the goal. It's a bit like Rust itself: its design is really ergonomic in a lot of ways. It tries hard to make things useful and helpful, I think.
01:32:23
Speaker
It really sounds like it, that the whole project is about trying to make this thing more ergonomic, as you say. Yeah, that's a big goal of mine. Nice. Well, I happen to know I'm going to be spending a lot of next week writing Python. Wouldn't be at all surprised if, one quiet afternoon, a tiny bit of Rust sneaks in there, just to see what it's like.
01:32:45
Speaker
Yeah, absolutely. We also have a PyO3 Discord and the PyO3 GitHub, so if you have any questions, do feel free to pop along. Awesome. It's a space where we expect people to ask questions and get help. Cool. I'll put links to all of those in the show notes. Perfect. David, thanks very much for joining me. Thank you. It's been a pleasure.
01:33:07
Speaker
Thank you, David. You'll find links to all we discussed, an absolute treasure trove of them, in the show notes below. Probably, probably below. Kind of depends on which app you're using, right? Go and hunt for the links. They're in the show notes. They should be easy to find. They're probably down there. And if you happen to pass the like and subscribe buttons on your way, and if you've enjoyed this episode, please take a moment to click them. Maybe share this episode with a friend.
01:33:35
Speaker
And if you enjoy the podcast regularly, please consider supporting it. I opened up Patreon and YouTube memberships last week and the response has been really great. So thank you if you signed up. And if you want backstage news, this room really isn't large enough for a backstage, but if you want backstage news and to make sure there are many, many future episodes, consider signing up and supporting.
01:33:58
Speaker
Links in the description, which, as I say, should be easy to find. But until next week, I've been your host, Kris Jenkins. This has been Developer Voices with David Hewitt. Thanks for listening.