
Pony: High-Performance, Memory-Safe Actors (with Sean Allen)

Developer Voices

Pony is a language born out of what should be a simple need - actor-style programming with C performance. On the face of it, that shouldn’t be too hard to do. Writing an actor framework isn’t trivial, but it’s well-trodden ground. The hard part is balancing performance and memory management. When your actors start passing hundreds of thousands of complex messages around, either you need some complex rules about who owns and frees which piece of memory, or you just copy every piece of data and kill your performance. Pony’s solution is a third way - a novel approach to memory management called reference capabilities.

In this week’s Developer Voices, Sean Allen joins us from the Pony team to explain what reference capabilities are, how Pony uses them in its high-performance actor framework, and how they implement a garbage collector without stop-the-world pauses. The result is a language for performant actors, and a set of ideas bigger than the language itself…

Pony: https://www.ponylang.io/

The Pony Tutorial: https://tutorial.ponylang.io/

The Pony Playground: https://playground.ponylang.io/

Azul Garbage Collector: https://www.azul.com/products/components/pgc/

Shenandoah Garbage Collector: https://wiki.openjdk.org/display/shenandoah/Main

A String of Ponies (Distributed Actors Paper): https://www.doc.ic.ac.uk/~scb12/publications/s.blessing.pdf

Garbage Collection with Pony-ORCA: https://tutorial.ponylang.io/appendices/garbage-collection.html

Support Developer Voices on Patreon: https://patreon.com/DeveloperVoices

Support Developer Voices on YouTube: https://www.youtube.com/@developervoices/join

Kris on Mastodon: http://mastodon.social/@krisajenkins

Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

Kris on Twitter: https://twitter.com/krisajenkins

Transcript
00:00:00
Speaker
This week we're gonna look at the programming language Pony, which was born out of what should be a relatively simple requirement to do actor model programming in C with C's performance. And on the face of it, that shouldn't be too hard to do, right? Writing an actor framework isn't trivial, but it is well-trodden ground. The computer science is well thought through. The hard part is balancing performance with memory safety.
00:00:29
Speaker
I mean, think about it: you've got a busy actor-based application that's creating a hundred thousand messages and throwing them at another actor. What's the life cycle of the memory for those hundred thousand messages? Whose responsibility is it to free it up? And if some of that data then gets passed on to a different actor, are you creating a hundred thousand memory leaks?
00:00:53
Speaker
And the answer is maybe, and that's not a good enough answer. And a lot of the solutions to that kind of problem revolve around generously copying data and garbage collection, buying memory safety at a cost of performance.
00:01:09
Speaker
So the question becomes, is there a way to solve the memory safety problem in an actor model without paying a performance cost? And Pony has an answer, and the answer is an interesting technique called reference capabilities.
00:01:24
Speaker
And I'm joined this week by Sean Allen from the Pony team, who's going to explain what reference capabilities are, how Pony uses them in its high-performance actor framework, and how they enable an essentially pauseless garbage collector approach. The result of this is a C-like language for writing high-performance actors,
00:01:45
Speaker
and, more importantly, a set of ideas that are bigger than the language itself. So let's hear them. I'm your host, Kris Jenkins. This is Developer Voices. And today's voice is Sean Allen.
00:02:09
Speaker
Joining me today is Sean Allen. Sean, how are you? I'm good. How are you? I'm very well. Very well. A few people have been telling me to check out Pony over the weeks and months, and one thing I learned from checking it out is that the history of Pony begins with trying to write concurrent, high-performance actors in C, right?
00:02:31
Speaker
Yes. Many, many years ago, Sylvan Clebsch, who created Pony, was working at one of the major banks and had built a framework for people to write highly concurrent programs using the actor model in C, but it had a variety of problems, most of which were related to the memory model: things crashing, segfaulting. Basically, there was nothing protecting you from yourself. It had all of the same problems that any standard C thing doing threading might end up with: unsafe sharing of memory, et cetera. And one of his takeaways from that was that while the runtime he built was really good,
00:03:22
Speaker
there needed to be something more on top of that. There needed to be a compiler which could actually enforce the safety properties that you would need in order to do high-speed concurrent programming with an actor model. Okay, so I have to ask the question that comes up immediately: if you want memory safety and an actor model, why not go to Rust or Erlang?
00:03:47
Speaker
Well, first of all, at the time when he did it, Rust was this tiny, tiny little thing that was barely being used. I don't think Rust had actually been publicly announced at the point that Sylvan first wrote that. And Erlang? Because you want to go fast, and Erlang is many things, but fast is not one of them. Oh, tell me more about that. Why is it not fast enough?
00:04:17
Speaker
I mean, if what you're interested in doing is things where you would normally turn to C, for example, for your performance concerns, then that's not Erlang. Erlang was never designed for that. That's not the environment it was designed for. As a basic default, I'm going to lie here a little bit and say that pretty much any time you send a message from one actor to another in Erlang, there's going to be a copy of the memory that you're sending. They have a thing called binaries, which
00:04:56
Speaker
means that that's not actually the case, but it is the basic model it was built with. The initial version of Erlang was written in Prolog, not the world's fastest language. Extreme performance was never a concern for Erlang. Erlang has many incredible features in it; I think it's an awesome language, and Sylvan thought it was an awesome language, but Erlang wasn't going to be good for those particular use cases. Okay. So we're talking about the kind of high-performance, bank-level, day-trading type stuff.
00:05:37
Speaker
Yeah. The very first application it was used for was one that would listen to network traffic inside the bank and try to find anomalous activity, where you're capturing huge amounts of network traffic and trying to analyze it in real time. That's interesting. I wouldn't have guessed it had its beginnings in security.
00:06:03
Speaker
That's cool. Okay. So you've mentioned copying. I can see that copying complex data is definitely going to slow you down, but it does solve the problem of sharing a message from one actor to another. Right? It certainly does. It is a very straightforward way to solve it. If your application can trade speed for safety, it is a very straightforward way to go about doing things, and it can give you a programming model which is
00:06:34
Speaker
very nice, and you don't have to do a whole lot of thinking about things because it's just kind of safe by default. Yeah, I can see that. So guide me through how to do it without trading away performance.
00:06:49
Speaker
So the Pony approach is to use a thing called reference capabilities. It has the same end goal as systems like lifetimes and the borrow checker in Rust, which more folks at this point are familiar with, but it operates rather differently.
00:07:15
Speaker
The basic idea with reference capabilities is that, depending on the particular language that's putting them in, they can either exist purely as a compile-time concept, or you could carry them through and have them actually exist at runtime if you wanted to. But in Pony,
00:07:33
Speaker
at this point in time, they only exist as a compile-time thing. And you can think of them as sort of like annotations on your types. I mean, technically, they're actually part of the type itself, right? They allow you to say things like... well, one of the capabilities is this one called val. Val basically says that this
00:08:04
Speaker
can be shared and cannot be updated, which means that any number of threads would be able to access it. We would normally call this immutable: this is some immutable data. There's ref, which is just straight-up mutable data like everybody is used to. You can mutate it all you want, but you cannot share it, because sharing it would lead you to data races: if you have one thread of execution reading from it while another is writing to it, you'll get unknown results. You could have two things trying to write to it. In general: bad, data race, potentially kaboom, segfault, all of the fun that you can get with concurrent programming. Another one of the reference capabilities is iso, and iso allows you to share mutable data
00:08:58
Speaker
in a safe way, which is that in order for me to share some mutable data, I have to give up my reference to it. That is, only one thing can have a reference to that mutable data at any particular point in time. Iso stands for isolated: it must be isolated in order to do that. There are some other reference capabilities, but those are three of the ones that people interact with most often.
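To make those three concrete, here is a minimal Pony sketch. The Point class and its values are invented purely for illustration; what matters is the capability annotation on each type.

class Point
  var x: F64
  var y: F64

  new create(x': F64, y': F64) =>
    x = x'
    y = y'

actor Main
  new create(env: Env) =>
    // val: immutable, so any number of actors may read it safely.
    let shared: Point val = recover val Point(0, 0) end

    // ref: freely mutable, but the compiler refuses to let it leave
    // this actor, because sharing it could cause a data race.
    let scratch: Point ref = Point(1, 1)
    scratch.x = 2

    // iso: mutable *and* sendable, because the type system guarantees
    // this is the only reference to the object.
    let unique: Point iso = recover iso Point(3, 3) end

    env.out.print(shared.x.string() + " " + unique.y.string())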
00:09:25
Speaker
And so you would have something where it's like, yes, this particular variable is bound to an instance of a class and it is val, or it is iso, and then I can do those things with it.
00:09:42
Speaker
I would. Usually, it's easiest to describe reference capabilities to people by talking about what they allow you to do. But actually, in Pony, what's really important about them, and what helps you fully understand the model, though it's usually harder for people to grasp in the beginning, is that it's really about what you can't do with a thing.
00:10:03
Speaker
So for example, ref is not allowed to be shared globally, right? It denies you from doing that. And if you try to do it, the compiler will stop you. It'll give you an error and say that that's not allowed. Okay. And this therefore must be how you get into making the actors work. You have an isolated piece of memory, tagged presumably on the pointer type,
00:10:30
Speaker
that says, once I pass it around, someone else is free to pick up that memory without copying it and use it as though it was mutable data. Yes. And all of this is happening at compile time, so there is no runtime cost to any of these. There are no runtime checks to make sure that this is actually safe. You are relying on the compiler to be correct and on the runtime to be correct. What's amusing is that there are actually some scenarios we've encountered where having that information at runtime would probably end up making things faster. Oh, who's got an example?
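Roughly, that hand-off looks like this in Pony. This is a sketch with a made-up Worker actor: the sender has to consume its isolated reference before the send, and the receiver can then mutate the same memory with nothing having been copied.

actor Worker
  be process(data: Array[U8] iso) =>
    // This actor now holds the only reference to the buffer, so it can
    // mutate it freely -- nothing was copied when it was sent.
    data.push(42)

actor Main
  new create(env: Env) =>
    let w = Worker
    let buf: Array[U8] iso = recover iso Array[U8] end
    buf.push(1)
    // consume gives up Main's reference. Without it, the compiler
    // rejects the send: two actors must never alias mutable memory.
    w.process(consume buf)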
00:11:12
Speaker
So there is a garbage collector in Pony. It is not a garbage collector like what an awful lot of people are used to, with stop-the-world garbage collectors like you would find in Java or Ruby or Python or a variety of other ones. But in the end, it is still automated memory management,
00:11:40
Speaker
and knowing whether a particular bit of memory is isolated or immutable, et cetera, would open up some things where you could make the garbage collector faster. A lot of that information is available at compile time, and we compile in specific bits of code for the variables being used in order to take advantage of that. But there are scenarios where we could get further performance improvements if we had that information at runtime rather than just compile time, because sometimes at compile time
00:12:28
Speaker
you're not quite sure. In particular, a bit of memory could transition from being isolated, for example, to being immutable. And the compiler is perfectly able to enforce this: it's isolated, you do a bunch of stuff to it, then you give up your reference to it and you send it to something else
00:12:52
Speaker
as immutable, and the compiler will prevent anything from mutating it. However, the runtime at that point is unaware that this is immutable and still has to treat it as though it was isolated, and there are some performance improvements that you can't get with isolated mutable data that you could get with things that are immutable.
00:13:18
Speaker
They're relatively small and esoteric things, but for certain classes of applications they could have a large impact. I just like to talk about it because it's normally the thing where you think, oh, having the information at runtime and keeping it around to runtime would probably make things slower. But in actuality, for some cases, it could make it faster. That's interesting. Yeah, I can see that. We associate that kind of runtime information with checking and enforcing, but it could be leverage for optimizing. Yeah, I can see that. This makes me wonder how large the runtime system of Pony is. It's got a built-in concurrent garbage collector, an actor model. What else is there?
00:14:07
Speaker
Some very fast, optimized queues and an awful lot of hash maps. I mean, the runtime is mostly a whole bunch of hash maps and some queues. That's slightly a joke and slightly not a joke. All of the information about stuff that's in memory, for the garbage collector and a variety of other things, usually ends up being stored in a highly tuned hash map in order to make it easy to find things.
00:14:45
Speaker
I don't really know how to talk about the size of the runtime, right? I mean, Pony binaries when they're compiled are fairly small. They're not small like Hello World would be in C or something like that. But they're also considerably smaller than Hello World would be for
00:15:10
Speaker
a Java-type thing, especially if you're including the size of the JRE or whatever in that. And that runtime: you have a single binary and you run it like a single binary. The runtime in the binary is part of your application. It's responsible for actual program startup, et cetera. Where instead of writing a main method,
00:15:42
Speaker
which the operating system takes care of starting up, there is an actor called Main. There is a main method, which is part of the runtime, which the operating system is responsible for starting up. And then the application, from what has been compiled, will start up your special actor, which is called Main: instantiate a copy of it and send it a message to run its constructor. And from there, everything else that happens in your program goes.
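In code, that startup story is just this: the canonical minimal Pony program (the greeting text here is invented).

actor Main
  new create(env: Env) =>
    // The runtime's own main() has already started its scheduler
    // threads; it creates this one Main actor and sends it a single
    // message: run this constructor. Everything else flows from here.
    env.out.print("Hello from the Main actor")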
00:16:21
Speaker
Okay. So is it the case that Pony is very heavily actor-based? I mean, would you use it to write a regular C-style program and occasionally bring in actors, or is it really baked into the thinking? One hundred percent. Actors everywhere. OK. If you wanted to do something which actors are not good for, you should not use Pony. OK, so where do you think it shines then?
00:16:48
Speaker
I think particularly in application-level programming, where everybody has their own definition of what application level is, where you can benefit from high concurrency and you have memory usage which is conducive to working well in an actor model. I can give an example of something which does not work particularly well in an actor model. If you think about a standard relational database that has a bunch of tables and you want to be able to join across those tables,
00:17:39
Speaker
for a particular operation you need to have possession over these things. In an actor model, the default way that you would go about doing something like that is you would probably have an actor per table. But because everything is done in an actor model in a message-passing fashion, this now means that if you're doing an update, you've got some type of multi-phase commit that you're going to have to build in, or you have to manage to obtain locks across multiple tables in order to do joins across them, except there aren't locks in actor models. If you wanted to implement the various isolation models that a relational database has, it's not a good fit. You could do it,
00:18:31
Speaker
but you're doing a lot of extra work there, and you would probably want a different model to go about doing that. If you're building a key-value store, you're building a Redis-type thing, or you're doing socket programming where you're pulling a whole bunch of stuff off and you're not trying to do huge joins across things, or
00:18:51
Speaker
you can denormalize that stuff in memory so that you have many copies of something, potentially, and you're okay with some sort of eventual consistency, where you have actors which are occasionally updated with the latest view of the data but do not have the absolute latest view, then all of these are things where an actor model can really shine.
00:19:12
Speaker
OK, yeah, I can see that. So who is using it? Maybe that's another way to think of it. Is Pony out in production at the moment? I get asked that question on a semi-regular basis.
00:19:27
Speaker
There were times in the past when I've known of Pony being in production. I know of one company where they have one person who does their security stuff and has written pretty much all of it either in Pony or in Erlang, a combination of the two. They came from an Erlang background and started using Pony for things where
00:20:00
Speaker
Erlang was having problems with the amount of data it was trying to process within the time that they needed to process it. Pony provided a performance improvement while still providing a similar programming model. From time to time, people pop up in the Pony Zulip
00:20:16
Speaker
who appear to be recruiter types for interesting organizations. I have no insight into what may or may not be being done with any of those things, because they're all very quiet. One of those was a British spy agency, another was the National Health Service in the UK, and one was a network appliance router-slash-firewall sort of company. Now, are any of those people still using it? I suspect they were at some point. Are they still? I haven't the slightest idea. At this point, anybody who might be using Pony in production, we probably have no idea about.
00:21:08
Speaker
Right. Sounds like those organizations wouldn't tell you anything at all. Yeah. So they may be, they may not be, but it's certainly not a language which is being used in the way of something like Rust or whatever, which has had a high growth trajectory. I just assume, well, there are many things I don't know, but I assume that Pony is primarily used by hobbyists and people who are not making money with it. That is my assumption at this point in time.
00:21:44
Speaker
In that case, let's go back to what the language is like to use, because, the actor model aside, how much is it like C plus the reference capabilities and actors? Does it feel like C other than those things? A similar type system, for instance?
00:22:05
Speaker
No, no. The type system has the most similarities to ML-family languages like OCaml, F#, Standard ML and whatnot. Okay. How does that fit into a manually memory-managed language?
00:22:29
Speaker
Well, you're not manually managing it. It is a garbage-collected language, where garbage collection for the most part means there is a system within the runtime to communicate reference counts for different objects, and periodically each individual actor will go ahead and do a basic mark-and-sweep GC operation, and for anything which has a reference count of zero, that memory will be returned to a pool allocator for reuse within the system.
00:23:15
Speaker
Okay. That makes me think I should have picked up on this earlier, but you've got a non-stop-the-world garbage collector from what you've described. Are you saying it's actually a stop-the-actor garbage collector? It is a stop-the-actor garbage collector; it's granular stop-the-world, yes. So I got involved with Pony in part because we decided to use it at a startup that I was at a couple of jobs ago, and we were targeting
00:23:51
Speaker
banks. And there are systems that are right below the trading systems, right? Things like risk analysis systems, order books, stuff that's not the actual trading systems, the stuff which is very important but usually neglected when budget time comes around. And one of the things that was really nice was that, if you were giving us a consistent level of traffic, our latencies and throughput were flat. The amount of time spent in garbage collection was really low. What it basically works out to is incremental, concurrent garbage collection. As a general strategy, you see
00:24:51
Speaker
the Azul garbage collector for the JVM and the Shenandoah garbage collector for the JVM both attempt similar things, but with a different memory model, where you are attempting to constantly garbage collect and you give up some level of throughput, like you would with Pony, because actors are occasionally doing a little bit of work to clean up memory. But otherwise, you should be getting pretty much consistent performance, instead of having the spikes like you would see with stop-the-world garbage collection, like the one that shipped as the default for years with the JVM. Yeah. Okay, so you're saying that in practice that's enough to smooth out the latency spikes. Is there any coordination on that? Are the actors trying to make sure they only do their own garbage collection when other actors aren't busy doing it? Or is it just randomized?
00:25:48
Speaker
It's closer to randomized, but not actually randomized. So there's a thing in the garbage collection system called the GC factor, and this is set globally. Although if you want to get really sneaky with the C FFI and you know the runtime, you could set your GC factor on a per-actor basis.
00:26:15
Speaker
But in general, it's set globally, and this basically says: after every garbage collection, however much memory this actor has left, do not garbage collect again until it is at this factor of that amount of memory. The default is two. So any time after an actor garbage collects, it will not attempt to garbage collect again until its heap reaches 2x whatever was left over and still in use after that garbage collection.
00:26:43
Speaker
I see. That actually makes it sound quite easy to implement.
00:26:51
Speaker
Relatively, I mean. And for all of the garbage collection, there is messaging between actors. So there's no coordination; there's just: I'm going to let you know that this memory, which you are in control of, I've given up my reference to. And the actor keeps track of that; it processes those messages just like it would any other, decrements reference counts on objects, and when it gets to a garbage collection point, it will free them. Okay, so just to check I've got that: it's the responsibility of the receiving actor to clean up the memory?
00:27:42
Speaker
So, an actor has some memory which has been allocated to it from the pool, which it is now responsible for, and any object which was allocated by that actor in that pool will be freed by that actor. Even if it was some isolated memory and I sent it off to somebody else, the receiver of the isolated memory will not be the one freeing it. Instead, it will be this actor. That is actually a case where, if it was guaranteed and we knew this was isolated at runtime, you could cut down on GC messages, at least in theory,
00:28:22
Speaker
except that you have the whole problem of, well, you're maintaining some information about what's there within this memory that an actor owns. So you still can't, actually, because you have data structures around the memory that's in use which the actor has to go ahead and maintain.
00:28:41
Speaker
But yes, so you send the message back, and it goes ahead and potentially returns that memory to the pool. Basically, there are chunks of memory that the actor gets, and any time a chunk is completely available, it'll return it to the pool. Okay, I think I've got that.
00:29:07
Speaker
This makes me wonder, then; perhaps I should step back and say: the actor model I'm most familiar with is the Erlang one. Is it normal for actors to crash? And then who's responsible for cleaning up the queue's memory? Ah, actors do not crash.
00:29:27
Speaker
Ah, that's very different to Erlang then. Yes, that is very different to Erlang. There was a fellow many years ago named Dara, and Dara, while talking with Joe Armstrong, decided to coin the phrase that Pony actors crash at compile time.
00:29:45
Speaker
Right. There is a robust type system, and the compiler will not let you do a number of the things that you would expect to be possible in Erlang. For example, in Erlang, you could try to send a message to an actor which no longer exists.
00:30:07
Speaker
Right? That in itself is a potential problem. There are no actors crashing in Pony; that actor will always exist. If you wanted something like that in Pony, you would need to implement it yourself, where you could basically tell an actor to reset itself to some known state. Erlang was designed to be used in a telephony setting, where it's possible that you perhaps missed messages, right? You're running a telephone circuit and you get a message that is not valid for the state that you're in. What do you do in any situation if you get an operation which is not valid for your current state machine?
00:31:00
Speaker
The Erlang approach would be to simply not handle that message, let the error arise from the message not being handled, the actor crashes, you have something else which will then restart that crashed actor -- in the Erlang case, a supervisor -- and it is back to a known state.
00:31:21
Speaker
For some things, this could be a very good programming model. For others, where you were keeping important state inside of an actor, it could be a very bad model. It really depends. So there's no crashing in Pony programs. For the stuff that I wrote in Pony, including stuff that went out in production, we actually put stuff in which was a full program crash. But these were cases where our full-program-crash messages were things like "the unreachable was reached" or "this should never happen". You know those things where the type system is not powerful enough to prove that X is never going to happen? Where
00:32:10
Speaker
this variable should always have this value, it's impossible for it not to have this value, or so you think as a programmer, but reasonably there's no actual guarantee of that; the type system can't prove it, because the guarantee isn't there. And in those scenarios, we would actually literally just crash the program. Because at that point, it's like,
00:32:34
Speaker
you were in a state that you as a programmer had absolutely no idea was possible, that you believed shouldn't be possible. Some basic constraints of the system have been violated. What do you do? The approach that Joe and others took with Erlang, of "if we're in a state that we don't believe should be possible, I'm just going to start over", is kind of a reasonable approach, at least some variation of it. But yeah,
00:33:02
Speaker
normally you actually have to go about doing that yourself for a particular thing, because the actors always continue to exist otherwise.
00:33:14
Speaker
Okay, does this mean that messages between actors are also strongly typed? So you can guarantee at compile time which kinds of messages an actor can receive? Ah, yes. So in Erlang, for folks who might be familiar, it's basically: the message is a tuple and you match on part of the tuple in order to decide what to do, in this single message-processing function, right? There are, to use some
00:33:39
Speaker
hand-wave terminology, two types of functions in Pony: there's behaviors and functions. Behaviors only exist on actors, and when you do a message send, you are very specifically sending a message to execute a specific behavior, and it is strongly typed. You have to provide the proper arguments of the proper type, et cetera, to that behavior, and then that behavior will be run. So you can have many different behaviors; it's not just one process-message function. So yes, that is fully strongly typed there.
00:34:22
Speaker
From what you've just described, I don't see the difference between a behavior and a regular function. Oh, because regular functions are called synchronously and there is no message send involved. Ah, it's that. Yes. Okay.
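A small sketch of the distinction (the Counter actor and its behaviors are invented for illustration): be declares an asynchronous, compile-time-typed message handler, while fun is an ordinary synchronous call.

actor Counter
  var _count: U64 = 0

  // A behavior: invoked by an asynchronous, strongly typed message.
  be add(amount: U64) =>
    _count = _count + amount

  // Behaviors cannot return values, so results go back as a message.
  be report(to: Main tag) =>
    to.receive(_count)

  // A function: synchronous, can return a value, but only callable
  // from code already executing inside this actor.
  fun current(): U64 =>
    _count

actor Main
  let _env: Env

  new create(env: Env) =>
    _env = env
    let c = Counter
    c.add(2)        // message send; the compiler checks the argument type
    c.report(this)  // ask for the count to be sent back

  be receive(count: U64) =>
    _env.out.print(count.string())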
00:34:39
Speaker
Right. Okay. So that would mean that an actor is typed in its messages, but open for extension at any point. What does open for extension mean? I mean, I've used strongly typed actor frameworks where the message is basically an enum of different types of message, and you have to change the enum as well as the actor's implementation
00:35:06
Speaker
in order to extend the behavior. But you're saying if you just add a new behavior function, then that's the job done. Which, to me, falls under extension. In some fashion. There's either extension when you have the source code available to you and extension when you don't. In the Pony case, you need to have the source code available to you. But that's also in part because the Pony compiler does whole-world optimization. So all the source that it needs,
00:35:41
Speaker
all the Pony source, has to be available to it in order to compile. There are no Pony shared libraries or anything like that. Right. That leads into my next question, which is: I was thinking about different CPUs and different servers. Is there a notion of multi-threading in Pony, and is there a notion of actors on remote machines in Pony?
00:36:09
Speaker
There is no distributed Pony, which would be actors on multiple machines. There was academic research done on how to do it. There's a paper called "A String of Ponies", I believe, which has some of the basic ideas. Years ago, Sylvan and I discussed some more particular implementation details, based in part on the work that we were doing at the startup where I was at the time. However, there's nobody who wants to do that work, in the sense that Pony is an entirely volunteer-driven project,
00:36:53
Speaker
so nobody has stepped forward and said, I want to do that. Well, actually, for many, many things people have stepped forward and said, I want to do that, and then usually you hear from them for about a week, and then they disappear, because that is open source. You get a few people who stick around, but plenty of people step forward for things and then don't.
00:37:14
Speaker
So there is no distributed Pony at this time. There's at least a paper, and some other possible stuff if somebody was to talk to me about how to go about implementing it, if somebody wanted to.
00:37:28
Speaker
There are threads in Pony, but not threads in the way that most people think about them. The runtime starts up standard operating system threads as part of its startup. The default would be to start up one thread, and we call them scheduler threads, one scheduler thread per available real CPU,
00:37:56
Speaker
as in, hyperthreads don't count; that's not a real CPU. Actual, real, honest-to-goodness CPUs. There'll be one scheduler thread per CPU, and if the operating system allows it, we pin that thread to a specific CPU.
00:38:15
Speaker
macOS doesn't allow that; it's going to move them around on you. Linux, for example, will let you do that: you can pin to a specific CPU and that scheduler thread will always be on that CPU. If you want some really nice performance at that point, on Linux, use cset, et cetera, to carve out CPUs which are just for your Pony program, and then each scheduler thread will have a CPU which is entirely for itself.
00:38:45
Speaker
What happens at that point is that actors are scheduled onto a scheduler thread and are run by the scheduler threads. The scheduler threads can do work stealing if they are out of work. The basic algorithm is: if an actor sends a message to another actor and that actor is not scheduled, then that actor will be put at the end of the queue for the scheduler which is currently running the actor that sent the message. So no coordination is needed. And if that actor, the other one, is already scheduled, then
00:39:34
Speaker
nothing will happen, because it's already scheduled to run on some other scheduler. And we like to say that there are no locks in the Pony runtime. Depending on your definition, that's not entirely true. There's an awful lot of atomic bit-twiddling that goes on down below in order to do things like check whether an actor is scheduled on a scheduler or not, in a way that is going to be safe and gets you a correct result. And a few years ago, because based on your workload you might not actually want to run with, say,
00:40:18
Speaker
8 CPUs, because work stealing can start to be more expensive than the computations that you're doing, we added two strategies to allow us to scale down the number of scheduler threads as the program is running. And one of those involves a pthread mutex. Okay, yeah, I'm with you. It makes me wonder how this plays into references and how much is under user control. Like if I'm trying to send a message to an actor that's on another thread... An actor is not in another thread. Sorry, an actor that's running on another CPU. I don't even think about that. I just send a message to an actor and the runtime will try to do the most efficient thing possible. But how... okay. So.
00:41:17
Speaker
If it's on a different CPU, don't you necessarily sometimes have copying to send the message across? Oh, so you allocate.
00:41:28
Speaker
You allocate for a message, right? This is done by the runtime. There is a struct which is a message, which has a small amount of memory allocated for it. And that particular message is
00:41:53
Speaker
placed on a queue for an actor, to be processed at a later point in time. The queues for actors are multi-producer, single-consumer.
00:42:08
Speaker
Right. So there's some atomic bit-twiddling, for example, for the back side of the queue. But there's no copy, because it is a message: it was allocated over here, and it'll be freed once the message is processed by the runtime. We know from how the runtime works
00:42:28
Speaker
that it is guaranteed to be safe, because the things which are in an actor's queue will only ever be processed, or attempted to be freed, by a single scheduler thread, and nobody else is going to be holding a reference to it once it is in there.
00:42:46
Speaker
So we have no data races, et cetera, for that, and any of the memory which was allocated by users as part of their program has passed through all of the wonderfulness of
00:43:00
Speaker
the reference capabilities, et cetera. And when we free the message, any underlying objects which might be being shared, et cetera, that's part of the garbage collector inside of Pony, which is called ORCA. There's a whole protocol for how actors communicate their garbage collection information so that things can be garbage collected safely. There's a paper, if anybody's interested; it's the ORCA paper. Okay, I'll find that and link to it in the show notes. I think possibly I'm worrying too much about the message being in a particular CPU's cache
00:43:47
Speaker
and thus having to be... Oh, that's a performance concern, right? Yes, if you really want to go fast, then in your ideal world you want things to stay in the same cache. You want them to be there. So, a relatively naive thing:
00:44:07
Speaker
because in a complicated program you have absolutely no idea what's going to happen with such things. But that is why the scheduler, if an actor hasn't been scheduled, puts that actor onto the queue for the scheduler which is currently running. So then, at least in theory, for a program which is not super active and has very few actors, you could send this message and the stuff could still be in the cache when that other actor runs. That is at least a possibility.
00:44:50
Speaker
So there's a tendency for an actor to be sticky to the same CPU, because of the scheduler running the actor that caused it to be scheduled by sending it a message. That said, I mean, when you have
00:45:12
Speaker
anything more than one CPU, and you have lots of actors, reasonably speaking there's probably other work that's going to be happening, so something else is probably going to step on that cache. And most of the cache concern that we really have is trying to run enough messages per actor, so that when its queue is brought into cache in order to do processing, you're not doing one message for this actor, one message for this actor, and constantly stepping on caches that way. There's a batch size, and if you build your own Pony runtime, you can change the batch size. I think it's 300.
00:46:00
Speaker
Maybe it's 100. Anyway, there's some number of messages that is the default that it will process. We've done a lot of tuning with different simulation applications to try and come up with something which is a sane default.
00:46:14
Speaker
Yes. But if you're at the level where you're like, I need to control exactly which CPUs these things are going on, et cetera, then you don't want to use the Pony runtime, because the Pony runtime is trying to give you a really good high-performance scheduler which will be good for most cases, but it's not guaranteed to be great for your particular case. If you know exactly how to lay your stuff out,
00:46:43
Speaker
where usually at that point you have a very specific machine that you're also running it on, right, you're not just going to be putting it on this machine or that machine, then you're better off using something which gives you more control. But with the Pony one, hey, you're going to get really good performance. You should, assuming, you know, I mean,
00:47:04
Speaker
you can tank performance with anything. I can write a Python program that's faster than the equivalent C program if I do certain things in C. But modulo you doing slow things in your code, you should have a pretty well-performing application for the vast majority of things, as long as they're a good actor-model fit. OK. This makes me think. So, zooming out to the slightly wider world around Pony,
00:47:35
Speaker
I can see that I might have a project that's in C but needs an actor-ish approach to some part of it. Is there an FFI solution if I need to do hybrid Pony and C? There is, yes. It is not complete. I say it's not complete because there are certain things that you simply cannot make type-safe about C. Unions, for example, you cannot make type-safe, and so we do not support unions or interfacing with things that require a C union.
00:48:21
Speaker
So for those things where it's not fully complete, you have to write a C shim in order to interface with that particular bit of C code, and you and your C get to attempt to make that safe.
00:48:39
Speaker
But at the Pony level, it's going to be safe. As part of the FFI system, you define the types that your particular C FFI call takes; you define the reference capabilities for it. If you get that wrong, things aren't safe. If you get it right, the compiler is going to help you do safe things as much as possible. The biggest danger with the FFI system is that you allocate something in Pony, you do not maintain a reference to it in Pony, and then you have some C code that holds on to it for a long time and then tries to use it after it's been freed.
00:49:23
Speaker
Now, we have not closed this potential safety hole, because our assumption is that the programmer knows what they're doing. That particular case is a rather advanced one. Once upon a time, Go had a similar thing
00:49:42
Speaker
with cgo, but they changed it so that you cannot actually share pointers to any memory which was allocated from Go, and you have to make copies for things that are going across that barrier, and it makes cgo rather slow, on top of the other things that make cgo rather slow. We have not made similar decisions on the Pony side. On the other hand,
00:50:07
Speaker
this allows Go to potentially do a garbage collector at some point in time which will allow it to, for example, compact memory. You can't do that with a garbage collector in Pony, because the assumption is that this memory could have been shared with something in C that has a pointer to it, and we do not have any special thing where we can fix up pointers if we do compaction, et cetera. So the C FFI is very much: here be dragons, here be footguns, be careful. Yeah. One of my favourite languages is PureScript, and it has the same thing with JavaScript. It's like, here's an FFI, but once you open that can of worms, you're kind of on your own.
00:50:51
Speaker
Yeah, we try to give some level of support, where for C functions you have to declare what they are and the compiler will use those declarations. But if you tell it that this thing is... if you say, hey, the thing that I'm sending into this is a val, i.e. it is immutable, there's nothing to stop the C code from mutating it, and then bad things are quite possibly going to happen in your program. Okay.
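A minimal sketch of what such a declaration looks like, using printf purely as a familiar example: the use @ line tells the Pony compiler the C function's signature, and the compiler simply trusts that declaration.

use @printf[I32](fmt: Pointer[U8] tag, ...)

actor Main
  new create(env: Env) =>
    // cstring() hands C a raw pointer into Pony-managed memory. The
    // compiler checks this call against the declaration above, but it
    // cannot check what the C code actually does with that pointer.
    @printf("Hello from Pony via the C FFI\n".cstring())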
00:51:19
Speaker
It doesn't sound like you'd be significantly worse off than just writing it in C.
00:51:26
Speaker
Yeah, no. I mean, I don't think so. If you have a thing where you can treat the C stuff as a library, then you're good. There has been a thing where people have tried to do stuff where their main program was written in C and they spun up, I would call it, an actor subsystem or something. There used to be a C API to allow you, from C, to create actors, create messages and send the messages, et cetera. But we deprecated and got rid of that years ago.
00:52:07
Speaker
That was kind of a leftover from how the first runtime at the bank was designed, and Sylvan carried that over when he wrote an open source runtime. Because initially it was: I'm going to write the runtime, I'm going to exercise it from C, and then I will work on the compiler and the programming language.
00:52:30
Speaker
That's pretty much his standard approach, and it would probably be mine for doing any type of language that has a runtime. We got rid of all of that functionality because we started adding more stuff into the runtime where that API was a detriment to being able to improve certain portions of the runtime. For example, years ago I contributed a back-pressure system to it, and the back-pressure system simply wasn't going to work in the face of that other API. So that was one of the reasons that we got rid of it. Yeah, okay. That makes sense. I can see it being a bit of a millstone around your neck. Yes.
00:53:16
Speaker
That makes me wonder what the current state of Pony is and what's being actively developed on it.
00:53:27
Speaker
You mean in terms of what we are building into the language and compiler, et cetera, at this point? Yeah. Where's all the interest in the Pony community at the moment? There has not been a lot of movement lately. We are a relatively small community.
00:53:46
Speaker
Here we go. Over the last two years, I have had repeated problems with RSI-type things. I have been the primary contributor to the runtime and the compiler for several years, and when I start getting worried about being able to do the job that actually pays me, one of the first things I do is stop doing any sort of computer usage outside of work. That's your fingers, yeah.
00:54:15
Speaker
Yes. I had a flare-up last year and I'm just slowly easing back into doing stuff outside of my work hours. So there's not been a whole lot that's been done. We have an RFC process where people can suggest changes, et cetera, which they can go through, but for the most part there's a small group of us who are the core team. We do what we can,
00:54:51
Speaker
in between our lives where we get paid, and most everybody else has kids, et cetera. We add what we can. What's there is usable. There's tons of stuff we would like to change, but those are very heavy lifting and require a lot of work, and most of the time goes to more mundane tasks for keeping the basic stuff working and keeping bit rot from setting in, et cetera, rather than doing large-scale things.
00:55:28
Speaker
I did several large-scale things a few years ago, fixing a bunch of bugs. An awful lot of the stuff that happens now... there's a fellow, Gordon Tisher, who's one of the core team members. He's been working on trying to write, I'm going to call it Pony in Pony, even though that's not quite the right term. If you want to do a language server, for example, you have to interface with our stuff that creates the AST, which is all in C. It might be nice to have that in Pony. I say it might be nice because, well, nobody's really fully done it in Pony, so we don't know how nice it would be to actually use compared to the C, et cetera. But he's been working on that in his spare time. A decent amount of time gets taken up by
00:56:26
Speaker
staying vaguely up to date with our LLVM versions, because LLVM loves to introduce breaking changes with every version, and LLVM generally introduces some bugs in every new version. Once upon a time there was a decent amount of stuff that wasn't Clang that really used LLVM, and there are still things that use LLVM that are not Clang, but for most of those things,
00:57:06
Speaker
there is no testing of them when LLVM is doing development. At least it used to be the case a couple of years ago, I don't know if it still is, that Andrew Kelley, who was working on Zig, was an active LLVM contributor, so I think Zig certainly had an easier story with LLVM upgrades. I know years ago, and I don't know if they still do it, Rust had a whole system in place for being able to patch LLVM to fix bugs which impacted Rust.
00:57:35
Speaker
We have such a system in place as well. I don't think we have any active patches right now, but we do have a couple of bugs that we're almost sure are very weird, esoteric LLVM bugs that we just kind of try to ride around.
00:57:52
Speaker
Yeah, LLVM is a wonderful project which gets you an awfully long way and is a dependency which is going to come back to bite you over and over and over again in the future. There is a distinct trade-off there where, particularly if you're a small project like we are, keeping up with LLVM versions can be kind of a full-time job. Sometimes going from one LLVM version to another is
00:58:24
Speaker
measured in the low hundreds of hours in order to move things over. It depends on the change. They changed how pointer stuff works inside of LLVM a few versions ago, and that was a very heavy lift where we had to do some semi-invasive changes to the Pony runtime and the compiler in order to account for it. Periodically there are things like that, and otherwise there are just small breaking changes on a regular basis that have to be accounted for. That's the kind of footwork I would not enjoy. You have to have a strong constitution to get that stuff done.
00:59:11
Speaker
It's not the fun part, and in the end we spend a lot of time doing those things. We definitely encourage people to come talk to us. We love when people write libraries of their own. We have a bunch of tooling to help them with the basic setup you would have for an open source Pony library, for being able to do testing, to do releases, all of that sort of stuff. But yeah,
00:59:36
Speaker
we would always love more people, but we know what the numbers look like for most open source projects, et cetera. So we're doing the best that we can with what we've got in the time we have. But we definitely welcome people to come and chat. They've just got to be respectful; that's our one big thing as a community. That should be a rule of the internet. Sadly, it's not, but it should be. You can be an asshole, I don't care; just leave the assholishness somewhere else.
01:00:09
Speaker
Okay, well, in that case, with a view to encouraging a few people to use it, what's the state of... is there a package manager? Is there a package repo? Is there a build tool I should be using?
01:00:21
Speaker
There is no package repo, et cetera, for reasons including that I think all of those things are an incredibly bad idea. See the war that's broken out between WP Engine and WordPress, and some of the things that happened with the official repo there, for ways that centralized repositories can be really bad,
01:00:49
Speaker
or attacks that have been executed against npm, et cetera. There's a variety of problems with that approach; I'm not a big fan of it. I feel like people took CPAN and all of the problems that CPAN had and just moved it from one language to another and made a few things nicer along the way, but all of the fundamental problems of that model still exist. So basically, you find it on GitHub or wherever else.
01:01:21
Speaker
It's not like there have been tons of libraries written that people continue to maintain. Most people do some Pony stuff for a while, they put it down, they don't come back to it, so you have a lot of older libraries. We do have a number of libraries under the Ponylang org that we maintain ourselves.
01:01:43
Speaker
There's a tool called Corral, which we hope to eventually move into the compiler itself.
01:01:54
Speaker
It has a JSON file format where you can give it the locations of different libraries. It will pull them down for you, and then it will run your compilation so that those libraries can be found and used when you go to build your program.
01:02:15
Speaker
It's vaguely similar to how some of the Go stuff works for Go modules. I say vaguely: if somebody is familiar with Go and the go mod stuff, if they squinted, they would see a similar approach in how Corral works right now. I have a thing that I wrote up like three years ago for how I'd like to redo all of this stuff. It would be over a thousand hours of work to make it all happen, and it has not been a thing that I have taken on. If I was still working at a company where we used Pony in order to make money, then that would be a thing I would consider taking on. But as it is, with the volunteer and hobbyist thing, that's just way too much of a chunk to bite off.
01:03:08
Speaker
Okay. Yeah, that's totally fair enough, especially for smaller projects. What about building? Do I just use Make for Pony? You can. Pretty much all of the Pony library projects have a Makefile and an equivalent batch file for Windows stuff, and if people are doing a library, they can crib from it. You don't need Make, right? You could do the stuff yourself.
01:03:36
Speaker
You have to run a command to fetch dependencies. You have to run a command to then actually build with those dependencies. I've got two things to do there, and one of those things depends on the other being done. To me, that's where Make comes in. So yeah, Make is very commonly used. Okay, that's fair enough. So, one final question, especially for people that want to try this out. I know there's a version of Pony in the browser.
01:04:05
Speaker
How is that working? I don't think you're compiling Pony to Wasm as well.
01:04:11
Speaker
No, no. The playground is very, very simple. Pretty much, basically, there's a VM in Linode, and it is running Pony inside of a Docker container, and it's using the old... years ago, there was a Rust playground. I think they've replaced all of that code, but it lives on in the Pony playground.
01:04:41
Speaker
Because we took that, and instead of plugging it into a container that has a Rust compiler on the back end, it gets plugged into a container that has a Pony compiler, and you can do execution of basic Pony code, where you're not going to be able to do anything which involves FFI, you're not going to be able to do anything which involves using external third-party libraries, and everything has to be in a single file. But it is good for basic snippets. The thing we love it most for is: "I can't get X to compile." It's like, can you turn X into a thing that goes into the playground so you can send us a link, because that's always much nicer.
01:05:30
Speaker
In 95% of cases people can do that. It's really good for that sort of thing. If you're trying out simple basics when you're learning Pony, or you can't remember how to do a thing when you already know Pony,
01:05:44
Speaker
you can pop open the playground and it works fairly well for that. It's definitely not designed to be a thing that's running Wasm or doing any real Pony program stuff. The threading model for Wasm is one that's actually kind of difficult to fit Pony into, for reasons. So somebody could try to make a Wasm thing happen, but it would probably be a lot of work.
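For a sense of scale, the kind of single-file snippet the playground is built for is about this big; this is just the standard Pony hello-world, not something from the episode:

actor Main
  new create(env: Env) =>
    // env.out is the process's stdout stream
    env.out.print("Hello from the Pony playground")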
01:06:15
Speaker
Always too much work and not enough time, the perennial problem in life. Okay, so is the playground the right place for people to go if they want to try Pony out and get a flavour for it?
01:06:30
Speaker
If they want to do little things, sure, it is. Otherwise, on the Pony website there are instructions on how to get started with installing and everything. We do nightly builds for the platforms which are fully supported, and there are also semi-regular releases. The last release was last week; it fixed a couple of esoteric, but at least possible, segfaults that could come from the runtime.
01:06:59
Speaker
I don't know if anybody ever actually hit them, but they were at least theoretically possible, so we fixed them. Basically, the supported operating systems are Windows, the most recent version of macOS,
01:07:17
Speaker
and any of the Ubuntus which are currently supported LTSs, and I don't think there are any others right now. We've had some in the past. There was somebody who had requested that we support Rocky, and we did for a while, but they kind of disappeared from the community. They were also in China, and they were always having a hard time getting through the firewall and everything in order to talk to us. And I'm not a Rocky user, I'm not a yum user. I spent an hour at one point trying to get it working with the newer supported version of Rocky, couldn't find anybody who was a Rocky user, gave up, and so we dropped Rocky support. That's
01:08:13
Speaker
pretty much how it goes for most of our stuff. For any of the Linux distros, if we can fairly easily build a container to build for that thing in a straightforward fashion, where we install the dependencies, build the compiler, and run some Python to upload it to Cloudsmith, where the packages are hosted, then great, we will happily support the thing. But if it requires any, you know,
01:08:42
Speaker
any serious amount of work due to bit rot, then usually we drop it. Okay, that's fair enough. So, last question, a speculative one for you. If you were to look into the future, would you rather see Pony succeed in its present form and go forwards, or would you rather see the big idea of reference capabilities being adopted by other languages?
01:09:17
Speaker
I think the second one sounds more interesting than the first one. I mean, the first one is: there is one language, and it has this idea. The second one is: this idea spreads out into the world and starts getting used in many different places. The second one sounds more appealing to me; it sounds like you're going to end up reaching more people. Even if Pony got a lot of usage, if you have many languages, which I'm going to assume are also getting a lot of usage, and they spread out and bring reference capabilities into the world, then yeah, I think the second one sounds more interesting in the end. I mean, programming languages are tools. Use them to solve problems.
01:10:01
Speaker
We talked at the beginning of this about how the actor model is not perfect for all problems. Pony is very tied to the actor model. I'd love to see more tools which provide the safety of reference capabilities but are built to solve different problems. Rust, in its very beginning, had actor stuff and stuff that looked a lot like reference capabilities, and they dropped it because they were trying to build a browser and they wanted to do systems-level programming as well. And
01:10:35
Speaker
you would not put Pony into the Linux kernel. It just doesn't make sense; it's got a runtime. It makes about as much sense as putting Go or Erlang into the Linux kernel. You could run Pony as its own sort of hypervisor on bare metal, the way you can run Erlang on Xen, but in the end it's not suited for all areas. So if you had different languages which are suited for different things, where people could pick them up and use them, but they gave you
01:11:15
Speaker
a lot of the value of reference capabilities, I think that would be great. In general, for what it's trying to solve, I prefer reference capabilities to the lifetime approach that Rust takes. For the problems I generally try to solve, I think reference capabilities are easier to deal with, for the vast majority of those, than Rust's lifetimes.
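To give a flavour of what that safety looks like in Pony itself — a minimal sketch, not something walked through in the episode, with actor and method names made up for illustration — an iso reference is guaranteed to be the only reference to its object, which is what makes it safe to hand to another actor without copying:

// Sketch: an iso String can be sent between actors because the type
// system guarantees only one reference to it exists at a time.
actor Printer
  be show(out: OutStream, s: String iso) =>
    out.print(consume s) // give up our reference as we print it

actor Main
  new create(env: Env) =>
    let msg: String iso = "hello".clone() // clone() hands back an isolated copy
    let printer = Printer
    printer.show(env.out, consume msg)
    // using msg again here would be a compile-time error: it has been consumed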
01:11:43
Speaker
But, you know, that's me. I would love to see reference capabilities all over the place; I think it would be great. Well, hopefully people will check Pony out and be inspired to carry the ideas forward. On which note... That would be awesome. That would be good. I always like that, because ideas should be bigger than implementations, right? Absolutely. I mean, implementations can come and go. Yep.
01:12:12
Speaker
On that note, celebrating the idea: Sean Allen, thank you very much for joining me. Thank you. Thank you, Sean. You know, that kind of illustrates why I'm so interested in all these C-like languages, because C did so much. It's in the foundations of so many things we take for granted and use every day. But you have to keep hold of that while asking how we can build on it, how we can take that style of programming and add in some new ideas.
01:12:43
Speaker
So, if you're hungry for more of Pony's ideas, as usual, you'll find links to everything we discussed in the show notes. As you head there, now would be an excellent time to click like if you liked this episode. Share it with a friend, share it on a social network if you want to spread the love, and make sure you're subscribed, because we're back next week with another excellent developer with some new ideas to share with us. Until then, I've been your host, Kris Jenkins. This has been Developer Voices with Sean Allen. Thanks for listening.