Will we be writing Hare in 2099? (with Drew DeVault)

Developer Voices
1.7k Plays · 11 months ago

This week we're back on systems programming with Hare. A C-like language for the ages. We talk to its creator, Drew DeVault, about what he thinks we can learn from the past 50 years of programming, and how we can build that hindsight into a new language that will last for the next 100. 

In among all that long-term ambition we cover everything from error handling, tagged unions and linear types, to metaprogramming and Drew's microkernel operating system. It's called Ares, and it is, of course, built in Hare.

--

Drew's Homepage: https://drewdevault.com/

Kris on Twitter: https://twitter.com/krisajenkins

Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/ 

A summary of Hare’s features: https://harelang.org/tutorials/introduction/

Hare Community Resources: https://harelang.org/community/

SXMO Mobile: https://sxmo.org/

QBE Compiler Backend: https://c9x.me/compile/users.html

Ares OS Source Code: https://sr.ht/~sircmpwn/helios/

OSDev Wiki: https://wiki.osdev.org/Expanded_Main_Page

The Ares System [pdf]: https://mirror.drewdevault.com/ares.pdf

#programming #podcast #harelang #qbe #microkernel

Transcript

Introduction to Hare Language

00:00:00
Speaker
Recently on Developer Voices, we took a look at the Zig programming language. And a few people got in touch to say, well, if you're interested in C-like languages, you should look at Hare as well. It's a language in the same space, but it's got quite a different set of goals. It has different ideas about memory management and types, about metaprogramming. And probably the most ambitious of its goals is to become a 100-year language,
00:00:28
Speaker
a language you'll still be using in the 22nd century.

Goals and Stability of Hare

00:00:32
Speaker
I can't even begin to imagine what programming will look like next century, but Drew DeVault, our guest this week, has plans to make sure Hare is on the list. And if a hundred years sounds like too lofty a goal for you, he makes the point that sometimes it's hard enough to get a five-year-old project to compile. It might be quite refreshing to have a language that just guaranteed no breaking changes this decade.
00:00:58
Speaker
So, what are Hare's answers to the problems of C? What's Drew's strategy for the long, long, long term? And for bonus points, we end up talking about Ares, which is the operating system that Drew's been writing in Hare to prove you can write an operating system in Hare. Let's get stuck in. I'm your host, Kris Jenkins.

Hare vs C: Addressing Challenges

00:01:19
Speaker
This is Developer Voices, and today's voice is Drew DeVault.
00:01:36
Speaker
Joining us today, we have Drew DeVault. Drew, how are you? Oh, I'm doing great. How are you doing? I'm very well, I'm very well. I'm looking forward to Christmas. It's that time of year around here. Likewise, I just wrapped up my partner's presents and stuck them under the tree. Ah, so they're probably hiding under the tree, scratching at it with a fork, right?
00:01:53
Speaker
Yeah, well, the cat certainly is. So the other thing that's keeping you busy, apart from wrapping presents, you're one of those people that has a tremendous ambition here, because we're going to talk about your programming language, you're writing a programming language and an operating system at once, right? And yeah,
00:02:14
Speaker
You're either mad or a genius, so we're going to spend the podcast finding out. I'm not sure the two states of being are discernible from each other. That's true. It's generally in the results that you figure that out. Right. But let's start with your language, which is called Hare, and it's firmly in the C tradition of languages, right? Yeah, more or less, I'd say that. What makes it not C? Why wasn't C good enough for you?
00:02:41
Speaker
Well, you know, C is very popular and it has a lot of staying power, but it wasn't exactly designed to, it just kind of accidentally ended up being a staple because it was attached to Unix and it was capable of doing a lot of important things. So I think what distinguishes Hare from C is 50 years' worth of hindsight that C never had. And so it's got a bunch of features which address shortcomings in C, which is things like
00:03:08
Speaker
improved safety features, improved error handling, a dramatically improved standard library, things like that.

Safety and Error Handling in Hare

00:03:15
Speaker
Okay, well, I mean, the thing that raises is 50 years of hindsight, there's a heck of a lot to choose from. Oh, yes. And you could look at something like, I don't know, you could look at Haskell and Idris, for instance, and they'd say our 50 years of hindsight says, let's throw mathematics at it. Right. What do you learn from 50 years of hindsight?
00:03:39
Speaker
Well, you know, I would disagree with the Haskell group on the basis that functional programmers tend to imagine an ivory tower where everything is made of pure math. And I imagine the computer, which is sitting in front of me and try to think about what it can do. You have to excuse the jab at the functional programmers. I do appreciate that. I will neither defend nor be offended. I want to know what you think of the world.
00:04:05
Speaker
Yeah, yeah. I mean, so when I look at C, I see a lot of things which are important and which some of them can be attributed to its successes. Some of them are better perceived as failings. And I think you kind of see something in C, which is like it came around from Unix and Unix was kind of designed for a certain purpose and in a certain time.
00:04:27
Speaker
where you see these trends in older operating systems where there's a little bit less thought in the design and a little bit more about what can we make work. And at the time it was great and it made Unix possible, but you see a lot of these baggages surviving on. And so some of the things we can learn from C is a broad applicability to many problems.
00:04:49
Speaker
a really great sense of portability, a nice standard with many implementations, and access to low-level primitives, which is all very important. And we can also see things which we don't like as much. So we see a lot of the baggage of Unix and POSIX, which is a little bit iffier. Things like a global errno is kind of like an example of a weak decision. And we've seen in the time since error handling become
00:05:13
Speaker
understood to be much more important than it was understood then. So there's an opportunity there to revise that design with its importance in mind and with a more deliberate approach. So it's things like error handling and memory safety and some of the cruft of the standard library, which was evolved organically more so than it was designed. These are great opportunities to improve. Okay, well, let's tackle error handling. What can you fix about that?
00:05:41
Speaker
Well, the way it works in C is essentially they have a global variable called errno, which from the outset was quickly determined to be a problem as soon as multithreading got involved. And so now we have a pile of hacks to make that work. But if you do anything that could fail, like opening a socket or opening a file,
00:06:03
Speaker
it will return, often with a negative number to indicate that it failed or something. And then it will set the errno value to something which describes the error from a global list of possible outcomes that affect everything that could fail. And then the type system doesn't really do anything to make sure that you're understanding the difference between an error case and a successful case. So it's up to the programmer.
00:06:31
Speaker
If they call fopen and it fails, they still get a file pointer in their variable. It's just set to null. And then if they keep using it without doing the extra work to check the error, then they're in trouble. And this is a very common failure of working with C. And then if you have third-party libraries outside of the standard library, they might have their own error conditions and their own approach to error handling, which doesn't exactly fit into what the C library is doing.
00:06:55
Speaker
They can't really expand errno with additional domain-specific errors, for example. And so everybody has to come up with creative solutions and none of them make it actually easy to do good error handling in your program. So what have you done? What's your solution?
00:07:12
Speaker
Well, I think that there's a number of ways that people have approached error handling outside of C. And the one that we favor is tagged unions. And the idea of a tagged union is it's a kind of type which can store values of different types. So a single value which is of the type tagged union can be one of several types and only one at a time. So you could, for example, have a tagged union of an integer and a floating point number.
00:07:39
Speaker
And it could store either of those, but it only stores one at a time and it indicates in the value which one it is. So it might say the tag is zero to indicate that the value is an integer or one to indicate it's floating point.
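To make the integer-or-float example concrete, here is a minimal sketch. It is written in Rust as a stand-in rather than in Hare, and the names `Number` and `describe` are made up for illustration. Note that Rust requires the union to be declared ahead of time as an `enum`, so this only shows the "one value, several possible types, with a tag" idea, not Hare's own syntax.

```rust
// A tagged union of an integer and a floating point number.
// The compiler stores a hidden tag recording which alternative is present.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Number {
    Int(i64),
    Float(f64),
}

// Inspect the tag and report which alternative is currently stored.
fn describe(n: Number) -> String {
    match n {
        Number::Int(i) => format!("integer {}", i),
        Number::Float(f) => format!("float {}", f),
    }
}

fn main() {
    let values = [Number::Int(42), Number::Float(1.5)];
    for v in values.iter() {
        println!("{}", describe(*v));
    }
}
```

Each value holds exactly one alternative at a time, and the only way to get at the payload is to check the tag first.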
00:07:51
Speaker
And we've created first-class support for tagged unions. They're kind of ad hoc. So unlike Zig or Rust, where you have to predefine them, and they're kind of like a superset of enums, in Hare they're ad hoc. And what you do is you create a tagged union of all the success cases and all the error cases. And error cases are set aside special in the type system with a flag that says this is an error type.
00:08:13
Speaker
And if you're handling a value which is a tagged union that could include an error type, there are some additional constraints imposed by the compiler on what you do with that value that basically force you to do something about the error before you can use the value of the success case. Okay, so
00:08:30
Speaker
Have you got something where, like your file open example, it can return file not found or file not readable or file successfully opened and here is the handle? Yeah, exactly. So in C, the fopen call's return type is a file pointer and it can return either a valid file pointer
00:08:50
Speaker
or null, and when it returns null, it separately sets errno to the cause. Whereas in Hare, our os::open function, which is a rough equivalent, returns either a valid file handle, or it returns
00:09:07
Speaker
file not found or access denied. And it encodes that information into the return type so that the language can be aware of those different cases and tell the programmer they have to address these cases. And they can address it either by explicitly testing for each case and adding different logic. So you might want to do something differently on not found versus access denied. Or you can address it by passing the error further up the call stack and centralize your error handling somewhere else.
00:09:36
Speaker
Or you can use an error assertion, which is just the bang operator, which is an assertion where the programmer says, I promise that I've done the math and I don't think that this error case is actually possible under these conditions. So just assert it that it's not possible and give me the file handle directly, only handle the success case. But then the programming language automatically checks your work. It's an assertion. So if that error actually does come to pass at runtime, it'll terminate your program.
00:10:01
Speaker
Okay, so you can optionally say, I don't want the bookkeeping, but it's okay for you to crash at runtime if I'm wrong.
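The three options described here (test each case, pass the error up, or assert it cannot happen) can be sketched as follows. This is Rust standing in for Hare, and everything in it is hypothetical: the `open` function, the `Handle` and `OpenError` types, and the hard-coded file names exist only for illustration. Rust's `?` and `unwrap` play the roles of propagation and of the asserting bang operator.

```rust
#[derive(Debug, PartialEq)]
enum OpenError {
    NotFound,
    AccessDenied,
}

// Hypothetical file handle that just remembers its path.
#[derive(Debug, PartialEq)]
struct Handle {
    path: String,
}

// Hypothetical open: the return type spells out every possible outcome.
fn open(path: &str) -> Result<Handle, OpenError> {
    match path {
        "notes.txt" => Ok(Handle { path: path.to_string() }),
        "secret.txt" => Err(OpenError::AccessDenied),
        _ => Err(OpenError::NotFound),
    }
}

// Option 1: test each case explicitly and react differently.
fn describe(path: &str) -> String {
    match open(path) {
        Ok(h) => format!("opened {}", h.path),
        Err(OpenError::NotFound) => String::from("not found"),
        Err(OpenError::AccessDenied) => String::from("access denied"),
    }
}

// Option 2: pass the error up the call stack with `?`.
fn path_len(path: &str) -> Result<usize, OpenError> {
    let h = open(path)?;
    Ok(h.path.len())
}

// Option 3: assert the error cannot happen; the program aborts if it does.
fn must_open(path: &str) -> Handle {
    open(path).unwrap()
}

fn main() {
    println!("{}", describe("notes.txt"));
    println!("{}", describe("secret.txt"));
    println!("{:?}", path_len("missing.txt"));
    println!("{:?}", must_open("notes.txt"));
}
```

In all three cases the caller cannot reach the handle without first saying what happens on failure, which is the property the C `fopen` interface lacks.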

Hare's Approach to Code Simplicity

00:10:08
Speaker
Exactly. Yeah. Do you still have nulls in the language? We do have nulls, but you can only use them through a pointer type, which has the nullable flag set. And again, if you have a nullable pointer type, using it requires extra work. So you can't just directly dereference the pointer, you have to check for null first. Okay. And do you have exhaustiveness checking on that? Like,
00:10:28
Speaker
We do, yeah. So you can make sure I check all the different cases if I need to. I should clarify, we have some exhaustive testing. We have switch statements for testing against values and match statements for testing against types. And right now, switch statements do have exhaustivity testing. Match statements, the specification requires exhaustivity testing, but we haven't actually implemented it in the compiler yet. Okay, still a work in progress.
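As a rough illustration of both ideas (a pointer that may be absent must be checked before use, and a switch over values must cover every case), here is a small sketch. It uses Rust rather than Hare, and `Color`, `name` and `length_or_zero` are invented names; Rust's `Option` stands in for a nullable pointer type.

```rust
#[derive(Debug, Clone, Copy)]
enum Color {
    Red,
    Green,
    Blue,
}

// Exhaustive matching on values: deleting any arm below makes the
// compiler reject the program, because a case would be unhandled.
fn name(c: Color) -> &'static str {
    match c {
        Color::Red => "red",
        Color::Green => "green",
        Color::Blue => "blue",
    }
}

// A possibly-absent reference cannot be dereferenced directly;
// the absent case has to be handled first.
fn length_or_zero(maybe: Option<&str>) -> usize {
    match maybe {
        Some(s) => s.len(),
        None => 0,
    }
}

fn main() {
    println!("{}", name(Color::Green));
    println!("{}", length_or_zero(Some("hare")));
    println!("{}", length_or_zero(None));
}
```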
00:10:51
Speaker
However, we have implemented an assertion where if you get to the end of a match expression and none of the cases were matched, it will throw a runtime assertion because that's not supposed to be possible. OK. How about my other big sticking point with C, mistakes made? Yeah. I don't think the macro system is terribly good in C.
00:11:14
Speaker
I agree. I don't think that's a controversial opinion. What have you done with macros? Nothing. We don't have them. You threw them away? Yep. There are some languages like Rust which prefer semantic macros, I think they're called, where your macro execution phase of expanding the macro gets a copy of the AST and can do
00:11:35
Speaker
smarter things with macros. And I think that's significantly better than C. But we have decided, as a goal, we're kind of eschewing metaprogramming generally, and macros are a form of metaprogramming, because we think that for a language which values simplicity and transparency, metaprogramming is kind of an antipattern for us. And so instead, we use code generation. Sometimes we have tools that can interpret a DSL or can even rewrite a Hare file
00:12:04
Speaker
and output generated code, which fills some of the need where macros are currently used without actually having a macro system. Okay. Is the code generation like a tool built into Hare or a separate thing that you would use?
00:12:18
Speaker
Somewhere in the middle. So our standard library includes a Hare parser. So if you're writing a Hare program, you can import a parser for the Hare programming language and get an AST. It also includes a type checker. So you can run the type checker and make sure that AST is consistent with the semantics of the language. And then you can also unparse the AST using the standard library. So you can take an AST and turn it into code.
00:12:42
Speaker
So if you wanted to write a tool which did code generation or manipulated Hare code in some way, there are resources from the standard library to help you do that, but there's not exactly like a plug and play code transformation tool. Okay. And is that like the part that turns your newly constructed or newly manipulated AST back into code? Does it turn it back into a string, like a file that you write to the system, or runnable code that this program can now execute?
00:13:11
Speaker
It just turns it back into a source code string. And it does that by writing it to the file handle abstraction. We have an abstraction where it can be backed by a file handle opened on the system, like Unix style, or you can implement streams in user space. So you could unparse the AST into a gzip stream into the network socket for some reason. So it's flexible in that sense, but not in the sense that it can produce machine code through this.
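The "unparse to any stream" idea can be sketched with a toy AST. This is not Hare's parser or unparser; it is a Rust illustration with an invented two-node expression type (`Expr`) and invented functions (`unparse`, `unparse_to_string`). The point is only that the unparser writes source text to a generic output stream, so the same code can target a file, standard output, or an in-memory buffer.

```rust
use std::io::{self, Write};

// A toy AST: integer literals and addition.
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
}

// Turn the AST back into source text, written to any output stream.
fn unparse<W: Write>(expr: &Expr, out: &mut W) -> io::Result<()> {
    match expr {
        Expr::Num(n) => write!(out, "{}", n),
        Expr::Add(lhs, rhs) => {
            write!(out, "(")?;
            unparse(lhs, out)?;
            write!(out, " + ")?;
            unparse(rhs, out)?;
            write!(out, ")")
        }
    }
}

// Convenience wrapper: unparse into an in-memory buffer.
fn unparse_to_string(expr: &Expr) -> String {
    let mut buf: Vec<u8> = Vec::new();
    unparse(expr, &mut buf).expect("writing to a Vec cannot fail");
    String::from_utf8(buf).expect("unparse only emits UTF-8")
}

fn main() -> io::Result<()> {
    let ast = Expr::Add(
        Box::new(Expr::Num(1)),
        Box::new(Expr::Add(Box::new(Expr::Num(2)), Box::new(Expr::Num(3)))),
    );
    // Same unparser, two different stream backends.
    println!("{}", unparse_to_string(&ast));
    let mut stdout = io::stdout();
    unparse(&ast, &mut stdout)?;
    writeln!(stdout)?;
    Ok(())
}
```

A code generation tool in this style would parse a source file, edit the tree, and then unparse it to whatever destination it likes.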
00:13:40
Speaker
Okay, you talked about the standard library, and that's novel, having the language sort of built into the standard library. What else have you added to the standard library? We have all sorts of things. Basically everything you would get from a C standard library is present. So we have support for all of the Unix stuff like groups and parsing /etc/passwd files. We also have
00:14:05
Speaker
all the networking, TCP, UDP, DNS. We have a much better DNS API than libc provides, where you can do arbitrary DNS queries. So we've kind of rethought those abstractions from C. And then we've also added a bunch of other stuff. We have a finite scope for the standard library, but it's a bigger scope than libc. And so we have things like hashing functions. We have FNV, CRC, those kinds of hashes. We also have a cryptography suite, which supports a whole bunch of
00:14:34
Speaker
modern cryptography primitives that we've implemented ourselves. We also have a fundraiser to audit our implementation, by the way.
00:14:43
Speaker
And we have some other stuff like buffered IO, which is, again, the abstraction has been rethought from the C approach so that those primitives are available for you to use in a more convenient manner. So there's all kinds of stuff in the standard library. I think it has a really good amount of batteries, as it were, especially compared to C, but it's still finite

Hare's Stability and Usage

00:15:02
Speaker
in scope. And then we also have the extended library.
00:15:05
Speaker
which is a collection of libraries that exist outside of the standard library per se, and they make different stability guarantees than the standard library, but they're still under the purview of Hare, and they're considered important for the ecosystem. So, for example, the HTTP implementation is in the extended library. Right, yeah. That stability leads to another question. What status is Hare at? Is it production ready? Is it a research project?
00:15:31
Speaker
It's somewhere in the middle. So we have this goal of becoming a 100-year programming language, which is something that we're working towards. So stability is one of the most important values of Hare. But we have not actually achieved that yet. We want to get to the point where we release Hare 1.0 and we say,
00:15:50
Speaker
we will promise to forever support backwards and forwards compatibility from this day onwards, which is a big goal. And we're getting there, but we're not there yet. That said, setting aside the overambitious goal of freezing the language forever, which most programming languages don't take on, I would say Hare is
00:16:13
Speaker
somewhere close to the stability of other languages that don't have that goal. It's still missing a number of things, and we have some plans which could introduce large breaking changes, like we're going to research linear types, for example, for memory safety. And those could break your program. And we also are still sometimes doing large refactorings in the standard library, which break people's code. But it's a language you can use to write serious projects today, and many people are using it to write serious projects today.
00:16:42
Speaker
Okay, have you got any examples? What kind of things are people using it for?
00:16:46
Speaker
So the postmarketOS project has some people, specifically affiliated with the SXMO desktop solution for mobile phones, who are using Hare to build some tools for mobile users and UIs for mobile users integrated into their desktop environment. I'm also using it to work on a secret storage manager, which is like a generalization of a password manager and that's stable and released and
00:17:15
Speaker
to some extent in general use. People are working on build automation tools and UI toolkits and all kinds of things of that nature. Okay, okay. Let's zoom out because you've mentioned your 100-year language goal. Yeah. And this is something I find really interesting about Hare. You're saying we've got 50 years of hindsight to learn from, and we're aiming at still being the same language 100 years from now. Yeah.
00:17:41
Speaker
How on earth can you possibly look 100 years ahead and predict what's going to be needed? We can't, and we're aware of that. So something we've acknowledged is that after Hare 1.0, we'll do 1.1 and 1.2. We'll keep maintaining it and working on it, but there will not be a 2.0. But Hare, the language as it appears on the day that 1.0 is released, is going to become a time capsule, and we acknowledge that.
00:18:07
Speaker
And we're happy to let other languages continue to develop and work on innovations. And I think that's something that's very valuable for the ecosystem going into the future. But this approach to stability is kind of a counterweight to what I think is a little bit more mainstream right now, which is kind of reckless instability, to be honest.
00:18:28
Speaker
If you, for example, try to find a binary that was built for Linux three years ago and run it today, it's probably not going to work. And that's something we kind of want to
00:18:42
Speaker
correct for. And so if you're writing software that you want to have that kind of longevity, I think Hare is a solution that we want to be available for you. And it's true that Hare will become more ideologically obsolete over the next 100 years for sure. But unlike a lot of languages which are going to keep with contemporary language design ideologies, Hare is still going to work.
00:19:09
Speaker
So I think that's an important difference. You know, I think a lot of languages as they look today, those languages are going to keep evolving, but software which is written today for those languages has to evolve alongside of them. Whereas with Hare, we're hoping that all of the software which is made for Hare will also have that longevity built into it. So you're saying, imagining you've released 1.0 and someone's written a large program in 1.0. Yeah.
00:19:37
Speaker
Now we fast forward 50 years, and I have to write a new Hare compiler for a new chip architecture that's out there. But you think the program that was written today will still compile to that same spec 50 years from now? I think so. I mean, we have some constraints around exactly how those guarantees work. So for example, if this big program has components written in assembly, obviously you'll have to port those to your new architecture.
00:20:05
Speaker
But any program which is written against the standard library using portable code will still work. The pitch that we give is, on the day that Hare 1.0 is released, if you write a program in Hare, in 100 years' time it will still compile for new systems. But we also say, if on the day that Hare 1.0 releases, you take the specification and you write a compiler based on that specification on day one, in 100 years, that compiler will compile contemporary code.
00:20:34
Speaker
Right. Okay. That's the promise. So that instantly makes me think, how on earth are you going to know when it's time to say this is version one? I would be tempted to always put that off just a little bit longer while we iron out the edge cases.
00:20:53
Speaker
Well, basically, we have a plan in mind. We have a fairly fixed number of research targets that we want to evaluate before we consider the scope of the language complete. And that includes things like the linear types example. Also, another thing we want to research is alternative approaches to memory managers. And we also have a handful of little design things that we're kind of playing with.
00:21:20
Speaker
to which a similar caveat applies. They're fixed in number for the most part. That list of smaller things still grows, but we don't want it to grow forever. But once we run out of these predetermined design areas that we want to do research in that might change how the language works, we're going to arrive at what we think should be Hare 1.0.
00:21:41
Speaker
And then we're going to begin a process that we call acceptance testing, where we're going to create multiple teams within the Hare programming team, which is about 100 people today. We're going to subdivide ourselves into teams based on areas of expertise and domains, and we're going to run over the whole language with a fine-toothed comb. We're going to have areas as narrow as things like
00:22:03
Speaker
networking support or IO, and also things as broad as like Linux support. And these teams are going to evaluate the whole language on those terms and produce reports and make recommendations for things that need to be changed. And this acceptance testing process is expected to take a few years. And that's going to be the majority of the work that goes into Hare for a while from the official team upstream, as it were.
00:22:32
Speaker
And once we have completed that process, we're going to release 1.0. And that's still probably going to have mistakes. There's going to be some stuff where in 20 years' time, we're going to be like, really, maybe we should have thought this differently. But we're going to do our best. And because this longevity is a goal, I think that we have to be willing to compromise on having imperfect formulas.
00:22:54
Speaker
So you're saying there will be, thinking of the human appendix, there will be parts that maybe in hindsight weren't actually necessary. Yeah. But you're placing above that the guarantee that for the whole hundred years of the language, there will be stability. Yeah. I mean, the compromise is that the language will probably have an appendix for 100 years, but it will also probably work for 100 years. Yeah. Yeah, that's fair enough. Much like us humans. Yeah, exactly. Yeah, roughly. Although not all of us make it to 100.
00:23:24
Speaker
Not all of us have an appendix either. So there we go. Okay, so that you've mentioned this twice now, linear types.

Innovations in Memory and Resource Management

00:23:34
Speaker
I think we have to dig into that because that's the other thing we think we would do differently in hindsight about C: memory management. Yeah. And garbage collection is one solution,
00:23:48
Speaker
which has problems for systems programming. Rust has another, which is borrow checking. What's your solution to it? Presently, we have some features around memory safety, which are more features and less of a comprehensive approach like a borrow checker or a garbage collector. For example, we have slices as a first-class feature. For arrays and slices, indexing them is always bounds-checked by the compiler and will terminate the program rather than do a buffer overflow.
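For readers who have not seen bounds-checked indexing, the behaviour described here looks roughly like the following. The sketch is in Rust, not Hare, and `read_at` and `try_read_at` are invented names. Rust happens to make the same choice for slice indexing: an out-of-range index stops the program with a panic instead of reading past the end of the buffer.

```rust
// Plain indexing: checked at run time, panics if `index` is out of range.
fn read_at(data: &[u8], index: usize) -> u8 {
    data[index]
}

// Non-panicking variant: the caller gets `None` for an out-of-range index.
fn try_read_at(data: &[u8], index: usize) -> Option<u8> {
    data.get(index).copied()
}

fn main() {
    let buf = [10u8, 20, 30];
    println!("{}", read_at(&buf, 1));
    println!("{:?}", try_read_at(&buf, 7));
    // read_at(&buf, 7) would terminate the program here
    // rather than read memory beyond the three-byte buffer.
}
```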
00:24:17
Speaker
We also have some features in the works where we have like working proofs of concept, but they're not done yet, with respect to things like address sanitizers. And so this kind of thing we're working on, but it's less of a comprehensive model like Rust or garbage-collected languages like Go have.
00:24:34
Speaker
And our answer to the comprehensive approach, should we decide to try that, is to use linear types, possibly. And we might also research a borrow checker. But I would say that we are satisfied with the degree of memory safety that is achieved by Hare today, which is not a comprehensive system, but it's enough that I think you're not going to get nearly all of the mistakes that you get in C,
00:25:00
Speaker
which is a statement that some Rust advocates might find controversial, but nevertheless. Regarding linear types, which is this comprehensive solution we're researching, it's not actually my domain; it's being led by other Hare maintainers. But my layman's understanding is that linear types is basically a system wherein a value of a linear type has to be used exactly once and you can
00:25:27
Speaker
If you use it zero times, you get an error. If you use it two times, you get an error. And so if you need to use a value multiple times, you have to use it in a way which creates a copy of that linear type aspect, kind of like an SSA form. And if you free it, that's using it, so you can't use it after free. And if you forget to free it, like a memory leak, you've used it zero times and that's also invalid. So it kind of checks for those kinds of aspects.
00:25:54
Speaker
of your usage of values. And I think it does provide a relatively comprehensive memory safety. But again, that's not my area of research and some other maintainers are working on it. And can you, I'm going to risk pushing you on that, even though it's not your area of expertise, then. But can you explain how that would work in practice? Because I've dabbled a little bit in linear types. But I think most people won't know why this is an interesting idea.
00:26:19
Speaker
Well, again, with the caveat that this is definitely not my area of expertise. All I've done is read some of their emails about it. But for example, if you want to work with a file, you open a file, you write to the file, you close the file. And one of the possible errors that can occur under that condition is writing to the file after it's closed.
00:26:41
Speaker
And so this is a kind of a generalization away from memory management. This is other kinds of resource management that this technique applies to. And if you open the file, you do error handling, which is separate. But at the end, you get a file handle. And this file handle has a linear type, which means that you must use it exactly once. And if you pass it into the write function to write to the file, that's using it. And so the linear type value is then consumed.
00:27:07
Speaker
And then in order to use it again for another write, the write function will essentially return a copy of the file, which is a brand new value that you again have to use exactly once. And if you pass it into the file closing function, because you're done with it, it's different from write in that it doesn't return
00:27:24
Speaker
a copy of the file handle, it consumes it and then you don't have an opportunity to use it again. So through this approach of use exactly once types, you can't forget to close the file and you can't write to it after you've closed it. So it's like a kind of resource management that that adds these safety features through this kind of only use once approach.
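The open, write, close walkthrough above can be approximated with move semantics. The sketch below is Rust, not Hare's (still unspecified) linear types, and `FileHandle`, `open`, `write` and `close` are hypothetical names backed by an in-memory buffer rather than a real file. One caveat matters: Rust's ownership is affine rather than linear, so it rejects using a handle twice but does not reject forgetting to close it.

```rust
// Hypothetical handle backed by memory so the example needs no real file.
#[must_use = "a handle should be written to or closed"]
struct FileHandle {
    name: String,
    contents: Vec<u8>,
}

fn open(name: &str) -> FileHandle {
    FileHandle { name: name.to_string(), contents: Vec::new() }
}

// Consumes the old handle and returns a fresh one for the next use.
fn write(mut file: FileHandle, data: &[u8]) -> FileHandle {
    file.contents.extend_from_slice(data);
    file
}

// Consumes the handle and returns no handle, so nothing is left to misuse.
fn close(file: FileHandle) -> (String, Vec<u8>) {
    (file.name, file.contents)
}

fn main() {
    let f = open("log.txt");
    let f = write(f, b"hello ");
    let f = write(f, b"world");
    let (name, bytes) = close(f);
    // write(f, b"!"); // rejected at compile time: `f` was moved into close
    println!("{}: {}", name, String::from_utf8_lossy(&bytes));
}
```

Each call to `write` uses up the handle it was given and hands back a new one, and `close` hands back none, which is what rules out writing after close.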
00:27:44
Speaker
Yeah, that makes some sense because we do have this situation where we're managing different resources like network handles. And we've got solutions for managing generic resources like Python's with, for example. And then we've got solutions for managing memory like Rust's borrow checker, but nobody's really trying to unify that.
00:28:05
Speaker
I think some people are maybe trying. I imagine the Rust type system might be able to accommodate this, I'm not sure. But yeah, it's definitely something that I think we're appreciative of, that this generalizes beyond memory management to general purpose resource management.

Governance and Community

00:28:21
Speaker
Okay, so how does that play out in language design? If you're interested in adding that to Hare, but you're not doing it, this presumably isn't a benevolent dictator for life model.
00:28:33
Speaker
Well, there is a benevolent dictator for life and that's me, but I try to be as benevolent as possible.
00:28:42
Speaker
That's actually, so my responsibility as the BDFL is explicitly outlined in doc/bdfl.md in the repository, which elaborates on what that role actually means. And then we have a separate document for maintainers. But my BDFL approach is as hands off as possible. So I want to be used as a tool that the community can utilize when my feedback is necessary. So that includes, for example, being the financial steward of the project, I'm responsible for all of the money,
00:29:09
Speaker
as it were, and it's something that other people don't have to think about. They can just work on the code. And then I also have the vague role of providing the vision for the project. And I also have the vague role of, if there's a conflict in the community, I'm the de facto mediator of that conflict. And if people can't agree, then I'm going to be asked for the final say. But this is something that the community asks me to do because it's a tool that they can use when it's useful,
00:29:38
Speaker
rather than imposing dictatorship kind of thing. I try to give as much autonomy and agency and authority to other contributors as possible. BDFL is separate from maintainer. There's, I think, four or five top-level maintainers with responsibility over the entire project. They have the latitude to make a lot of decisions.
00:30:01
Speaker
without my input. And we also have an RFC process, which is informal and opt-in. So if you want to make a change to Hare, you don't need to make an RFC. But if you feel like it's a larger change or requires some discussion, you can opt into making an RFC, or somebody can ask you to make an RFC where you can explain
00:30:20
Speaker
the purpose of your change and have the discussion and seek consensus before you do it. And so I think the linear types research is an example, which is still early and pre-RFC, but we have other large changes which go through that RFC process. And if they acquire consensus, you know, there's no explicit approval. It's just read the room: if the vibe is that people like the idea and think it's refined enough, then go ahead and write the patches and stop talking about it.
00:30:43
Speaker
So if you achieve that informal consensus through an RFC, you're likely to get a large change to go in. And you can go ahead and start writing the code and making the change happen and, you know, do all the planning you need in the RFC discussion. Okay, that sounds very, very soft. I mean, compared to something like an Apache project, that's a very loose approach. And yet you're managing 100 programmers, you say? Well, I've been managing, you know, there's
00:31:10
Speaker
That's the number of people who have commits. Right. It sounds like you're less the benevolent dictator and more the benevolent midwife. Yeah, maybe.
00:31:21
Speaker
Anything, any responsibility that the BDFL traditionally has is annoying because I'm a busy guy. If another contributor is good at doing those kinds of things and wants to take it on board, I'm thrilled about that. Even stuff like community management, we have a conduct enforcement team, which is not just me, it's me and two other contributors who are trusted by the community.
00:31:46
Speaker
BDFL is useful to have. Reserving the last word, as it were, is something that can be definitely useful to the project. But I think it's there because it's useful, not because it's imposed. And the project, for the most part, runs itself. And we establish a culture of consensus and of people taking responsibility for the parts of the project that they're interested in and want to work on, and then also
00:32:12
Speaker
giving them those responsibilities and rewarding their agency in that respect creates for a really healthy dynamic where things can get done in a way which is informal and rewarding for everybody, which is important because they're all volunteers. Yeah, yeah. I can see that working very well for like language capabilities and individual areas of language design. But how do you ensure that it keeps a cohesive whole as a language rather than becoming an archipelago of people's best ideas?
00:32:42
Speaker
So there's a couple of ways. We have, for example, a canonical style guide, and everybody is required to adhere to the style guide. And part of the consensus-making process involves getting your API design approved by consensus on the basis that it is conformant with the design approach that we take everywhere else. But also, there's this really nice thing about Hare, which is that a lot of people have remarked about this. The language is quite simple.
00:33:11
Speaker
It seems to have exactly the right scope because you don't get enough rope to hang yourself with, but you get enough rope to do what you need. And the consequence of that is that a lot of the time, if you're designing an API in Hare for the standard library or for your own library or for anything, it usually happens to be the case that the first API design you write is the right one.
00:33:37
Speaker
It's to the effect where there's almost just the one way of writing Hare code. If you have a problem, there's usually one way to solve it in Hare, especially in terms of API design. Because of that, there's not a lot of conflict about design style and so on in the standard library because it all tends to converge towards the same self-consistent result. If you're finding that in practice, that's impressive. Yeah, I'm very happy with that. That's one of my favorite traits of what we've accomplished here.
00:34:05
Speaker
Well, maybe that speaks to the typical user of Hare. I mean, there's competition in this space for low-level systems languages. And without saying anything about the competition directly, why would someone be drawn to Hare more than, say, Rust, Go, Zig, or just C?
00:34:29
Speaker
I think Hare has a lot of advantages in that respect. It certainly has a lot of advantages over C.
00:34:42
Speaker
For example, somebody with a C++ background might prefer something with a similar level of complexity or tools that they can use. And Hare is really not for that. It's for people who prefer the simplicity. So people who are using C and they're not held hostage by C because of their job or their target use case or whatever. They like using C, or they have Stockholm syndrome maybe. Those people, who value that simplicity, I think are the target audience for Hare.
00:35:09
Speaker
Also, it's just a really fun language to use. I think the standard library provides a perfect number of batteries where a lot of stuff you want to do is going to be accommodated by the standard library, and it's going to feel good to use. We have really nice tooling. So we have a command line documentation viewer, which is super pleasant to use as part of your daily workflow. We have super fast build times. You can actually bootstrap the entire tool chain, including the back end and the front end,
00:35:36
Speaker
from scratch, running all of the test suites in less than five minutes. So it's fast, it's simple, it's easy to use, it's fun to use. I think that appeals to a lot of people in our community. How many different architectures does it compile to? Presently three. It supports x86-64, ARM64, and 64-bit RISC-V. I'm just wondering, if someone wanted to add a fourth to that, how large a language does it turn out to be as an implementer?
00:36:06
Speaker
So new architectures are supported through our backend, which is called QBE, written by Quentin Carbonneaux. I think it's spelled Q-B-E. I've never actually said his name aloud, so I really hope that I pronounced it right.
00:36:22
Speaker
And it's a weird code base. I'll be honest with you. It's a bit of a strange code base. It's a perfect tool for Hare, and it makes a lot of sense for our use case. But getting into it, we have a number of Hare contributors who work inside of QBE as well, and it's definitely a different experience.
00:36:38
Speaker
But adding support for a new architecture is not super difficult, and almost all of the work is going to end up there. You have a little bit of work to port the Hare standard library to an architecture and a couple of other places that might need to be updated. But mainly, you're going to be putting all your effort into QBE. And RISC-V support actually came about because we asked
00:37:03
Speaker
Michael Forney, a really smart guy, to actually implement a RISC-V back-end for QBE so we could use it in Hare. And this took him, he's a really skilled programmer, to be fair, but it took him less than six months. And in the end, the RISC-V code is 2,000 lines of code or something. And so implementing a new architecture is non-trivial, but also not super hard.
00:37:31
Speaker
OK, yeah, because it must happen. If your vision comes to fruition over the next 100 years, there will be new architectures, right? Yeah. And we plan to port it to a few more ourselves. We want to do PowerPC. We want to do some 32-bit targets. 32-bit targets, in particular, are going to be a little bit more challenging. That will require not only adding it to QBE, but also doing some refactoring in our compiler. 64-bit targets should be pretty straightforward. I think if you just add those to QBE, that's most of the work done.
00:37:59
Speaker
And for the primary Hare standard specification, we're going to target 32-bit and up for the architectures that we support, but we're also going to make a supplementary specification which loosens a lot of the rules to support 16- and 8-bit targets, possibly without guarantees of portability between the two. And for that purpose, we might do some other fun ports like Z80 or something.
00:38:22
Speaker
Okay, so there's still hope of me getting Hare on my Commodore 64. Yeah, it could be in the future. But you know, you could also maybe forgive us for not prioritizing it. Yeah, I think I will. So one target I know you're going to be compiling to is your other project, as if you're not busy enough already: Ares, your operating system.

Ares OS: Testing Hare's Strengths

00:38:47
Speaker
Yes, I'm writing an operating system in Hare, and it has support for running Hare programs in user space before it has support for running C programs in user space. I sort of hope. So, I mean, it's tempting to talk about the life management issues when you're already busy enough. But tell me about the OS. When you're already so busy, why add writing an operating system to your work stack?
00:39:15
Speaker
Well, in a way, working on this operating system is contributing to Hare, because one of the explicit design goals for Hare from the start was we want to be able to write an operating system. We want it to be useful for kernel development. And we have to actually write an operating system in it to prove that we've accomplished that, especially if we're going to freeze it for 100 years. And then 20 years later, we try to write an OS. Whoops, actually, it's not possible. Yeah.
00:39:40
Speaker
But also, like I said, the Hare project almost runs itself. Like I don't have to do a whole bunch of work because I endow everybody with responsibilities and agency over the project. And I try to do this in all of my projects that have any kind of community. I try to empower the community so it's not a huge drain on my personal resources. And also I work full-time in free software. I have my own software company and it's part of our mission to work on stuff like this. So it's not like I'm
00:40:07
Speaker
doing this in the evenings after I exhaust myself at the day job. This is my job. But yeah, I'm also writing an operating system because it's really, really fun. How do you get to the point where it's fun? How do you learn the basics enough to get to that, because it's such a hugely scoped project? I mean, I wouldn't even know which side of the mountain to start climbing.
00:40:31
Speaker
I mean, the first step is to write enough code that you can boot your PC up and it says Hello World. This is not too difficult, as a matter of fact.
00:40:44
Speaker
I think that there's definitely, writing an operating system is very ambitious, but also quite possible if you're willing to put in the time, and if you're able to put in the time, which not everybody is. But it definitely, it shouldn't be your first programming project, but I think somebody with five to 10 years of experience programming, especially systems programming, is totally well equipped to try writing an operating system.
00:41:08
Speaker
especially if they're comfortable reading CPU architecture manuals and the source material, they're not going to rely so much on wikis or blog posts or whatever. This kind of programmer would probably have success if they wanted to write an OS.
00:41:20
Speaker
Okay, so are you just doing it to exercise Hare, or do you hope that your OS becomes, I mean, can you see a future where you're the next Linus Torvalds? I really don't think so. I think that Linux is far too entrenched, but I am designing it seriously, because at least I want to use it. And anybody else who has similar needs to me might also benefit from using it. So it's general purpose, and it is a serious project, and maybe someday will be useful, but like,
00:41:49
Speaker
I'm not exactly setting, like, be the next Linux as a goal, because I don't think it makes sense to do that. And it's also kind of outrageous. True, true. I'm not even sure Linus, well, I'm certain Linus didn't see it becoming mainstream when he started it, right? It was just a hobby project. Yeah. It won't become big and serious like GNU. Yeah, he did say that.
00:42:12
Speaker
So are you trying anything new with it? Or is it just that you want to learn to do that? I am trying something new, kind of, but in the same spirit as Hare, and to some extent distilling a lot of other ideas into one package. But I think it's being done in a novel way. So I'm mixing a lot of ideas that have never been mixed before and doing them in a way which has never
00:42:35
Speaker
been done before, but it is still somewhat classic operating system design. Although I will say that the operating system design components that I am drawing from never really made it into the mainstream. So it's drawing a lot of inspiration at the kernel level from an operating system called seL4, which is a formally verified operating system microkernel. And it's drawing a lot of inspiration at the user space level from Plan 9, perhaps, more so than Unix, but also doing some of its own stuff, which is different.
00:43:02
Speaker
And so it incorporates a lot of these ideas into something new. And I think it's distinct from those ideas, but also those are ideas that a lot of people really haven't been exposed to in the mainstream. So it's going to seem really powerful to a lot of people. Okay, well, teach us a few of them. What ideas do we not know about operating systems that we really should?
00:43:21
Speaker
Okay, so for example, from the seL4 inspiration: Helios is the name of the kernel of the project. The implementation of Helios is quite unique and diverges substantially from seL4, and the kernel-to-user-space API also diverges significantly from seL4, but it's very clearly inspired by seL4 in terms of the kernel API design.
00:43:48
Speaker
And the way that works is with capabilities. So it uses capability-based security, where a capability is an object which represents an unforgeable handle to some resource on the system. And this is analogous to a file descriptor in Linux. So if you have a Linux program which opens a file, you get a file descriptor, which is a handle. It's just a number. And it's backed in the kernel.
00:44:13
Speaker
by all of the memory associated with the file. But if you have that file descriptor, it's an unforgeable right to use that file. So if you call fopen on a file which is owned by root, you get a file handle, and then you, for example, setuid to a non-root user. Your program couldn't open it again, but you already have the file handle, and that's your right to use the file.
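The file descriptor point can be illustrated with a small Python sketch. It is not from the episode, and it swaps the setuid step for one that needs no root privileges: the file's permission bits are cleared after it has been opened. That substitution is an assumption made so the example runs on any POSIX-like system, and the function name `read_through_existing_handle` is hypothetical.

```python
import os
import tempfile


def read_through_existing_handle() -> bytes:
    """Open a file, revoke all permission bits on its path, then read via the handle."""
    # mkstemp returns an already-open read/write descriptor plus the file's path.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"secret")
        # Clear every permission bit on the path. A fresh open() by an
        # unprivileged user would now be refused.
        os.chmod(path, 0)
        # The descriptor obtained earlier is unaffected: access was checked at
        # open time, and the handle itself carries the right to use the file.
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 16)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(read_through_existing_handle())
```

The read succeeds because the permission check happened when the descriptor was created, which is the property that makes a descriptor behave like a narrow capability.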
00:44:33
Speaker
Capabilities are much more general. So everything in Helios, the kernel, is represented as a capability. And that includes things like memory. So if you want to allocate memory, you have to have a memory capability, which establishes your right to use some memory.
00:44:49
Speaker
But also everything else. There's an idea called endpoints. It's very important for a microkernel to have good inter-process communication support. So an endpoint allows you to exchange messages with another process. And if it's a driver, for example, it's running in user space.
00:45:04
Speaker
And if you have a file open, which is ultimately stored on a disk, you have a capability for that file, which is analogous to the file descriptor. And when you write to it, it sends a message to the file system driver. The file system driver has a capability for the block device that the file system is stored on, and then it forwards those writes to the block device.
00:45:22
Speaker
The block device has the capability for the IO ports and the PCI memory mappings that it needs to actually implement the device driver. And so each process in the stack from the user space all the way down to device drivers is running in user space in a sandbox where it only has access to the resources it needs through capabilities. So you're saying that
00:45:47
Speaker
That sort of implies that it's a very small kernel. Most things are in user space and the main job of the kernel is to manage who has the right kind of sandbox.
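As a rough illustration of the model described above, the toy Python sketch below has a kernel whose only job is to check capabilities and pass messages along. It is not Helios code and does not reflect its API: `Kernel`, `grant`, `invoke`, and `CapabilityError` are hypothetical names, and real capabilities are kernel objects rather than dictionary entries.

```python
class CapabilityError(Exception):
    """Raised when a process invokes a capability it does not hold."""


class Kernel:
    """Toy kernel: it only checks capabilities and passes messages."""

    def __init__(self):
        # process name -> {slot number: resource name}; only the kernel edits this
        self._tables = {}

    def grant(self, process, resource):
        """Give `process` a capability for `resource` and return its slot number."""
        table = self._tables.setdefault(process, {})
        slot = len(table)
        table[slot] = resource
        return slot

    def invoke(self, process, slot, message):
        """Deliver `message` to the resource behind `slot`, if `process` holds it."""
        table = self._tables.get(process, {})
        if slot not in table:
            raise CapabilityError(f"{process} holds no capability in slot {slot}")
        return f"{table[slot]} <- {message}"


if __name__ == "__main__":
    kernel = Kernel()
    file_cap = kernel.grant("editor", "fs_driver")        # app may talk to the file system
    disk_cap = kernel.grant("fs_driver", "block_device")  # fs driver may talk to the disk
    print(kernel.invoke("editor", file_cap, "write hello"))
    print(kernel.invoke("fs_driver", disk_cap, "write block 7"))
    try:
        kernel.invoke("floppy_driver", 0, "write block 7")  # holds nothing
    except CapabilityError as err:
        print("denied:", err)
```

Each process can reach only what sits in its own table, so a buggy driver in this model can misuse the resources it was granted and nothing else.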
00:45:59
Speaker
Yes, the kernel, I think the Helios kernel is about 10,000 lines of code, and it's mostly feature complete at this point. It's not done, but it's close to feature complete in 10,000 lines of code, and it's responsible for enforcing capabilities by being the thing en route that does message exchange for IPC running in kernel space. It's also responsible for anything that only the kernel can do, so, for example, managing
00:46:26
Speaker
memory maps and virtual address spaces owned by user space, the actual configuring of those. It provides APIs so that user space that has a memory map capability, for example, can ask the kernel to do a memory mapping. But actually doing that operation requires you to be in kernel space on x86 and so on. So that ends up in the kernel. The only other responsibility it has is bootstrapping the system, because it's the first thing that runs on boot. So it does those three things and nothing else. Everything else is in user space.
00:46:57
Speaker
Right, so this is getting around the problem where I can, if I want to write, I don't know, a new device driver for Linux, I get, I can, I can accidentally or maliciously access an area of memory that I'm not allowed to and just muck around with it.
00:47:15
Speaker
Yeah, like if you have a floppy disk driver on Linux, it's compiled into the whole kernel and it's running in ring 0 with the rest of the kernel. And if there's a bug in your floppy disk driver, it can do anything it wants to completely compromise your system. On Ares, your floppy disk driver is running in user space and only has capabilities for talking to the floppy disk device. And the worst it could do if it had a bug is overwrite your floppy disk. Right, yeah. That seems like it'd be very useful for security, right? Oh, yes.
00:47:47
Speaker
I mean, it's tempting to ask why that idea hasn't gone mainstream, but I suspect the answer is because operating systems are huge, and you can't change them very easily. There are a number of microkernels which have been built, and with some significant degree of support. Even seL4, this microkernel which I drew a lot of inspiration from, has a project called Genode, which provides a more or less complete POSIX environment on top of it.
00:48:16
Speaker
But I think the main reason why we don't see this much in the mainstream is that operating systems have a lot of inertia. Yeah. Yeah, you can't just re-architect Linux, and you can't come up with something that's Linux-ish with a new architecture and expect people to install it tomorrow. And you have to acknowledge the fact that a lot of the software
00:48:37
Speaker
that people are using today is not going to be portable to a new operating system, especially because, for example, you could implement the Linux syscall ABI, and some people have done this. There's an emulator for it on BSD, for example. But at that point, you're designing an operating system within the constraints that Linux was designed under, and you're really constrained in your ability to innovate. But if you don't do that, then you lose access to a lot of contemporary software. So it is definitely a challenge to try and get adoption for a new operating system, which is part of why it's really not among my goals.
00:49:07
Speaker
Okay, fair enough. What about concurrency? Is that a goal of your microkernel? I mean, it's a thing that microkernels have to do, I guess, because it facilitates multiprocessing. I guess this is the fourth thing it does: it facilitates multiprocessing. So
00:49:28
Speaker
We have support for threads and multi-threading and multi-processing implemented in Ares, which is kind of a requirement for a modern kernel to work. You could do like cooperative multitasking, but it really doesn't make sense, especially on a multi-core system. But that is present. We do implement threads, or we implement tasks, which is kind of a generalization of both threads and processes. And then user space uses tasks to implement threads and processes.
00:49:57
Speaker
But yes, concurrency is supported by the system. Within that 10,000 lines of code? Yeah, yeah. We don't have SMP support, symmetric multiprocessing, but we do have preemptive multitasking. OK. So if I wanted to learn how to write my own OS, would this be a good reference guide?
00:50:15
Speaker
Possibly. It's very small and self-contained, and it's quite straightforward to follow where the code goes and see places where you could make changes. There's also a great resource for getting started with operating system development. It really doesn't take you all the way, but it definitely gets you started, which is the OSDev Wiki.
00:50:35
Speaker
And the OSDev IRC channel is also a great resource to talk to people who work on operating systems. So for anybody who's curious about learning operating systems and techniques, definitely the Helios source code could be a great resource. It definitely demonstrates how to do a lot of things very simply. And then the OSDev Wiki for sure is a place you need to be depending on. OK, I'll link to those in the show notes. Cool.
00:50:58
Speaker
If someone's not feeling quite that ambitious but wants to tinker with this, what state is Ares in right now? What can I run on it? It's still very much under development ahead of having a useful user space, but I do have a laptop with Ares installed, and it supports ext4 file systems and FAT file systems and a virtual file system. A very basic virtual file system is there, so you can mount them and you can set up a boot process which gets somewhere.
00:51:25
Speaker
An early version of the shell is working, and I've ported the ed text editor. And it's at least good enough that I used ed to write a blog post about Ares, from Ares, on Ares. So you can definitely tinker with it. I ported Doom to it a long time ago, and it doesn't work anymore, but I'll fix the Doom port so you can play Doom with it if you want. Did you port Doom before you ported an editor? Yes. Hero. Nice.
00:51:54
Speaker
I might let you call me once you've got Vim support. I think that's my threshold. Okay, Vim is going to take a while. Especially because we don't intend to implement like a Unix-style TTY that Vim depends on.
00:52:07
Speaker
Okay. But yeah, you can mess with it. And if you want to get involved in the project without having to necessarily commit to learning how to be an operating systems developer from scratch, a great place to do that is write a driver. Because the driver is very strange, you don't have to worry about booting the system or dealing with memory management, low level stuff so much.
00:52:28
Speaker
You just declare in your driver manifest what resources you need from the system, like PCI memory mappings and things like that, and then you write a normal program. It goes to main and runs, and you can use that to implement your driver. And that's a great target. Like the PS/2 keyboard driver, for example, is like 800 lines of code max. Quality. I have a friend who works on emulation of the Amiga operating system, and he would probably find this a lot of fun.
00:52:57
Speaker
Yeah, a bit. Yeah, yeah. And pulling back to something perhaps a bit more production ready. Hare, is that something I could reasonably start to make production projects in?
00:53:08
Speaker
You can. You're definitely going to have to be aware that, you know, you should subscribe to the hare-announce and the hare-users mailing lists, where we tell people like, we broke this module in the standard library to make it better, here's how you update your code. But people are using it already to make production-ready stuff. And there's a great tutorial on the website and there's a great community to help you learn how to use it. Definitely, if you're interested in the project, you can pick it up and write code today.
00:53:32
Speaker
Awesome. I'll link to those in the show notes. And I think it's time to get coding. All right, Drew, thanks very much for joining us. Yeah, thank you so much for having me. This was great. Thank you very much, Drew. So all of that leaves me with a new goal in life. What I have to do is learn Hare, and then write a Lisp compiler in Hare, purely so I can refer to the project as my Hare-brained Scheme.
00:53:57
Speaker
Before that pun sends you running for the hills, please take a moment to like or rate this episode if you've enjoyed it. Share it with a friend, subscribe, notify, follow, all those good things, because we'll be back soon with another episode. Until then, I've been your host, Kris Jenkins. This has been Developer Voices with Drew DeVault. Thanks for listening.