
Making Software Crash Before It Breaks (with Isaac Van Doren)

Developer Voices
2.1k Plays · 1 day ago

At 23, Isaac is already jaded about software reliability - and frankly, he's got good reason to be. When your grandmother can't access her medical records because a username change broke the entire system, when bugs routinely make people's lives harder, you start to wonder: why do we just accept that software is broken most of the time?

Isaac's answer isn't just better testing - it's a whole toolkit of techniques working together. He's advocating for scattering "little bombs" throughout your code via runtime assertions, adding in the right amount of static typing, building feedback loops that page you when invariants break, and running nightly SQL queries to catch the bugs that slip through everything else. All building what he sees as a pyramid of software reliability.

Weaving into that, we also dive into the Roc programming language and its unique platform architecture, which tailors development to specific domains. Software reliability isn’t just about the end-user experience - Roc feeds in the idea that we can make reliability easier by tailoring the language to the problem domain at hand.

Isaac’s Homepage: https://isaacvando.com/

Episode on Property Testing: https://youtu.be/wHJZ0icwSkc

Property Testing Walkthrough: https://youtu.be/4bpc8NpNHRc

Support Developer Voices on Patreon: https://patreon.com/DeveloperVoices

Support Developer Voices on YouTube: https://www.youtube.com/@developervoices/join

Isaac on LinkedIn: https://www.linkedin.com/in/isaacvando/

Kris on Bluesky: https://bsky.app/profile/krisajenkins.bsky.social

Kris on Mastodon: http://mastodon.social/@krisajenkins

Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

Transcript

Introduction to Software Reliability Challenges

00:00:00
Speaker
We all want to write reliable software, but how? I often think of software reliability in terms of time. If there's a bug, you are going to find out about it eventually, and the pain of that bug is often proportional to the time it takes to bite you. The longer it takes to realise, the more it's going to hurt.
00:00:20
Speaker
The most painful time to find out about a bug is when lots of users have hit it and started complaining. Faster than that, slightly less painful, is when you get paged at three in the morning because the server crashed.
00:00:35
Speaker
That still hurts, but at least the users are asleep. It's not quite as public. It's going to hurt less. Tighten the feedback loop even more to before it goes into production, and you've got things like QA departments reporting the errors.
00:00:50
Speaker
And I find that painful. I find it kind of embarrassing that I didn't realize before it went out, but it's less painful. And you keep trying to pull that feedback loop as far as you can. And eventually you get to things like a Haskell compiler or a Roc compiler, which can tell you you've done it wrong almost immediately and won't hesitate to do so.
00:01:12
Speaker
It's like being slapped in the face privately. And that is about as painless as it's going to get. My central thesis is when you've got bugs, pain is guaranteed, but timing makes all the difference.
00:01:25
Speaker
Does that sound familiar? Does it sound jaded? Does it sound realistic? Let's leapfrog over that question and talk to someone who is at least optimistic about techniques for getting tighter feedback loops and in the end, more reliable software.

Meet Isaac Van Doren: Enhancing Software Reliability

00:01:41
Speaker
I'm joined this week by Isaac Van Doren, who is definitely a bit wounded from bugs in the field and definitely wants to make things better. He's a fan of techniques ranging from the familiar but somewhat overlooked ones, like assertions, through whole-system monitoring, to some more exotic techniques like the way Roc builds tailored platforms for really focused environments, or the way Zig does runtime bounds checking.
00:02:11
Speaker
How much should you do about software reliability and what should you ask your language and your environment to do for you? If you want more reliable software, or at least less pain when it's unreliable, Isaac has some suggestions.
00:02:26
Speaker
Before we get started on that, this episode was recorded live at Code Remix '25 over in Miami. So thank you to Moderne and OpenRewrite for being our hosts for this. And as if the gods were messing with us - ironically enough for an episode about reliability - my camera overheated partway into this episode and it cuts out for a few minutes. So if you're watching this as well as listening to it, my apologies; it does come back after a few minutes.
00:02:54
Speaker
No, I was not tempted to use AI to try and fake the footage back in. Yes, I was slightly tempted to try and do it with sock puppets for the laughs, but I think it's best that I resisted that temptation.
00:03:06
Speaker
Let's go and get to the real human beings.

The Impact of Software Unreliability on Daily Life

00:03:08
Speaker
I'm your host, Kris Jenkins. This is Developer Voices, and today's voice is Isaac Van Doren.
00:03:26
Speaker
Joined today by Isaac. Isaac, how are you doing? You've just come off giving your talk. How are you feeling about it? How'd it go? Doing great. I think it went pretty well. And it always feels wonderful to be done with the talk. So I'm riding that high, knowing I don't have to think about it anymore. It's freedom for my brain. I'm glad you consider doing this to not be part of the stressful process.
00:03:48
Speaker
You know, I've gotten half of it done. So I'll go easy on you. I'll give you a little bit of a grilling. We'll see how it goes. No, if it's a grilling, don't go easy. Okay. So I was thinking - during your talk, you mentioned that you're 23.
00:04:06
Speaker
Which struck me as quite a young age to be so jaded about the reliability of software that most of the stuff you're doing is trying to make it more reliable.
00:04:17
Speaker
Yeah. Have you been bitten really badly, or are there other constraints in your life? I don't know. I was thinking about that earlier today, actually - thinking about why do I care so much about software being reliable? And I don't know that I have that great of an answer, but if I have to take a stab at it, I would say that even if it hasn't bitten me horribly, every day I interact with software that's broken in some capacity. And to give an example of this, my grandmother
00:04:44
Speaker
cannot get into her medical records software. We were helping her get set up so she could access her records. We created an account. It was a little bit rough, but it worked. But it gave her this massive auto-generated username with a ton of characters in it. So my wife said, OK, we'll give you a nicer username that you can remember.
00:05:02
Speaker
We tried doing that - went through the same exact steps in the flow to change the username. Locked out. And it's been multiple months now and she can't access her medical records. So she instead drives over to the clinic and asks them to look things up for her.
00:05:15
Speaker
So that's just one example. And that was completely unsurprising. No one was surprised. It's just, oh yeah, this is how software works: it breaks all the time. So I don't know, it's just that the common experience of using any software is that it breaks and crashes and does weird buggy things all the time. And that's an unacceptable standard for me. Yeah, I definitely get the same thing with my father-in-law, right? He just expects software to be broken all the time.
00:05:44
Speaker
And he comes to me for tech support, and he's constantly saying, like, how do people that don't have a geek in their family cope? They just suffer. They just suffer. So I think that's an important thing to point out here: sometimes we think about
00:05:58
Speaker
the number of bugs in my backlog as just this sort of abstract, amorphous thing - you know, oh, we have some issues. But these issues really impact people's lives. Like, I work at a company that helps people pay for healthcare.
00:06:12
Speaker
So if we do something wrong, it could mean that someone can't get the healthcare they need. Or maybe if we charge them the wrong amount, you know, well, then they're losing money, then we're breaking the law, then we're giving them a horrible experience. So software being broken...
00:06:25
Speaker
often has really severe consequences. Yeah, yeah. I'm trying not to... this is a big cultural dividing line between Europe and America, paying for healthcare. Yes, I can see that's super important. The NHS has software that can break as well. Yeah, it definitely does break, but the billing is a separate issue. Okay, but, yeah.
00:06:43
Speaker
So, I mean, we all experience bugs being horrible. I can see why in the healthcare industry particularly, you would be...
00:06:56
Speaker
doubly worried about that, because it's an important issue whether you're paying directly or not. What have you discovered?

Beyond Testing: Advanced Techniques for Reliability

00:07:04
Speaker
Let's take it from this angle. So you are young enough to realize that bugs are always going to be with us and causing us pain.
00:07:13
Speaker
But what techniques have you found work for making software more reliable? Absolutely. Well, as a starting point, I think when everyone thinks about software reliability, they start with testing.
00:07:27
Speaker
They think there is a bug. That means we need to write more tests or better tests. For a while, we were told that was going to solve all bugs forever. And it's absolutely true that testing is incredibly important.
00:07:38
Speaker
Many times, bugs slip through because we have bad tests or not enough tests. But that's just one slice of the huge pie of software reliability techniques.
00:07:52
Speaker
And I think there's a lot to be gained from taking more of that pie and introducing other techniques that work together for software reliability. So one that I've introduced recently at work is the use of assertions.
00:08:06
Speaker
And so we run assertions in production, so that if an unexpected condition happens, they will cause that code path to fail very loudly and explicitly.
00:08:18
Speaker
And there are other things I can talk about, but maybe we can start with that. Now I'm going to dig into assertions, because I see the use of assertions, but they lie somewhere between a guardrail and just a segfault.
00:08:38
Speaker
Right. Stop the world. Something I didn't expect to happen. Let's just crash the whole thing. Right. Do you find that in practice they catch things in a good way?
00:08:49
Speaker
Absolutely. Convince me. And I'm glad that you're skeptical, because I think you should be skeptical: what I'm suggesting is that you take a bunch of little bombs and scatter them throughout your code base. That's how it feels. And that's how you should feel, because that is literally what we're dealing with. I do that in Rust when I type panic. It should make me panic as well as the code.
00:09:09
Speaker
To motivate it: part of the reason that I wanted to start using assertions is that I noticed scenarios where there was existing code that I was looking at - code new to me - and I was trying to understand it, figure out how to modify it in some way.
00:09:25
Speaker
And I look at it, and it's just a big blob of code. I don't know what's going on. And then after I dig into it more, I find out this chunk of code only exists to handle some possibility that should never happen. If that ever happens, it means there is a bug. That is the only explanation.
00:09:40
Speaker
So why do we have this extra code gumming up the code base that's only there to handle something that should never happen? And then there are other situations where - well, okay, so that's sort of the understandability piece: when you use assertions, it makes it easier to understand your code, which we should come back

Implementing Assertions: Strategies and Results

00:10:01
Speaker
to that. But on on the reliability piece, the main motivator there for me is that if something that should be impossible happens,
00:10:12
Speaker
All bets are off. I have no idea what's going to happen. Something really bad could happen. It could be that now I charge someone twice as much as they should have been charged. That could absolutely happen. Stuff like that happens all the time. I would much rather explode - stop execution in a loud way where I can say, that's a bug, I need to fix that - instead of letting that code continue on
00:10:34
Speaker
and cause some worse failure. And the trouble with a failure like that is you might not even notice. Lots of software is broken in ways that we don't even have any idea about. I am sure that there have been many times that I've been charged too much that I don't even know about. Because what way would I have to determine that I've been charged too much? I'd have to be combing through everything so closely to identify that. So part of this is pulling these bugs out sooner instead of letting them continue on further.
00:11:04
Speaker
That makes me think of two things. The first is the technical one: is there a distinction between an assertion and just throwing an unchecked exception?
00:11:15
Speaker
Absolutely, yeah. So that's one of the main motivations: when something impossible happens, I want to know this is a bug, not this is a normal error that I expect might happen.
00:11:27
Speaker
Okay. And that's part of the point, is that...
00:11:33
Speaker
Lots of code bases will have tons of errors showing up, because things that you expect to happen do happen - things that you can't handle, that are somehow problematic. And if you say, this is a bug, and I'm going to throw that same sort of error, just throw it into the pool, well, someone else might handle that error down the line.
00:11:51
Speaker
Or maybe it bubbles all the way to the top, and then it just goes into the sea of red in Datadog or whatever. And you have no idea that that's a bug. So I wanted a way to say specifically: if this happens, it is a bug.
00:12:03
Speaker
So that's the way we use assertions. If any assertion ever fails in the code base, that means there is a bug. That's the golden rule of writing assertions for me. And so ideally, it means there's a bug in the code somewhere else.
00:12:16
Speaker
But it could also mean there is a bug in your understanding of the program. That assertion, it's actually OK for that condition to happen sometimes. And then the bug is the fact that you put the assertion in there. Yeah.
00:12:26
Speaker
So would you like to see language support for this kind of thing? Like Rust, where you can say, this is definitely a bug because an assertion has failed, but you can also say, this is a bug because we haven't written that code yet - todo!.
00:12:42
Speaker
Right, right. Yeah, it's certainly useful to have language support, and I'm glad that languages have support for it. In our case, we use Java, and the way we've chosen to implement it - so, Java has built-in assertions.
00:12:53
Speaker
When one of those assertions fails, it throws an AssertionError. That's of type Error, not of type Exception, and so those aren't normally caught if you do catch Exception.
00:13:06
Speaker
But we chose to roll our own, for reasons I can get into. So we throw our own type of error - an invariant violation error is what we call it - so that it can't be caught anywhere else, and we know it specifically means there is a bug.
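A minimal sketch of the Error-versus-Exception distinction described here: in Java, AssertionError extends Error, not Exception, so a blanket catch of Exception never swallows it. The class and scenario below are invented for illustration.

```java
// Why an AssertionError slips past generic exception handlers:
// AssertionError extends Error, so `catch (Exception e)` misses it
// and only a catch of Throwable (or Error) stops it.
public class AssertionErrorDemo {
    // Returns which handler actually caught the failure.
    static String whoCatches() {
        try {
            try {
                throw new AssertionError("impossible state reached");
            } catch (Exception e) {
                return "caught-exception"; // unreachable: AssertionError is not an Exception
            }
        } catch (Throwable t) {
            return "escaped-to-throwable";
        }
    }

    public static void main(String[] args) {
        System.out.println(whoCatches()); // prints "escaped-to-throwable"
    }
}
```

This is the property that makes an Error-based invariant type hard to swallow accidentally in intermediate layers.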
00:13:20
Speaker
I'm going straight into: why write your own? It might have been fine to use the built-in assertions, but I was a little bit concerned, because oftentimes the methodology for using assertions is: I'll put these in my code, I'll have them turned on in development, but then I'll turn them off when I get to production, because I don't want these to blow up in my face, or I don't want the performance penalty.
00:13:46
Speaker
That's totally backwards, because then you're not getting any validation that these things aren't happening in production. So my stance is instead you should run them in every environment. That's rule number two.
00:13:57
Speaker
All assertions should run under all circumstances, because then you never have to worry about inconsistent performance between different environments, or inconsistent behavior because your assertions are actually somehow contributing to the behavior of your code in a way that you don't understand.
00:14:14
Speaker
And so, given that there's that attitude of not running assertions in production, my concern was that some of the libraries we were using might have been using assertions in a way that we would not want to be running in production - say, one that would impact performance too much. And I don't know, that may not have been the case. It could have been fine, but it was incredibly easy to roll our own. I mean, we just created a class with a method that takes a boolean and a string describing what happened, and it throws the error if something went wrong. Did you suggest this at work? Did you get buy-in? Are you somehow policing the use of assertions?
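The roll-your-own helper described above - a method that takes a boolean and a message, throwing a custom Error - might look something like this. The names Invariant and InvariantViolationError are illustrative, not the actual code from the episode.

```java
// A minimal roll-your-own assertion helper: a boolean check plus a
// description, throwing an Error subclass that ordinary
// `catch (Exception e)` handlers cannot swallow.
public class Invariant {
    // Extends Error, not Exception, on purpose.
    public static class InvariantViolationError extends Error {
        public InvariantViolationError(String message) {
            super("Invariant violated: " + message);
        }
    }

    // Unlike Java's built-in `assert`, this runs in every environment:
    // there is no -ea flag to forget in production.
    public static void require(boolean condition, String description) {
        if (!condition) {
            throw new InvariantViolationError(description);
        }
    }

    public static void main(String[] args) {
        Invariant.require(1 + 1 == 2, "arithmetic still works"); // passes silently
        try {
            Invariant.require(false, "charge amount must be non-negative");
        } catch (InvariantViolationError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The dedicated error type is what lets the top of the stack (and monitoring) distinguish "this is a bug" from ordinary expected errors.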
00:14:50
Speaker
I don't know if I'd say I'm policing it, but I did suggest it. We have an RFC process where anyone can propose some sort of change: we make an argument for it, and then people can comment on that, and if there's agreement, then you go forward and implement it. So I proposed one of those and implemented it. We have two big main products, and it's been implemented in both of those products. And I'm certainly the person who's used it the most so far. We've had it going for about six months or so, but I think
00:15:25
Speaker
an engineer from about every one of the eight or so teams has used them. So we're getting them adopted. I have to ask: have you ever had production code crash because of an assertion?
00:15:38
Speaker
I assume you have. I wonder how that was taken, because now the application has crashed and it's kind of got your name on it. Yeah, that's a great point. I've certainly thought, like, man, I hope this doesn't cause some horrible incident or something. But no, it has not happened very much.
00:15:52
Speaker
And when it has happened, we've been glad of it. So early on, an assertion that I wrote... failed. And I was surprised because I was encoding this expectation I had about how the system worked.
00:16:04
Speaker
And I looked into it, and it uncovered this bug with payments that was impacting like 500 users and had been in the system for three or four years. And no one had even noticed, because it was a small enough group.
00:16:17
Speaker
And fortunately, it didn't manifest in any way that caused real harm, but it certainly could have. It was not a great bug to have in there. And that happened kind of early on, and I think it set things up in a nice direction, where I could say, look, this caught a real bug that was quite bad, so you should be happy that we have these in place. This always makes me think: sooner or later, bugs in software will bite you, and the worst possible time for them to bite you is when you've heard about it from a user that's complaining. Any time you can pull that back to earlier in the cycle... But that makes me think:
00:16:56
Speaker
You, I know, are a fan of Roc and statically typed functional languages. So let me just say something deliberately incendiary to see what you say to it. You could argue that runtime assertions are a poor man's type system, and we should be catching these things at compile time.

Assertions vs. Type Systems in Error Management

00:17:15
Speaker
What do you think of that? How do you feel about that? Well... you, the statically typed functional programmer, of course, would say that. Yeah, I anticipated this beforehand.
00:17:25
Speaker
Because there's a lot of truth to it: if you can catch something statically at compile time, you absolutely should catch it there. And in the talk I just gave, that's one of the things I talk about a lot: hey, look, we can build languages that catch more for us statically.
00:17:38
Speaker
I think of it as this sort of funnel, where at the top, the best thing you can possibly do is to rule out errors at compile time.
00:17:49
Speaker
They'll never show up. It's completely obvious. You don't have to worry about them ever. But that only gets you so far. So the next layer is to use assertions at runtime, because there are so many properties that can't be encoded in the type system.
00:18:01
Speaker
So I gave an example in the talk about, say, when you use tag unions in a functional language, you can encode perhaps more than people think - like explicitly saying which combinations of fields being present are allowed, for example.
00:18:16
Speaker
Then you never have to worry about a null pointer exception or something like that, because you're encoding the logic about when these fields can be present or not with the type system. But there are so many properties that you can't encode with the type system - most of the properties. This is where I think the functional programmers go wrong.
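The tag union idea can be approximated even in Java (17+ for sealed types, 21 for pattern switches) with a sealed interface: only the legal combinations of fields can ever be constructed, and an exhaustive switch replaces null checks. The payment domain here is invented for illustration.

```java
// Approximating a tag union with a sealed interface: a payment is either
// pending (no extra fields), settled (must carry a confirmation id), or
// failed (must carry a reason). Illegal field combinations are unrepresentable.
public class PaymentStates {
    sealed interface Payment permits Pending, Settled, Failed {}
    record Pending() implements Payment {}
    record Settled(String confirmationId) implements Payment {}
    record Failed(String reason) implements Payment {}

    // Exhaustive switch: the compiler forces a case per variant, and a
    // confirmationId can only be asked of a Settled payment - no null checks.
    static String describe(Payment p) {
        return switch (p) {
            case Pending ignored -> "pending";
            case Settled s -> "settled:" + s.confirmationId();
            case Failed f -> "failed:" + f.reason();
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Settled("abc123"))); // prints "settled:abc123"
    }
}
```

Add a fourth variant and every non-exhaustive switch stops compiling - the "which fields can be present when" rule is checked statically.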
00:18:33
Speaker
The thought is: look, we can build a better type system that catches more stuff, so we just need to build an even better type system, until the type system catches all of the errors. I don't think that's at all realistic. I'm sure there's more room to innovate - we can have dependently typed languages that can check the lengths of lists, things like that.
00:18:53
Speaker
But that doesn't mean the type system is going to be able to check all of these business rules that apply to your specific data model, or that a date that must always be in the future is in fact in the future and not in the past, or that you're doing your math properly, or things like that. Those start to get outside of the purview of what the type system can prove for you.
00:19:16
Speaker
Okay. So you're on that spectrum of catch it as early as you can, but no earlier. I'm not sure how you would catch... Well, yeah, I'm certainly of the mind that you should catch them as early as possible using your type system.
00:19:32
Speaker
But the trouble is that then we run out of room where the type system can't help us anymore, and then we don't catch the errors at all, and we just let them run rampant in the code base. Yeah. That's definitely a mistake. And so with a type system, you get instantaneous feedback. You find out about the error before it makes it out into production.
00:19:52
Speaker
Then I say, once you run out of room there, start using assertions. Then you can say, I want to find out about this bug the moment it happens instead of a month later.
00:20:05
Speaker
And we can talk about even a third layer of this pyramid that I have, which is: you can't enforce all of your properties with runtime assertions, perhaps for performance reasons, or because they need to check too much data.
00:20:18
Speaker
So then you should do asynchronous validation that happens maybe on a nightly cadence, or things like that. But before we get into this, there's more I need to say about this direction.
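The third layer described here - invariants too expensive to assert inline, checked asynchronously by a scheduled job - can be sketched as follows. In practice this would be a nightly SQL query against the real database; here an in-memory list stands in for the table, and all names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Asynchronous validation: a nightly "query" scans for rows that violate
// an invariant too costly to check on every request. Any hits would be
// turned into an alert for whoever is on call.
public class NightlyCheck {
    record Charge(String userId, long amountCents) {}

    // Invariant: no charge may be negative. Returns the violating rows.
    static List<Charge> findNegativeCharges(List<Charge> charges) {
        List<Charge> bad = new ArrayList<>();
        for (Charge c : charges) {
            if (c.amountCents() < 0) bad.add(c);
        }
        return bad;
    }

    public static void main(String[] args) {
        List<Charge> table = List.of(
            new Charge("alice", 1200),
            new Charge("bob", -50));  // a bug somewhere upstream
        System.out.println(findNegativeCharges(table).size()); // prints 1
    }
}
```

The feedback is slower than an inline assertion - up to a day - but it still turns a silent data corruption into a loud, investigable signal.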

Contextual Application of Assertions

00:20:28
Speaker
So, first:
00:20:31
Speaker
The way your assertions cause your code to fail should be very tailored to the domain you're working in. To explain: we're writing web servers.
00:20:42
Speaker
That's where we're using assertions. I definitely don't want the whole web server to go down if one assertion fails. So if an assertion fails, we have it implemented such that it'll be handled at the top layer.
00:20:54
Speaker
So that whole request will fail with a 500, but nothing else in the server will be impacted. So this doesn't mean that if something goes wrong, production is completely down and everything is on fire. It means that one branch encountered something impossible - a bug. Now that request is going to fail.
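The "one request fails, the server survives" behavior can be sketched as a top-level handler that catches the custom error and maps it to a 500. The Handler interface and class names are invented; real framework code would hook the equivalent exception mapper.

```java
// An assertion failure takes down one request, not the server: the top
// layer catches the invariant error, returns a 500 for that request,
// and the next request proceeds normally.
public class RequestBoundary {
    static class InvariantViolationError extends Error {
        InvariantViolationError(String msg) { super(msg); }
    }

    interface Handler { String run(); }

    // Returns the HTTP status for one request; never lets the error
    // escape and kill the whole server loop.
    static int serve(Handler h) {
        try {
            h.run();
            return 200;
        } catch (InvariantViolationError e) {
            // real code would log the violation and page on-call here
            return 500;
        }
    }

    public static void main(String[] args) {
        int a = serve(() -> "ok");
        int b = serve(() -> { throw new InvariantViolationError("bug"); });
        int c = serve(() -> "still ok"); // the server survived the failure
        System.out.println(a + " " + b + " " + c); // prints "200 500 200"
    }
}
```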
00:21:10
Speaker
So, you know, hopefully your whole camera doesn't fail, or something like that. Maybe if there are different systems, the subsystem in the camera that ran into that error will fail and restart. This is why we have a camera that can fail: we have backup audio recorders, right? Yeah. And then the second thing is that assertions are really just a name to use to talk about the more general idea, which is thinking about the invariants that must hold in your system and enforcing them. Or you might say the properties. When I say invariant, I mean some rule or some statement you can make about your system that must always be true under every circumstance.
00:21:47
Speaker
And that's really why I want to use assertions. I want to push people to think about what invariants apply in your system, and then make sure that they're being enforced.
00:22:00
Speaker
Do you have guidelines, like even rules of thumb for what makes for a good assertion, and what makes for a good invariant to look for? Yeah, well, my main rule, my golden rule is that if an assertion fails, it means there's a bug.
00:22:15
Speaker
And so, to unpack some of the consequences, that rules out using assertions to validate data coming from an external data source. You expect that people are going to put bad data into your API, so you need to have validation there that goes down the expected path of error handling.
00:22:32
Speaker
But if you're in some internal part of your code base that has entry points from various APIs, and you end up with some nonsensical data, that indicates that there's a bug, because you're missing validation at one of those outer layers. So that is a reasonable place to use an assertion. Okay, yeah, I can see that.
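The boundary rule above - validate external input with a normal error path, assert only on internal invariants - can be sketched like this. The age example and all names are invented for illustration.

```java
// The boundary rule: external data gets validation (bad input is
// expected), internal code gets assertions (bad data there means a bug:
// some outer layer failed to validate).
public class BoundaryRule {
    static class InvariantViolationError extends Error {
        InvariantViolationError(String msg) { super(msg); }
    }

    // API edge: bad input is normal, so report it; never assert here.
    // A web handler would turn `false` into a 400 response.
    static boolean validateAge(String raw) {
        try {
            int age = Integer.parseInt(raw);
            return age >= 0 && age <= 150;
        } catch (NumberFormatException e) {
            return false; // expected error path
        }
    }

    // Internal core: by the time we get here, the age must be valid.
    // A failure is a bug in the caller, not a user error.
    static String internalProcess(int age) {
        if (!(age >= 0 && age <= 150)) {
            throw new InvariantViolationError("unvalidated age reached core: " + age);
        }
        return "processed age " + age;
    }

    public static void main(String[] args) {
        System.out.println(validateAge("abc"));  // prints "false" - normal rejection
        System.out.println(internalProcess(30)); // prints "processed age 30"
    }
}
```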
00:22:49
Speaker
You are touching on two things, and I think the next one I'm going to go to is... you talked about putting assertions in a web server, and that particular request fails. So the first thing I have to ask you about is monitoring.

Monitoring and Feedback Loops for Assertions

00:23:07
Speaker
It's not enough to just put an assertion in. What are you doing about making sure someone notices? We didn't talk about this, but I'm so glad you asked that because that is absolutely the next question. So,
00:23:16
Speaker
When an assertion fails - because we have a specific error type for that, we handle it specifically - in our case, we emit a log message saying, hey, an invariant was violated, an assertion failed. Then we have a monitor that will trigger a low-priority alert in PagerDuty and quietly page whoever's on call to investigate that. So the idea here is that once this breaks, we will tell you loudly and clearly that it broke, so you have to go investigate it now. So is there kind of a mandate that, whilst you think assertions are good, they're not enough on their own? You need to have proper monitoring in place to...
00:23:56
Speaker
Well, yeah. If you don't know when an assertion fails, then it's not doing you any good, because then it's failed, the bugs are still happening, and you're not looking at your logs, so you don't know that they're happening. So you need to have that feedback loop.
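The log-plus-monitor loop described here can be sketched minimally: the handler emits one unmistakable marker per failure, and the monitoring rule (Datadog, CloudWatch, or similar) matches that marker to raise the low-priority page. The marker string and method names are invented for illustration.

```java
// Closing the feedback loop on assertion failures: one grep-able marker
// per violation, and a monitor rule that pages whenever it appears.
public class InvariantMonitoring {
    static final String MARKER = "INVARIANT_VIOLATED";

    // What the top-level handler logs when the custom error is caught.
    static String logLine(String description) {
        return MARKER + " description=\"" + description + "\"";
    }

    // Stand-in for the monitor rule: page if any line carries the marker.
    static boolean shouldPage(String line) {
        return line.contains(MARKER);
    }

    public static void main(String[] args) {
        String line = logLine("duplicate charge for one invoice");
        System.out.println(shouldPage(line));                   // prints "true"
        System.out.println(shouldPage("INFO request handled")); // prints "false"
    }
}
```

The point is that a violation never lands in the undifferentiated "sea of red": it has a signature that monitoring is explicitly configured to act on.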
00:24:06
Speaker
They work into a really nice feedback loop that also pulls in the other things you should be doing to build reliable software. So if I put an assertion in my code, I'm putting a little bomb in my code.
00:24:21
Speaker
I want to make sure that that bomb doesn't go off, because if it does go off, I know that I'm going to get paged, and then I'm going to have to worry about fixing it the next day. I don't want to do that. So I'm going to write more tests to make sure that that assertion is never hit.
00:24:37
Speaker
They force you to commit, because you're saying: I'm confident enough that this should never happen that I'm saying you can explode if it does happen. And so if I'm going to make that commitment, I'm going to do the other things that I should to make sure that I am following through on that. Yeah. This may be an unpopular thought, but I think that's one of the strong reasons for having programmers be the people on call.
00:25:01
Speaker
Absolutely. Because you need that feedback loop. Because if someone else is worrying about it, then who cares, you know? The programmers need to be the ones responsible for your system operating correctly. Yeah. And surprisingly, I think that's a little bit of a controversial statement. Some people have the idea that there should be some sort of product support team - whatever you want to call it - that will investigate, oh, there's some data issue, something went wrong, and then they're going to fix it, while I'm busy shipping features. But that's ridiculous.
00:25:34
Speaker
You should be responsible for the thing you're building and making sure that it works properly. And once you do that, then all of the incentives start working in a virtuous cycle where... you don't want to fix it, so you're going to make sure that it actually works the first time. Yeah.
00:25:48
Speaker
You also get this thing - I've seen this in the past - where in a company where the programmers get to throw their code over a wall and never worry about whether it worked or not, they overestimate how good they are at writing reliable software. Absolutely. Because we all need a feedback loop. Absolutely. And another reason I like using assertions is that we don't realize how bad we are at writing software.
00:26:17
Speaker
And it's nice when you have something checking your work, pointing out cases when you do something wrong. And there may be a lot more cases where you do something wrong than you think. Maybe they happen to not be relevant then, and so they sort of go unnoticed. But this draws them out immediately, so you know right when they happen. Yeah.
00:26:34
Speaker
Okay, so the other thread I was tempted to go down - and I'm going to get away with this because you're a language investigator, right?

Philosophies in Error Handling: Erlang vs. Assertions

00:26:43
Speaker
You've got a web server, you've put an assertion in your web request, so it just crashes.
00:26:49
Speaker
You're going to let it crash. I pick that phrasing deliberately to invoke Erlang. Is it actually the right thing? What are you doing about... are you tempted to move everything to an Erlang-like language that has a supervision tree, where crashes are expected?
00:27:03
Speaker
No, not at all. I mean, Erlang is one that I have not explored much, although I'd love to; it sounds very interesting. But my understanding of that idea of letting it crash and then recovering is that you're expecting it to crash as a normal part of operation. Yes.
00:27:21
Speaker
If an assertion fails, that's an exceptional event. That's not my normal mode of operation. If an assertion fails, I don't really think that restarting it and running the same thing over again is going to have any different effect, because the reason it failed is that there's a bug in the code I wrote.
00:27:41
Speaker
So why should I retry it? The code I wrote doesn't work. Okay, I see that. Yeah. So I think you might be seduced one day over to Gleam, for instance, which is very much in the same family as Roc. But you're using Roc as a kind of

Exploring the Roc Language

00:28:01
Speaker
hobby language, right? You're using Java at work.
00:28:05
Speaker
What is it that attracts you to Roc, and what do you wish you could port across, assuming you can't persuade them to start using Roc? Well, everything. I'd port everything to Roc. But the fact is, I think rewriting a large software project is usually a bad idea, so I'm not going to say that we should rewrite all of our code
00:28:33
Speaker
just now. And on top of that, Roc is young, and the compiler is currently being rewritten, for example. So once it's matured more, I might have more thoughts about integrating it in some places and starting to get some of those benefits. But it has to be really good before you should start using it in production. I mean, it's excellent, but the i's have to be dotted and the t's have to be crossed.
00:29:00
Speaker
Okay, yeah, that's fair. So one thing I noticed from your talk about Roc, which I didn't know before today, was that there is a U64 type, which I assume is an unsigned 64-bit integer.
00:29:14
Speaker
And I also know that Roc is currently being rewritten from Rust to Zig. So you've got... One of the things that makes Rust and Zig distinct is that Rust will do memory safety checking for you, whereas Zig won't; but Zig will do overflow checking on that unsigned 64-bit integer, whereas Rust doesn't.
00:29:37
Speaker
And even in these languages that are so concerned with getting things right at compile time, you still have to make trade-offs over what's checked for you for free. Right. And where does that come in in a language like Roc, which is big on checking things at compile time? Does it have overflow checking?
00:29:57
Speaker
Yeah, no, that's a great question. You have a very sharp eye. Because that's sort of the one hole in Roc where I can say all of these errors are caught at compile time, they're all made explicit, there are no errors you ever have to worry about secretly showing up.
00:30:11
Speaker
There's one caveat, which is the integer types: they can overflow and things like that. It surprised me that it had fixed-size integer types. So there are checked and unchecked methods for those integers. You can use a checked method if you want, which returns a Result in that case.
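The checked-versus-raw distinction can be sketched like this in Python (Roc's actual method names may differ; this just shows the shape of the trade-off for a 64-bit unsigned integer):

```python
from typing import Optional

U64_MAX = 2**64 - 1  # largest value a u64 can hold

def checked_add_u64(a: int, b: int) -> Optional[int]:
    """Checked addition: return None on overflow instead of silently wrapping."""
    total = a + b
    return None if total > U64_MAX else total

def wrapping_add_u64(a: int, b: int) -> int:
    """Raw-style addition: wrap modulo 2^64, like unchecked machine arithmetic."""
    return (a + b) & U64_MAX
```

The checked version forces the caller to handle the overflow case explicitly; the wrapping version is faster but can corrupt results silently.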
00:30:28
Speaker
And yeah, I don't know if there's that great of an answer to this, because it's so vastly better for performance to use the raw integer types rather than having an arbitrary-length integer doing this checking. And Roc cares a lot about performance. It's a memory-managed language, and the goal is to be at least as fast as all of the mainstream memory-managed languages: Go, Java, Swift, C#, all of those. Okay. And you can get much better performance if you use the raw integer types.
00:30:57
Speaker
Now, you could certainly implement a type that is arbitrary-length or has the checking, and you can certainly use that. But that's just the choice that's been made, and I don't have a great answer for you; it's unfortunate. Okay, fair enough. We're always going to be making trade-offs, because there is no one language that gives you all the different safety features you want and leaves the safety features you don't care about on the shelf. I wonder if we might find, ten years from now, that you're starting to build one that weaves these things together. Roc is certainly my ideal language at this point, and I haven't come up with one that is much different from Roc. But hey, there's lots of room for innovation, so it might happen. Or you might end up building a Roc platform that has a supervision tree. Who knows? Yeah, a lot of people are excited about the idea of building a Roc platform
00:31:52
Speaker
on the BEAM to do actors written in Roc. I don't think anyone's done it yet, but you can absolutely do it, and there are a lot of people excited about that. Can you tell me about that? Because I'm interested in the whole Roc platform concept and how wide it goes. So, for the sake of the audience, explain what a Roc platform is and how practical it is to create one. Right. So in Roc,
00:32:20
Speaker
every application you write exists for a specific domain. This is how we write applications in any language: I'm writing a web server, I'm writing a CLI, I'm writing an editor plugin. You're only ever writing one thing, and yet you use the same language with the same set of tools. All of the primitives in the language may not make sense for those different contexts. For example, maybe the JVM is really nice when you're building a web server, but you don't really want to have to deal with a JVM when you're writing a plugin.
00:32:50
Speaker
Or there are circumstances, say some embedded thing, where you can't have a JVM at all. Yeah. So Roc says: let's focus the language, or rather let's focus our applications, on the specific domain we're building for. The way this works is you build a platform in a language that supports the C ABI, and that platform knows everything about that specific domain. This platform knows everything about building web servers; this platform knows everything about building CLIs; this platform knows everything about building editor integrations for one specific editor. The popular choices for writing a platform are Rust and Zig, but people also like writing them in Go; you can write them in many languages. And that platform deals with the domain-specific details.
00:33:34
Speaker
It provides the concurrency model, if there is one, and it provides all of the I/O operations, the effects that are relevant in that context.
00:33:46
Speaker
So hypothetically, I'm not sure I've got time for this, but hypothetically: I write a web server in Roc and I expect to get that platform for free, but then I might go away and say, I would like a platform that lets me write Vim plugins.
00:34:04
Speaker
And it doesn't have any web primitives available at all, but it has editor primitives available. Right, yeah. yeah If I did that, what would I actually do? Would I go and write some Rust code that integrates with Vim?
00:34:15
Speaker
Well, I am not well enough versed in what writing plugins for Vim is like. I think in Vim itself, it's Vimscript. Yeah, that's probably not great. What I imagine you would do is you would write...
00:34:27
Speaker
sort of a shell, not a shell in the terminal sense. A thin wrapper. Yeah, yeah. In Vimscript or whatever. And that Vimscript would take in the input from the editor that your plugin cares about, and then it would call the Roc code.
00:34:44
Speaker
And you'd do the conversions between the Vim data types and the Roc data types and all that. So that would be the platform. Then you import that platform in your Roc code, and now you have all of these Vim primitives that you can use to build your Vim application.
00:34:58
Speaker
And that is how it gets embedded. Does that make sense? Yeah, yeah. And as a user, I'm just writing normal Roc code that happens to have a different set of APIs. Again, playing devil's advocate: why is that better than, say, Java? Java has web libraries, but I don't use them.
00:35:19
Speaker
Right, right. Why have the constraint? That is an excellent question, because yeah, you're right: why does it matter? And that's not really the main point. The main point is that you get to have a context for writing your application that's very tailored to that domain. And I'm so glad you asked this, because I wanted to bring up Elm, because I know you like Elm. I like Elm.
00:35:41
Speaker
And Elm is magic. It's magical. It's amazing. It's such a joy to use. And part of the reason it's such a joy is that it's laser-focused on building single-page apps in the browser.
00:35:52
Speaker
Yeah. And because of that, the language can include all of the things you need for building single-page apps. It doesn't even have a main function. It has the Elm architecture functions that you use for this event-driven process of building single-page apps. And it means you can get a beginner into an Elm app, and it can be a really gentle introduction to functional programming, because there are just these three functions: init, update,
00:36:22
Speaker
and view. I haven't been in Elm in a while, but I think it is those three. Yeah. You sort of fill out those functions, and the language does everything else for you.
00:36:34
Speaker
That's amazing, but it's limited primarily to writing for the browser. So part of the inspiration for platforms is: what if we could have that sort of experience in every domain? So when you write a web server in Roc, you don't have a main function where you then call webserver.start or webserver.listen or whatever. You have a function called respond.
00:36:55
Speaker
And that function takes an HTTP request as its first argument, and that's your entry point for the program. So you can tailor your application to the specific domain you're working in. It seems to me that that architecture is going to be great for people that are getting started; they're going to have that great initial experience.
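A rough Python sketch of that inversion of control (all names here are invented; real Roc platforms do this natively): the "platform" owns the main loop, and the application supplies only a respond function:

```python
from dataclasses import dataclass

@dataclass
class Request:
    method: str
    path: str

@dataclass
class Response:
    status: int
    body: str

# --- The "platform": owns startup, the event loop, and I/O details. ---
def run_platform(respond, requests):
    """Drive the application by calling its single entry point per request."""
    return [respond(req) for req in requests]

# --- The "application": no main function, just a respond handler. ---
def respond(req: Request) -> Response:
    if req.path == "/health":
        return Response(200, "ok")
    return Response(404, "not found")
```

The application never starts a server or writes a main function; the platform decides when and how respond is called.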
00:37:16
Speaker
My instinct would say eventually it's going to grow to the point where either you need a way to say, well, this application is mostly just a web server, but it also does some other stuff. How do we compose different platforms together?
00:37:29
Speaker
Or you're going to need people who are really good at building the platform underneath the language, such that building a custom platform becomes the norm. Right. I think in larger companies, it'll certainly become standard.
00:37:43
Speaker
My prediction is that it will become standard for them to have their own platforms that they operate, tailored to their specific needs. But I also think there's a good chance that most applications will be quite content being one thing.
00:38:00
Speaker
So I'm curious. You say you have a web server, and it's mostly a web server, and it's a little bit something else. What could that something else be? What do you have in mind? I can pick a very simple example, which is: you might want a platform that's a web server plus Postgres,
00:38:16
Speaker
and someone else wants a platform that's a web server plus Mongo, right? And you don't want a Hibernate-style API abstracting them away, because those are actually two quite different databases.
00:38:29
Speaker
That's a great point, and I'm glad you bring that up, because I can answer it quite well. The approach there is that you would instead implement a client for Postgres in pure Roc code.
00:38:40
Speaker
And then your platform will supply TCP or whatever other primitives that client needs to interact with Postgres. But the client library itself can just be a pure Roc library. It can be shared across any platform.
00:38:54
Speaker
Okay. And the platform only has to provide that little primitive of TCP, or whatever you're going to be using for your database. There already is a Postgres client implemented in that manner, and it can be shared agnostically across platforms.
00:39:08
Speaker
So it's unlikely that you would want to build... I mean, you certainly could build a platform that had baked-in support for Postgres; one of the web server platforms has baked-in support for SQLite. But you don't need to do that. You can choose to implement your Postgres client as a platform-agnostic library.
00:39:25
Speaker
Okay, yeah, I can see that makes sense. But it also makes me wonder, and this relates to the composition thing, am I then going to say: you know what, we are a reasonably large corporation, and we just use web servers with Postgres?
00:39:40
Speaker
I don't want arbitrary network access. I want to build a platform that looks how it would have looked if the Postgres client were just part of the platform. Is there going to be an easy way
00:39:53
Speaker
to treat Roc platforms like a menu where you choose the things that go together? No, that's my understanding as of right now. Something could change, perhaps, but there are a lot of issues that would come with that, because platforms are written in a particular language, and that could be any language. So you'd have to do a lot more to make those interoperable.
00:40:20
Speaker
You already can do some of that, in that you could have a Rust platform that uses some shared Rust crates also used by other platforms. Or maybe that Rust platform calls some other language.
00:40:35
Speaker
You know, so you can do things like that, but it's not a polished, planned experience at this point. I wonder if it will grow into one, because I totally agree with you that one of the joys of Elm is that it was totally focused on its function.
00:40:49
Speaker
I can see that spreading into other domains, but then I can see the management of those domains becoming a whole class of problems in itself. Well, and I think this is a...
00:41:00
Speaker
I mean, WebAssembly has some similar stuff with sandboxing and things like that. But this is kind of a new direction, I think. At least, I could be wrong, so I'm sure someone in the comments will tell us how this was done in the 60s. Yeah, we used to have a COBOL platform that did this back in '63. But I don't think we really know how it's going to play out, so we'll just have to see what happens.
00:41:22
Speaker
Okay, on that speculative note, I'm going to ask you one more speculative question and see if I can get a completely candid answer from you. So Roc has recently announced it's rewriting the compiler from Rust into Zig.
00:41:34
Speaker
What do you think of that? I think it's an excellent decision. And why? Earlier in this conversation, I said I think it's a bad idea to completely rewrite large code bases, so I'm aware of that.
00:41:46
Speaker
But Richard Feldman, the creator of Roc, wrote up a memo, a little internal memo shared in a Roc Zulip channel, which is public, about the reasons why the compiler was going to be rewritten in Zig. And he just wrote it up quickly.
00:42:01
Speaker
And then someone found it and shared it on Hacker News, and it was number three on Hacker News for 24 hours or something like that. So people were very intrigued. But over the last nine months or so, a ton of changes have come to the language that are extremely exciting and, I believe, will greatly increase the likelihood of the language succeeding, because they maintain all these wonderful functional guarantees we want while making the language much friendlier and easier to use.

Rewriting the Roc Compiler: Zig vs. Rust

00:42:35
Speaker
Those changes changed a lot of core assumptions about the language, such that most of the existing compiler was going to have to be rewritten anyway to support those new features. One of those is that you're changing the syntax, right? Putting braces in. Right, right. Yes, that's part of it, but a lot of other parts of the internals were changing too: major features being ripped out, major new features being added. So the thought was,
00:43:02
Speaker
well, it would be nicer to have a compiler written in Zig; that's the preference of the team, rather than Rust. And if we're already going to have to rewrite everything anyway, this is the best time ever to switch to Zig. So you ask: okay, why would it be nicer to have the compiler written in Zig?
00:43:21
Speaker
One of the biggest reasons is compilation speed. Rust has slow compilation speeds, especially when you get into a large project. It's not slow on a small project, but a big project slows down a lot.
00:43:32
Speaker
And that really makes developing a code base difficult. I'm not a core contributor by any means, but I've made contributions to both the old and new compilers. In the old Rust compiler, I would make a change, write a test, run the test suite, and I'd sit there for six minutes or something. Really, that much? Just waiting for everything to build. And I'm sure there are things I could have done to make it better, but this was a constant pain: the developer experience
00:44:02
Speaker
was really rough because of the compile times. Zig is extremely focused on fast compile times; they're doing a ton of work on that, and so it's so much more joyful to use. When it was announced that the compiler was going to be rewritten in Zig, that immediately made me more excited about contributing to it in the future, because I knew I could get faster feedback on my changes. This is something I do in my fun, unpaid developer time; I don't want to spend it all sitting around waiting for the compiler.
00:44:33
Speaker
Then, the compiler is very focused on performance, and Zig is a nicer fit for the philosophy used to achieve that performance.
00:44:45
Speaker
So one big part of this is that Rust tries to prevent you from doing memory-unsafe things altogether.
00:45:00
Speaker
But inevitably, there are some circumstances where you need to use an unsafe block: I have to do this unsafe thing because, for whatever reason, that's just what I have to do. And once you do that, you're in no man's land and you have very little support.
00:45:12
Speaker
And that experience is worse than it would be in a language that expects you to do unsafe things more routinely and tries to support you in that paradigm. It turns out that there is a fair amount of this unsafe sort of code in the Roc compiler.
00:45:29
Speaker
So much so that this is part of the reason, I believe, why the standard library was written in Zig even in the original compiler: Zig was so much better a fit for writing the kind of code being used there. So rewriting the whole compiler in Zig means that philosophy can be used everywhere. And then also, the structure-of-arrays optimization is used a lot, and Zig has much nicer support for that than Rust. Again, this is my understanding. There are some other reasons I'm forgetting, but those are some of the main ones.
00:46:10
Speaker
Okay, in that case, I'm going to ask you a final question. I'm working on a project completely unrelated to this, but it's in Rust, and I was tempted to use Zig. I'm very drawn towards it.
00:46:23
Speaker
These are my scars from the past, but I am probably more worried about memory management than compile times per se. I...
00:46:37
Speaker
I hope I never have to deal with a segfault again in my life. Does that side of Zig worry you? Is it proving a problem for the rewrite? Am I worrying too much about it? It doesn't seem that worrying to me. And this brings up another point about Zig, which is that...
00:46:57
Speaker
in the context of a compiler, you're taking in code and spitting out an executable. You don't really need to be freeing a lot of memory as you go. Rust naturally wants you to allocate lots of little objects that are then freed automatically as they go out of scope.
00:47:15
Speaker
That's not necessarily a good fit for a compiler that's going to take in code and spit out an executable. You could certainly do all the deallocations, but Zig is much more aligned with the idea of: here's a big chunk of memory, I'm going to put a bunch of objects in it and then throw the whole thing away at the end. Yeah, okay. That's a very good fit for compilers. Right, right. So that works nicely with the compiler. And also, because the intent is that it's an executable that's invoked quickly and then it's done, the importance of memory safety is less, I would say.
00:47:49
Speaker
Yeah. Yeah. For a one-shot program like that, the best memory strategy is just to allocate and never release anything until you're done. Right. Yeah. Okay. Final question, then: what does your future hold, computing-wise? What's the most exciting software reliability technique that you want to look at next?
00:48:13
Speaker
That's very nebulous, but I'm going to ask anyway. Well, I'm going to shift gears and loop back to earlier in the conversation, to the one technique I haven't talked about yet, which is invariants.

Innovative Data Validation Techniques

00:48:28
Speaker
So the idea is: we have properties about our system, and I want us to think about those properties. We should start by enforcing them using the type system where we can. Once that runs out of gas, start using runtime assertions.
00:48:39
Speaker
Once runtime assertions run out of gas, start running queries against your data, checking that your data makes sense. And this is something we just started using at work last week, before the conference, and it's already found multiple bugs.
00:48:56
Speaker
What form does that take? Are you running invariant queries over your log files? Or are we talking about something like: we should have a schema on our database?
00:49:07
Speaker
So our particular implementation, and this could very well change, is: we have a read replica, and we created a directory in the root of the project called data invariants, broken out by team.
00:49:21
Speaker
In each team's directory, you put SQL files. Those SQL files must all return a count. And every night, we run each of those queries against the read replica.
00:49:34
Speaker
And if any of them returns a result other than zero, we send a Slack message telling you this invariant was broken: we expected this to return zero rows, but it returned 74 rows.
00:49:46
Speaker
That's the implementation we have now, and we could very well change it. Right, yeah. We have a small enough amount of data, or rather, there are many queries that are fine to run against a transactional database, MySQL in this case. If that became a problem, we could change the architecture and start running these against BigQuery or something like that. But this was a very quick, easy way to implement it and test it out.
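A toy version of that nightly job might look like this in Python. This is a sketch only: the real setup described above runs against a MySQL read replica and alerts via Slack, whereas here it is SQLite and a returned list of failure messages:

```python
import sqlite3
from pathlib import Path

def run_invariant_checks(conn, sql_dir: Path) -> list:
    """Run every .sql file under sql_dir; each must return a single count.

    A nonzero count means the invariant is broken. Returns one failure
    message per broken invariant."""
    failures = []
    for sql_file in sorted(sql_dir.rglob("*.sql")):
        (count,) = conn.execute(sql_file.read_text()).fetchone()
        if count != 0:
            failures.append(f"{sql_file.name}: expected 0 rows, found {count}")
    return failures
```

Each file holds one invariant as a query that counts violating rows; a nightly cron job would run this and forward any failures to Slack instead of returning them.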
00:50:06
Speaker
This is a way to do the kind of checks that a database schema generally can't: a schema looks at types, not values. And so, you know, you should absolutely use database constraints. I love database constraints; they're amazing. I think lots of people like database constraints because they give you a lot of peace of mind. They do. And database constraints are runtime assertions in your database.
00:50:29
Speaker
Yeah. If you do something wrong, it blows up and it's loud. So we're sort of just doing database constraints in your code with runtime assertions. But again, yeah, there are properties that you can't enforce with those constraints, because even if you write a check constraint, usually it can only apply to the current row or table. You can't check properties that span many tables in your system. So, one example. We make an employer benefit.
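To see that row-level limitation concretely, here is a per-row CHECK constraint (SQLite syntax for the sketch; MySQL and Postgres are similar) that blows up loudly on bad values but can only see the row being written:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        -- Runtime assertion in the database: status must be a known value.
        status TEXT NOT NULL CHECK (status IN ('active', 'inactive'))
    )
""")
conn.execute("INSERT INTO employees (status) VALUES ('active')")  # fine

try:
    conn.execute("INSERT INTO employees (status) VALUES ('zombie')")
except sqlite3.IntegrityError as e:
    print("constraint caught it:", e)
```

The constraint can validate this row's status, but it can't express a rule like "no employee of a terminated employer is active", which spans two tables; that's where queries over the data come in.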
00:50:55
Speaker
Once an employer has terminated their contract with us, we don't want any of their employees to still have access to the product, because the employer is the one providing the product; they're sponsoring it. That makes sense. That's very reasonable.
00:51:11
Speaker
But how? Well, you could express an invariant: for every employee, if their employer is terminated, they must be inactive.
00:51:23
Speaker
That seems like a very reasonable thing to say. Yeah. But it's a hard thing to enforce at runtime, because it's hard to impossible to do at the schema level. Right. Well, you certainly can't do that at the schema level with a normal database constraint.
00:51:36
Speaker
You could write a runtime assertion where, say, whenever some request is processed, you run this query and load all of the employees or something like that. That's obviously horrible, because you can't afford to be doing that.
00:51:49
Speaker
And in this particular case, when we terminate an employer, there is actually a window. The process of updating all of the employees is asynchronous, so there's a window where there could be employees still listed as active even though the employer is terminated. How are you supposed to accommodate for that as well? But we can encode all of this in a query. So the query says:
00:52:11
Speaker
check, for all of the terminated employers, within, say, some grace period to account for the asynchronous aspect, that all of their employees are also inactive.
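That invariant might look something like the following (table and column names are invented; SQLite syntax for the sketch):

```python
import sqlite3

# Any rows counted here are invariant violations: employees still active
# even though their employer terminated more than 1 day ago. The 1-day
# grace period covers the asynchronous deactivation job.
INVARIANT_SQL = """
SELECT COUNT(*)
FROM employees e
JOIN employers r ON r.id = e.employer_id
WHERE r.terminated_at IS NOT NULL
  AND r.terminated_at < datetime('now', '-1 day')
  AND e.status = 'active'
"""

def check_terminated_employer_invariant(conn) -> int:
    """Return the number of violating employees; zero means the invariant holds."""
    (violations,) = conn.execute(INVARIANT_SQL).fetchone()
    return violations
```

A zero result means the invariant holds; any other count says exactly how many employees slipped through the asynchronous deactivation.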
00:52:22
Speaker
And the feature I mentioned at the beginning of my talk, the one I was working on that was a challenge, where I was distracted by those unnecessary pain points in the tools...
00:52:33
Speaker
Well, as part of that feature, there was one particular subgroup of employers that needed some special treatment. So I wrote a job to backfill some data they needed. And in doing that, I forgot to call a function that I needed to call on each of those employers.
00:52:49
Speaker
And it was completely unnoticeable. You would have had no idea, because everything still looked as though it was working; the data all seemed fine. Then I implemented these checks, and one of them failed. And it said: ah, I forgot to call that function on those employers. I need to go back and fix that.
00:53:07
Speaker
And this is the sort of thing that could have literally gone years without ever being noticed, because it's one little corner of the data set. You have so much data; how are you to know if one little corner of it is wrong? But now I have feedback and confidence. And if anything slips through the cracks anywhere, it can still be caught by this check over the data, because...
00:53:26
Speaker
it checks data at rest. The data doesn't have to be actively passing through an assertion for it to be checked; the query keeps examining it consistently. Yeah, yeah. It's amazing how often it just comes back to: do you want feedback? Do you want it sooner?
00:53:41
Speaker
Right. Maybe it's not the whole of software reliability, but it's certainly a pillar of it. Right, right. And on the topic of feedback, with my pyramid here:
00:53:53
Speaker
Type system first. That's the best: fastest feedback, no runtime impact. Then runtime assertions, because you still get quite fast feedback; the bug's going to be caught quickly, and you know immediately.
00:54:05
Speaker
And then at the bottom are these checks over the data itself. Yeah. I'm running those nightly, so it could be several hours before I notice the bug, so the feedback is slower. But now I have the most expressive power: I can express any sort of invariant that maybe I couldn't in either of the top two layers. Yeah, yeah. And right at the bottom, in the basement where nobody wants to go, is the users just complaining. Right.
00:54:29
Speaker
I encourage you, if you're listening and you have access at your company in some capacity: just start snooping around in your data and write these. Think about what properties must hold in your system, what the rules are, and start writing queries to check those properties.
00:54:44
Speaker
And I suspect you may be surprised at how many of them fail in some edge cases. Now, it may not be that everything is completely broken, but it could be that half a percent of your rows are wrong, and you have no way of noticing, because it's not clear that anything is wrong when you look at your system normally. Yeah, yeah. If you go looking for problems, you will almost certainly find them, right?
00:55:09
Speaker
But it's probably better than the alternative. Absolutely. On that note, Isaac, we need to stop recording and go look for some new problems. I guess so. Cheers. Thanks for having me. Pleasure. Thank you, Isaac.
00:55:21
Speaker
A lot of food for thought there, and I suspect some people are going to have their own two cents. So if you want to add your thoughts to the discussion, I invite you to head to the comment box. One topic I noticed, listening back over that, that we really didn't touch on at all, was testing.
00:55:36
Speaker
And I think that's a phenomenon of test-driven development. In a post-TDD world, we take testing as a given, and that can only be a good thing. But if you're hungry for some testing-specific discussion, have a look in the show notes for a link to the episode I did with Oskar Wickström, which was all about property-based testing.
00:55:58
Speaker
I still think property-based testing is the most interesting but underused technique in all of software testing. I like it so much, I've made a walkthrough video for it on YouTube. You'll find both links in the show notes.
00:56:13
Speaker
While you're deciding where to click next, maybe one of those links or maybe somewhere else, please do start by clicking the like, rate, and share buttons. It really does help. I await my judgment in the next life, in the great beyond. But in the here and now, the algorithm judges us all, and you can help it look favorably upon me.
00:56:34
Speaker
We'll be back soon with an episode that takes Isaac's idea of querying a whole system and asks: actually, how do you query a whole system? Sometimes that's much harder than it seems.
00:56:47
Speaker
I leave you pondering that thought. We'll be back soon with details. But until then, I've been your host, Kris Jenkins. This has been Developer Voices with Isaac Van Doren. Thanks for listening.