Automate Your Way to Better Code: Advanced Property Testing (with Oskar Wickström)

Developer Voices

One of the most promising techniques for software reliability is property testing: instead of writing unit tests, we describe some property of our code that ought always to be true, then have the computer generate thousands of test cases that try to break that rule.

For example, you might say, “No matter which page you visit on my website, there should always be a login button or a logout button.” Then the test’s job is to try to break that rule, by clicking around until it finds some combination of clicks that fails that assertion. Like, maybe it finds the 404 page, and you realise it was missing the website’s normal header.

At its best, property testing takes far less work than unit testing but is far more thorough, because we write the rules and the computer writes the examples. The downside is that it often seems theoretical: it can be hard to apply property testing to real-world cases. Let’s fix that.

We’re joined by Oskar Wickström, who’s been building all kinds of different systems and bringing property testing with him wherever he goes. We discuss the basics of property testing, then he goes into the advanced and cunning techniques that go beyond the ordinary into testing databases, webpages and more. With a bit of thought, he can help us test ten times as much code with a tenth of the effort.

--

Oskar’s book, Property-Based Testing in a Screencast Editor [ebook]: https://leanpub.com/property-based-testing-in-a-screencast-editor

Quickstrom: https://quickstrom.io/

F# for Fun & Profit: Property Testing Series: https://fsharpforfunandprofit.com/series/property-based-testing/

Linear Temporal Logic: https://en.wikipedia.org/wiki/Linear_temporal_logic

The Quickstrom Paper: https://arxiv.org/abs/2203.11532

TodoMVC (One frontend app, many implementations): https://todomvc.com/

Oskar on Twitter: https://twitter.com/owickstrom

Kris on Mastodon: http://mastodon.social/@krisajenkins

Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

Kris on Twitter: https://twitter.com/krisajenkins

--

#softwaredevelopment #podcast #programming #testdrivendevelopment #propertytesting

Transcript

The Dream of Reliable Software

00:00:00
Speaker
Wouldn't it be nice if our software just worked? If it just did what it was supposed to do? Some days that seems like a distant dream. But we can hope.

Test-Driven Development: A Double-Edged Sword

00:00:10
Speaker
We have to hope. We have to stretch ourselves for the day when software is reliable. How are we going to get there?
00:00:17
Speaker
I'm told that the answer to this is test-driven development. And I do like that technique, and I use it. And plenty has been said about how great it is. So let me risk excommunication and tell you my problem with test-driven development.

Automating Testing with Property Testing

00:00:33
Speaker
You do end up with an awful lot of test code. And because that code is just code, of course it comes with a maintenance overhead of its own. And it has reliability and readability problems of its own.
00:00:48
Speaker
And on one of my more dark and cynical days, it seems like our code wasn't working, so we just doubled the amount of code. And in my heart of hearts, I don't know, it just seems like we're doing things the hard way, like that's almost too much work. And my job as a programmer is to be lazy in the smart way. I see that many unit tests and I just want to automate the problem away.
00:01:12
Speaker
Well, that's the promise of property testing. Write a bit of code that describes the shape of your software, and it will go away and create 10,000 unit tests to see if you're right, if it actually does work that way. And this week, we're going to look at property testing, what it is, how it works, and we're also going to address my biggest disappointment so far with property testing, which is that it only seems to work in theory.
00:01:39
Speaker
It's great for textbook examples. I'm sold on the principle, but I've struggled to make it work on my more gnarly real world code.

Oscar Wickstrom's Insights on Property Testing

00:01:48
Speaker
Well, enter stage left, riding a white horse, Oskar Wickström, who has some sneaky techniques for making property testing practical. Stuff he's been using to test things like databases, system migrations, video editors, and even the really messy world of user interfaces.
00:02:08
Speaker
And this discussion has me genuinely looking at my test suites in a new light and wanting to try to do things differently. And I really hope it does the same for you. So let's get started. I'm your host, Kris Jenkins. This is Developer Voices. And today's voice is Oskar Wickström.
00:02:38
Speaker
I'm joined today by Oskar Wickström. How are you doing, Oskar? Very good. Thank you. How about you? Very well. I'm intrigued by the musical instruments sitting behind you. I wonder if we should have a jam session instead of a podcast. Yeah, maybe so. This is a shed in many respects, I guess. Your drum shed. That's a nice thing to have. Drum shed, programming shed.
00:03:05
Speaker
So when your code isn't working, you can just bang the drums and get your frustration out. Yeah. It's after every meeting, I just flip around and go. We all need that. But by some measure, your code should be working more often than most. Right. Yeah. Gauntlet thrown down. Yeah. Because, uh, we're going to talk about property testing in this one. Cause you have, uh,
00:03:35
Speaker
We'll get into a definition of what it is, but I think you've taken this as far as anyone I know in the practical sense. There are lots of theoretical people working on it and people building more frameworks for property testing, but you've been using it in anger in a really interesting way. Before we get into that, I think we should get everyone on the same page and you can give me your personal definition

Property Testing vs Unit Testing: Key Differences

00:03:57
Speaker
of what property testing is.
00:03:59
Speaker
Yeah, good. And I haven't prepped anything. That'll keep it honest rather than marketing. Yeah.
00:04:08
Speaker
When I try to explain this, I usually start with what people are comfortable with normally and what's the state of affairs. Most people have been doing some kind of unit testing or example-based testing, if you want to call that. I try to start there. With example-based testing, you have something you want to test and then you come up with
00:04:38
Speaker
at a high level inputs and outputs. And it might seem a little weird to think about it as input and output. But if you have a function, it's pretty clear how it's an input and output that you're testing. But you might be testing some kind of database storage thing, and the output is more like the effect of storing something in a database, for instance.
00:05:01
Speaker
But you have these at a high level, you have these input and output pairs for each of your tests. You say like, okay, if I do this, then I should have this result. That's the input and output. So I say my create user function, if I call it with oscar.wixtrom at google.com, I ought to be able to select a row out of the database afterwards. Yeah.
00:05:24
Speaker
Exactly. So outcome is sort of output in that. OK. And then you come up with a bunch of these, and you write tests for them. And you might have a bunch of these and feel like, OK, it's getting a bit repetitive, and I have a lot of duplication here. And then you can refactor, and you can move things around and improve that situation.
00:05:52
Speaker
There is this technique called data-driven tests, or tabular tests or whatever you want to call it, where you have just one function doing this assertion or whatever, and then you supply a table where each row is a bunch of inputs and the output.
00:06:14
Speaker
I've done that without naming it that. I'm not sure if that's the proper name, but it has a lot of different names. But you can then just add more rows, and it's easier to add more tests in that sense. Still, you have sort of fixed some of the maintainability problems of having a thousand
00:06:39
Speaker
copy-pasted tests. But you have a different problem still, which is you have to come up with these inputs and outputs. And humans aren't that good with coming up with all the interesting edge cases and combinatorics of what to test.
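The data-driven, tabular style just described can be sketched like this in Python. A minimal sketch: `extract_domain` and the table rows are purely illustrative (not from the episode), and in practice you'd likely use a framework feature such as pytest's parametrize rather than a hand-rolled loop.

```python
# Data-driven ("tabular") testing: one assertion function, many rows.
# extract_domain is a hypothetical function under test.
def extract_domain(email):
    return email.split("@", 1)[1]

# Each row of the table is an (input, expected output) pair;
# adding a test case is just adding a row.
CASES = [
    ("oskar@gmail.com", "gmail.com"),
    ("a@b.org", "b.org"),
    ("first.last@example.co.uk", "example.co.uk"),
]

def test_extract_domain():
    for email, expected in CASES:
        assert extract_domain(email) == expected, (email, expected)

test_extract_domain()
```

This fixes the copy-paste problem, but, as discussed next, a human still has to invent every row.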
00:07:00
Speaker
Yeah, I wonder how many bits of code out there take an email address, and they've never tested it with Unicode characters. Yeah, that kind of thing. And so on. So yeah, strings are one good example of these. We have sort of a domain of strings that are valid. And yeah, it can get tricky pretty quickly. And you have to, if you have different inputs that have some kind of relation to each other, it sort of explodes quickly.

Adversarial Thinking in Testing

00:07:30
Speaker
And so yeah, that problem is you don't solve that very well with this tabular testing thing. And that's where property testing comes in because there's like the next step here. If we could keep that function, but instead generate this whole table, generate all the input and output pairs and express our tests as a function still.
00:07:59
Speaker
but then it becomes tricky because, well, you can't, this, this sort of relation between input and output has so far only been in your head, right? It's something that you have some kind of intuition about some idea and like, yeah, of course this input goes along with this output. That makes sense, but you haven't really pinned down formally. Why do they belong together?
00:08:25
Speaker
So you're saying like, if I have a function that's supposed to pull the domain name out of an email address, if it's Oscar at Gmail, I can say that and I can say the output should be Gmail, but I haven't really defined what it means to pull the domain out of an email address, right? Why is it just Gmail?
00:08:44
Speaker
Right. Yeah. Yeah. When it's something else. Yeah. And then you can maybe, so if you do that and you can generate these inputs, email addresses, for instance, and you want to say, okay, I have a function that pulls out the domain name. Um, then you have to, uh, in order to be able to test it, if you can just generate the inputs, you have to come up with a way of formalizing the relation to the output.
00:09:08
Speaker
So you have to think of what's the general rule on how do I go from input output in a proper way. Yeah, because your unit tests are kind of hinting at a pattern you now want to make more explicit.
00:09:22
Speaker
Yeah. Exactly. So unit tests or example-based tests are small pieces of narrative, small stories about how things work. But you have to generalize that. So going from these small examples, stories of evidence of what's going on, observations almost, you have to raise that into a general
00:09:50
Speaker
generalize that relation basically between input and output so for your example you might see if I can come up with something on the spot but if you say like okay if I want to extract the domain name out of an email address if I then could
00:10:07
Speaker
Also extract, say, the part before the at sign, I don't know what it's called, but you know, the username or part of it. Yeah. And then piece together those two results, and then I should end up with what I started with, maybe. Yeah. And that test is general: you can feed it whatever email address you want, and it will do this sort of
00:10:31
Speaker
test your function that extracts the domain, but also do something else, piece them together and check that the output is the same as the input, maybe. So then you have thought about your test differently in order to be able to test on any input.
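That generalized round-trip might look like the following. A minimal sketch using plain `random` rather than a real property-testing library such as Hypothesis; the generator and both extractor functions are illustrative stand-ins.

```python
import random
import string

def extract_domain(email):
    return email.split("@", 1)[1]

def extract_local(email):
    return email.split("@", 1)[0]

def random_email(rng):
    # Naive generator: real property-testing generators cover far more
    # of the messy space of valid addresses than this.
    part = lambda: "".join(rng.choice(string.ascii_lowercase)
                           for _ in range(rng.randint(1, 10)))
    return f"{part()}@{part()}.{part()}"

def test_round_trip(runs=1000):
    rng = random.Random(0)
    for _ in range(runs):
        email = random_email(rng)
        # Property: reassembling local part and domain gives back the input,
        # for ANY generated email, not one hand-picked example.
        assert extract_local(email) + "@" + extract_domain(email) == email

test_round_trip()
```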
00:10:49
Speaker
This is what I like about the idea of property testing, that the way we're doing the testing is somehow made of slightly different stuff than the code that's under test. It always feels like unit tests are made of exactly the same material as the code. And that's always felt somehow wrong to me.
00:11:09
Speaker
Yeah, it's easy to end up there, like your tests are one-to-one reflecting the implementation. I'm sure many have seen that. And property testing is you can end up in that situation if you do property testing with sort of the, I don't want to say wrong type of property, but certain type of properties don't fit certain type of systems or functions under test.
00:11:37
Speaker
then you can end up in the same kind of situation. But if you do find a nice pattern for writing a property for a certain thing you want to test, you're sort of forced to not do that, right?
00:11:50
Speaker
Yeah, yeah. I'm kind of reminded of a discussion we had with Simon Payton Jones a few weeks ago, where it was like, there's an algorithm that computes square roots. And it's very easy once you found a square root to test if it was actually right. But the way you're testing if it's correct is very different than the way you're calculating it. Yeah, it's easy to go in one direction, but not the other.

Oscar's Journey to Embracing Property Testing

00:12:12
Speaker
Yeah. And so when you're going in both directions, it's kind of almost adversarial in a positive way. Yeah. Yeah.
00:12:20
Speaker
Exactly. And that brings up my third point on what property testing does in a different way than example-based tests, where my colleague said this to me, it's nice in the sense that when you've written a function, you have probably already thought about your
00:12:42
Speaker
the things you want to cover, the things you can imagine could go wrong or should be successful and so on. And this forces you to express it in a totally different way, which, as you say, it becomes sort of adversarial. It's thinking differently and coming up with different paths or different examples that could break your assumptions that you had when you wrote it. So it's hard to do that mind shift, just going back and forth between
00:13:08
Speaker
being your own friend and being your enemy back and forth. Yeah. So what is it about property tests that makes that easier, do you think? Well, it's really hard to, in most cases then, as I was going into before, it's hard to actually do a property test.
00:13:30
Speaker
and describe the implementation, I think, because it's so different. It's hard to describe, but you have to think differently about how you test it when you don't know, for instance, don't know the output. You only know the input. What should the output be? I can't say something concrete about the input because I don't know what it is. I have to find this generalization, and that forces you to
00:13:56
Speaker
think differently and that in combination with inputs being generated, not by you, your brain, but by some other semi-stochastic thing that tends to uncover things you haven't thought about. Right, so you're saying I write a property test that takes, let's stick with email address,
00:14:18
Speaker
I'm going to get a huge number of randomly generated email addresses, and because that input is so varied, I kind of have to treat it in an abstract way. I can't think of it as a concrete thing, so I have to think about the whole testing problem differently. Yeah. That makes sense. So the big problem with this always is
00:14:42
Speaker
you go to a conference talk on property testing, and they give you lots of examples on lists and maybe a couple on email addresses. And, I've run up against this, it kind of feels like it's only really going to work for obviously mathematical stuff, or obviously abstract stuff.
00:15:05
Speaker
Yeah. This is why I wanted to get you in, cause you've done this with a lot more hairy code. So take me through it: how can I write better and more general property tests? Yeah. So you're describing me circa 2016. I think I was at a conference and someone gave a great talk and all that, but I was in denial. I was like, nah, that's not going to work for my stuff. It's too academic, too theoretical.
00:15:37
Speaker
I started there and then I think I softened to the idea by we had this rewrite kind of project where we shared the database between two implementations for a while during the rewrite or
00:15:59
Speaker
It was sort of slicing apart of the system and rewriting. And we can compare those systems, how they acted on the same input with where the input was actually the database. And so I wasn't actually looking to learn property-based testing, but I was doing this and I sort of backtracked from there. I was like, this is sort of a variation on property testing in a sense.
00:16:26
Speaker
So the property was sort of these two systems should be in sync or they should give the same output for the same input basically. And that is a common way of expressing a property test. If you have some system that is
00:16:48
Speaker
very complicated in a, not in its functionality, not in its essence, but in a sort of non-functional way, maybe has a lot of performance optimizations and stuff, then you can write another version which is really simple.
00:17:03
Speaker
just in memory, super naive, whatever. And then you can run them side by side and see that they produce the same output. Anyway, yeah. So that was sort of where I started, I think. And then for a totally different project, I was doing a screencast editor application. Oh, yeah. You wrote that, right? Yeah.
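The side-by-side pattern Oskar describes, sometimes called model-based or oracle testing, can be sketched like so. Assume `fancy_sort` stands in for a complicated, optimized implementation (the function here is just an illustrative insertion sort), and Python's built-in `sorted` plays the simple, naive reference.

```python
import random

def fancy_sort(xs):
    # Stand-in for a complicated, performance-optimized implementation
    # under test; here just a plain insertion sort.
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] < x:
            i += 1
        out.insert(i, x)
    return out

def test_against_model(runs=500):
    rng = random.Random(1)
    for _ in range(runs):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        # Property: the optimized system and the simple in-memory
        # reference agree on every generated input.
        assert fancy_sort(xs) == sorted(xs), xs

test_against_model()
```

The same shape works when the "model" is an old implementation and the system under test is its rewrite, as in the database-sharing project described above.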
00:17:33
Speaker
So I was doing my own screencasts around Haskell programming, and I thought it might be a good idea to also do an editor for doing screencasts editing. So recording first and then pulling all the stuff in there and having my own workflow and so on. That was a fun project. That sounds like a lot of fun, but really going the hard way. Yeah, sort of a sidetrack.
00:18:02
Speaker
Yeah, but that was also an experiment sort of driven from this angle of learning property-based testing because I quickly realized that it was getting complicated because I had this, partly I had undo, redo, and a lot of operations that could be undone and redone. They were all implemented sort of in a functional way. So you had a big state and you had some operation that sort of proves a new

Practical Application in Screencast Editor

00:18:29
Speaker
state.
00:18:29
Speaker
The architecture was nice but it was hard to verify that everything worked as it should because I shouldn't go into too much detail but I had this kind of tree structure of all the clips and all the audio and video and pauses in between and so on. Everything was small segments of different types and then I had sort of
00:18:50
Speaker
First I had VIM bindings to navigate this thing and also something, what's it called from Emacs? I have this lisp editing thing, I've forgotten the name.
00:19:06
Speaker
Oh, paredit. Structural editing. Yes. So you can sort of shift entire expressions sideways or upwards, split them and ungroup them and so on. Oh, so you're swapping chunks of video like they're S-expressions. Yeah. Nice. Very geeky. It was strange. And all these operations could be undone and redone.
00:19:35
Speaker
And it was really hard to test all these operations with loads of edge cases. So are you saying you've got this big tree structure? This seems like a classic thing that I would find very hard to property test, because I don't want to rewrite video files 1,000 times in an auto-generated test. And I don't really want to hand-write large, meaty starting states either.
00:20:06
Speaker
Exactly. So how did you solve that? Teach me. I'll try to remember, because I wrote this up as a series of blog posts. And then later I rewrote it as a short book, which is called, I think, Property-Based Testing in a Screencast Editor. Very down to earth. And
00:20:31
Speaker
OK, so if I recall correctly, the properties were some of the interesting ones were sort of using this undo, redo as a way of discovering other problems. OK. So I had all these operations, but I didn't test them explicitly. Like, oh, how should move up in the tree structure work or whatever operation? But I could generate just random operations
00:20:59
Speaker
a sequence of them. And it was a bit more complicated because you can only know what operations are valid once you're in a certain state. So it was more a back and forth, like generate one operation, apply it, get a new state, and then see what can be the next valid operation and generate one of those. And you step forward like that for a certain length.
00:21:26
Speaker
Okay, so you're almost treating it like a state machine, where each new state you ask which are the transition states? Yeah, what are the valid transitions from here, and then you can generate from that list of possible transitions. And do that for a while, then you end up with a trace, basically, of state, operation state, operation state, and so on.
00:21:51
Speaker
So I did that, and then I undid it, or like, applied the same number of undo operations for all of them, and then it would end up where I started.
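The state-machine style generation and the undo property can be sketched together. A minimal sketch with a toy "editor" whose state is just a list; the operation names, the `valid_ops` scheme, and the inverse bookkeeping are illustrative, not Oskar's actual implementation.

```python
import random

# Toy editor: state is a list of values. Which operations are valid
# depends on the current state, so generation and application alternate.
def valid_ops(state, rng):
    ops = [("insert", rng.randint(0, 9))]
    if state:                       # delete is only valid on non-empty state
        ops.append(("delete", None))
    return ops

def apply_op(state, op):
    kind, arg = op
    if kind == "insert":
        return state + [arg], ("delete", None)       # inverse: delete last
    else:
        return state[:-1], ("insert", state[-1])     # inverse: re-insert it

def test_undo_returns_to_start(runs=200):
    rng = random.Random(2)
    for _ in range(runs):
        state, undos = [], []
        # Generate a trace: state, operation, state, operation, ...
        for _ in range(rng.randint(0, 15)):
            op = rng.choice(valid_ops(state, rng))
            state, inverse = apply_op(state, op)
            undos.append(inverse)
        # Property: undoing every operation restores the starting state.
        for inv in reversed(undos):
            state, _ = apply_op(state, inv)
        assert state == []

test_undo_returns_to_start()
```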
00:22:03
Speaker
Okay. So, but to, to answer your other question, like how did I, you don't want to write these big sort of timeline, um, starting states to test against because it's a sort of maintenance problem and boring. So with property based testing, you, you normally have,
00:22:25
Speaker
like you write small generators. There are a bunch of built-in generators for all the regular types, the built-in types, and so on. But then you can build your own generators that sort of piece together other generators. So I could define a generator for this whole timeline. And that, in turn, used generators for audio and video clips, and so on. And they weren't actually real audio and video clips.
00:22:54
Speaker
generic type parameter somewhere I don't remember exactly but so they weren't actually generating files on disk okay in memory structure so what is it generating a gigabyte's worth of binary randomness or wod
00:23:10
Speaker
No, that part didn't actually hold any video data. It was just a structure. I create an empty clip with some metadata. Yeah, because the operation didn't touch the actual videos. I can ignore that part and just parameterize that with unit or something. And then this undo, redo, I had some other properties on that.
00:23:38
Speaker
in that sort of style. So, like, do a bunch of things, undo all of them, and then redo all of them. Then you should end up at the furthest end before you started undoing. That makes sense. And then, yeah, you might think, like, okay, you're only testing undo and redo. That was going to be my next question, yes. Which is useful, but...
00:24:03
Speaker
because of how I implemented undo and redo. So each operation had its inverse as a separate implementation. So if you could, let's say, if you had a clip which was called sequence, and I had a bunch of audio parts in the sequence, if you could split that at a certain point, you could also join it back together. And that was sort of the inverse of splitting was joining.
00:24:32
Speaker
and by doing this whole undo-redo thing in random ways, all these inverse operations were also executed and they had to agree in order to produce the same. So I had loads of small
00:24:55
Speaker
like off by one errors and stuff like that. And those were uncovered by doing these sequences.
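The inverse-pair idea, where splitting and joining must agree, can be shown on its own as a round-trip property. A minimal sketch with lists standing in for sequence clips; `split_at` and `join` are illustrative names.

```python
import random

def split_at(xs, i):
    # Split a "sequence clip" at position i into two parts.
    return xs[:i], xs[i:]

def join(a, b):
    # The inverse operation: joining the parts back together.
    return a + b

def test_split_join_inverse(runs=500):
    rng = random.Random(3)
    for _ in range(runs):
        xs = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
        i = rng.randint(0, len(xs))
        # Property: join is the inverse of split, for ANY split point,
        # including the off-by-one-prone edges i == 0 and i == len(xs).
        assert join(*split_at(xs, i)) == xs

test_split_join_inverse()
```

Running pairs like this through randomly generated sequences is exactly where the off-by-one errors mentioned above tend to surface.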
00:25:04
Speaker
I could probably find most of it by just doing the inverse round trip kind of property. But doing it together with undo-redo sort of churned the whole thing into this catastrophic mess that uncovered a lot of bugs. So it was a nice experience. It took like two weeks to just fix all the bugs. It didn't take so long to write the properties.
00:25:33
Speaker
But I was just scratching my head for like two weeks trying to figure out the bugs and getting all that. That's the interesting thing, when it turns up bugs you couldn't expect to see. Yeah, right. I had experienced the bugs while using the screencast tool. I knew there was something going on, I couldn't really pin it down. It's like, what happened there? What did I do? It's hard to
00:26:04
Speaker
Should I record my screen and my keyboard to be able to reproduce? And that was really cool to find them that way.

Advanced Techniques in Property Testing

00:26:15
Speaker
So I just did bug fixes and following the tests for basically two weeks. And when I came out the other end and I started the screencast editor, everything just worked. It was just...
00:26:28
Speaker
Perfect. That's nice, that's nice. Because you would have spent those two weeks the hard way, right, spread out over the next year. Yeah. Which is bug reports or whatever pain and trying to figure out what was going on. So that was one part. I did property tests for other things. I had, I could bring that up as well as I think a nice example.
00:26:53
Speaker
I had this, the workflow was based on just recording screencast video. And in between sort of scenes, I was just silent because of this. I did part of the screencast and then I was silent and didn't touch the keyboard. And I was just thinking like, okay, what's the next part? Maybe a little script or something. Yeah. And then
00:27:19
Speaker
When I imported this into my editor, I wanted everything to be chopped up by those silent parts because I don't want them. Yeah, makes sense. So I had this sort of processing part that
00:27:35
Speaker
it analyzed the video and audio and found out, like, here's a pause. And let's just clip or, like, trim it down to the end of the audio and trim the other part to the start of the next audio and so on. Like so, if it's completely silent for five seconds, assume it's a pause and find the end of that silence. Yeah. Yeah. Makes sense. So that was sort of the scene classifier.
00:28:04
Speaker
I wanted to test this with a property test as well. And I couldn't really figure out, because your example there with the square roots, and that is a good analogy, because I had the same problem. If I generate an input, what should be the output for this classifier? Well, that's the problem, right? That's what the classifier does. And I couldn't really figure out how
00:28:30
Speaker
how should I express this without just writing another classifier again in the test? Yeah. I mean, I would instinctively say, well, at least I can write a test that says the output should be no longer than the input. Yeah. And that's a good start. There are loads of these incomplete or sort of naive properties that do find actual bugs as well. They might not cover
00:28:57
Speaker
all of it and be very precise. But these sorts of constraints are really powerful. And I do recommend just doing that for a good while, if you can't come up with something else. That's just better than nothing, because you might find unexpected things with just those basic invariants or constraints. Okay, okay, I feel better about my test. That seems kind of obvious. Yeah. But is this what you did? Is there something more sophisticated I can do than that test?
00:29:28
Speaker
Yeah. So what I did, it was after I think a lot of being out on my bike and just thinking, I realized that maybe I can generate the output instead and go backwards.
00:29:42
Speaker
So that was sort of the mind twist that happened. Because I can generate the output, which should be a list of scenes, basically, classified and done. I can go backwards and say, if I had these scenes, I can transform them into something that moves in video or audio that is sort of noisy or something. And then in between, I can put just same frame all over again.
00:30:13
Speaker
Okay. Or audio that's just blank or something like that.
00:30:18
Speaker
and then piece together that into an actual video or actual audio. Then run the classifier on that input, which I've... So I generate scenes, which are just basically from two timestamps that says, here's some video going on. And then another scene saying, okay, 10 seconds later, there's some video going on, which is also a scene, and so on. And there's silence between, which is five seconds or something.
00:30:46
Speaker
I can take that, map that into actual input. It's the same thing as you said with the squares. It's easier to go in the other direction. And then I ran the classifier on that, got some stuff out, and I can just see that the scenes align. Okay, I've not heard of that technique before. So, normal property testing, you get randomly generated inputs. You're saying, get randomly generated outputs.
00:31:14
Speaker
reconstruct the inputs the easy way. Yeah. And then check your software reconstructs the outputs the hard way. Yeah. Oh, that's a Jedi mind trick. I like that. Yeah, it's a nice one.
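The backwards trick for the scene classifier can be sketched like this. A minimal sketch where "audio" is just a list of amplitude integers and a scene is a run of non-silent samples; the rendering scheme, the classifier, and all names are illustrative stand-ins for the real video/audio pipeline.

```python
import random

SILENCE_LEN = 5  # gaps of this many zero samples separate scenes

def render(scenes, rng):
    # Go BACKWARDS: from desired scene lengths (the output), construct a
    # concrete "audio" input, inserting guaranteed-silent gaps between
    # noisy scenes. Easy direction.
    samples = []
    for length in scenes:
        samples += [rng.randint(1, 100) for _ in range(length)]  # noise
        samples += [0] * SILENCE_LEN                             # silence
    return samples

def classify(samples):
    # System under test (hard direction): recover the noisy scene
    # lengths from the raw samples.
    scenes, run = [], 0
    for s in samples:
        if s != 0:
            run += 1
        elif run:
            scenes.append(run)
            run = 0
    if run:
        scenes.append(run)
    return scenes

def test_classifier(runs=300):
    rng = random.Random(4)
    for _ in range(runs):
        scenes = [rng.randint(1, 30) for _ in range(rng.randint(0, 8))]
        # Property: classifying the rendered audio recovers the scenes
        # we generated, so the scenes align.
        assert classify(render(scenes, rng)) == scenes

test_classifier()
```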
00:31:32
Speaker
So on this topic, maybe you can put a link somewhere or something. There's this F-sharp for fun and profit website, which has a lot of F-sharp content, of course. But there's also this series on property-based testing. And there is one part, which is, I forget the exact name, but it's something like property patterns for property tests or something like that.
00:31:54
Speaker
has a bunch of these common templates on how to think about stuff. You have the round trip property, which is very nice if you have two functions that are sort of inverse, as I said before, like if you have a render function and a parse function, for instance. You can take whatever you start with, apply the render function, get a string, then apply the parse function to the string and get back your original input.
00:32:24
Speaker
If you're writing a refactoring tool that wanted an idempotent, then no change in refactor.
00:32:32
Speaker
Yeah, well, that is actually a different kind of pattern, which is sort of an idempotency property: if you apply the same operation twice, you should get the same result as applying it once. Apply the operation once, get a result, then apply the operation once again, and you have the same result, and it just stays there. But the round trip properties, you have to have two functions that are
00:32:56
Speaker
Oh, yeah. I see what you're saying. So I was thinking a refactoring tool that parses the source code, does nothing, and then prints it back out. But you're saying you could also have the test where it's like, apply this refactoring five times in a row, and it shouldn't take place more than once. Yeah. Sort of a fixed point. Yeah. Fixed points. I can tell you're a Haskell programmer. You got fixed points into the conversation. Revealed. That's the tell. Yeah. And there are a bunch of others.
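The idempotency, fixed-point pattern can be sketched like so. A minimal sketch where a whitespace normalizer stands in for the refactoring operation; the function and its name are illustrative.

```python
import random
import string

def normalize(text):
    # Stand-in "refactoring": collapse runs of whitespace to single spaces.
    return " ".join(text.split())

def test_idempotent(runs=500):
    rng = random.Random(5)
    alphabet = string.ascii_lowercase + "   "  # bias toward spaces
    for _ in range(runs):
        text = "".join(rng.choice(alphabet)
                       for _ in range(rng.randint(0, 40)))
        once = normalize(text)
        # Property: the operation is idempotent, its output is a fixed
        # point, so applying it again changes nothing.
        assert normalize(once) == once

test_idempotent()
```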
00:33:26
Speaker
I don't know if I can just come up with them all right now, but there's this sort of, it's called something like hard to prove, but easy to verify, which is your example with the square roots. And then there are the really kind of big-brain things. There's one technique called metamorphic testing, which I haven't actually used much. At least I experimented with it, but not really found
00:33:54
Speaker
where I needed it yet. Just for kicks, can you explain it? Yeah. Metamorphic testing is, if you have an input and you apply your system under test and you get an output, if you could slightly modify that input in some way,
00:34:18
Speaker
So kind of, let's say, make it larger in a sense, smaller or different in some way that you can know. Then you can apply the function or the system on that input and you get an output. And if you compare those two outputs, then you know that the output should be, let's say, smaller or bigger or something, should have the same relation. If you can express in a relation or transformation between
00:34:44
Speaker
one input and another, then you can say that the outputs should also have a sort of matching relation. So the canonical example there would be a search engine. It's really tricky to test that with properties, like, what should it give back, right? Yeah, yeah, tricky. But if you do a search, you get some results, and then you do a new search with,
00:35:13
Speaker
I don't know, a date filter on it, which wasn't there before, a slightly more constrained search. Then you get a result which is a subset. Assuming you can observe the entire result, of course, that output should be a subset of the other output, from the bigger search.
00:35:32
Speaker
Yeah, and then you don't really care what the input is, and you don't really care what the system's doing, as long as the relationship between the outputs tracks the change in the inputs. I feel like I'm differentiating a graph in high school maths now. Is the rate of change in curve A the same as the rate of change in curve B? Yeah, exactly. In this case, you don't care about either input or output, basically. You just know that, well, you have some way of modifying the input
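The search-engine example can be sketched as a metamorphic test. Everything here, the toy `search` function and the document shape, is invented for illustration; the point is that the assertion never says what the right results are, only how two runs relate:

```python
import random

def search(docs, query, after=None):
    # Toy search engine: substring match, with an optional date filter.
    results = [d for d in docs if query in d["text"]]
    if after is not None:
        results = [d for d in results if d["date"] >= after]
    return results

# Metamorphic relation: adding a filter can only shrink the result set.
rng = random.Random(0)
for _ in range(500):
    docs = [{"text": rng.choice(["cat", "dog", "catalog"]),
             "date": rng.randint(2000, 2024)}
            for _ in range(rng.randint(0, 30))]
    broad = search(docs, "cat")
    narrow = search(docs, "cat", after=2015)  # strictly more constrained
    assert all(d in broad for d in narrow)
```

No oracle for correct search results is ever needed, which is exactly what makes the technique useful for systems where "the right answer" is hard to state.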
00:36:03
Speaker
and seeing that the output behaves as you would expect. But otherwise, you don't know much about either of them. The sense I'm getting here is if you want to use property testing well, you need to be sneaky. You have to think differently, at least. It's a shift of mindset in many ways.
00:36:24
Speaker
But that's good. Again, as we said, this is one of the reasons why testers work as a separate department: you write some code, and there's someone thinking about it in a completely different way to you, putting it under test. It would be so much faster and more efficient, and sometimes less embarrassing, if we could be that completely different person thinking in a completely different way and testing our own code. Yeah, I think this helps to get us more into that
00:36:53
Speaker
situation where we can test it not as we would think when we implement it.
00:37:02
Speaker
I'm thinking about your screencast editor. You have set this up so that it seems to be largely pure functions. You're not doing writing stuff to disk, right? Is that a necessary part of this technique? Do you have to try and find a way to extract the side effects of writing to a database, writing to disk before you can do

Quickstrom: Revolutionizing UI Testing

00:37:25
Speaker
this well? Right. It's a very, very good question because I'm sort of still in
00:37:31
Speaker
pure-function land, right? Yeah. And that is nice: if you can keep your architecture and your system under test in that space, your tests can be faster, and it's just nicer to work with. But it doesn't always work, right? So as you said, with the screencast thing, I didn't actually run the full UI. I just ran a model of the UI,
00:37:57
Speaker
or a representation of the UI, and didn't actually have real video in memory or on disk when running the properties. So yeah, it was still purely functional. But you can absolutely do property tests with side effects. There are some things you have to think about, though. You have to keep it isolated, for instance.
00:38:26
Speaker
So each test has to be isolated from the next. And I do this at work a lot, actually. There's this thing called test containers, which is available for a lot of different languages. But you can say, for this test, I want a Postgres Docker container running. And you can make it so that you always have a fresh database, basically, for each test.
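The per-test isolation idea can be sketched with stdlib SQLite standing in for a fresh Postgres container (real Testcontainers usage needs Docker running, so this is just the shape of the pattern):

```python
import sqlite3

def fresh_db():
    # Stand-in for spinning up a fresh Postgres container per test:
    # every call returns a brand-new, empty in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")
    return conn

def insert_and_count(conn, name):
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    (count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    return count

# Each generated case gets its own database, so inserting the same
# name in every case never collides with a previous case's state.
for case in range(100):
    conn = fresh_db()
    assert insert_and_count(conn, "alice") == 1
    conn.close()
```

With a shared database, the second iteration would hit a primary-key violation; the isolation is what makes the property repeatable.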
00:38:55
Speaker
Isn't the tricky thing there, though, that with property testing you're often generating 10,000 test cases? Yeah. And you can work around some of that by having transactions that you always roll back after each test. Yeah, that makes it a bit faster than spinning up new Docker containers all the time. But still, you won't be able to run,
00:39:19
Speaker
I don't know, a thousand tests in under a second; that won't be possible. Maybe there are other tricks you can do, but maybe we run like a hundred tests. But over time, because this isn't deterministic, you test more and more different cases, which is both good and bad. I can see why it's good. Why is it bad? It's perhaps not bad, but it could be frustrating when a test fails after two months.
00:39:50
Speaker
Yeah, we discovered that edge case that we didn't think about. Yeah, because you've been chipping away at the problem 100 test cases at a time. Yeah. And at that particular time in CI, when someone did something completely unrelated, my property test failed. This is why I think it's essential in property tests that you get this seed, which is your receipt, so you can replay the run and make it deterministic. Yeah.
00:40:19
Speaker
Yeah, you get a magic number that says play exactly the same set of random tests.
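The seed-as-receipt idea can be sketched in a few lines. The property here (reversing a list twice is the identity) is just a placeholder; the point is that the seed alone reproduces the whole run:

```python
import random

def run_property(seed, cases=100):
    # The seed is the "receipt": the same seed replays exactly the same
    # sequence of generated inputs, making a CI failure reproducible.
    rng = random.Random(seed)
    failures = []
    for _ in range(cases):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]
        if list(reversed(list(reversed(xs)))) != xs:  # reverse twice = identity
            failures.append(xs)
    return failures

# Two runs with the same seed generate identical cases (and both pass here).
assert run_property(42) == run_property(42) == []
```

A failure report then only needs to print the seed, and anyone can rerun the exact failing sequence locally.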
00:40:23
Speaker
Yeah, so that is useful: when it happens in CI, you can at least reproduce it locally. The person doing the PR might not be too pleased; some other test that doesn't touch the system they changed just starts failing now. You got unlucky and caught the bug that Dave wrote two months ago. Yeah, that could happen. That is perhaps the downside. But
00:40:50
Speaker
It does absolutely work with side effects as well, but you have to be a bit more careful or think about how you should set these tests up. And then there's the whole UI bit because I didn't run the full UI in the screencast editor. It was a GTK UI.
00:41:10
Speaker
But that was also nagging me, like, oh, does the GTK stuff work? I don't know. The heavy lifting stuff underneath tends to work because I have all these tests, but do all the buttons connect properly to everything and so on.
00:41:26
Speaker
Yeah, because you know that real unit testing department, the first thing they're going to do is mash the click button all over your UI and try and crash it, right? Yeah. And I had like zero test coverage for that code. And yeah, that was nagging me a bit as well. Like, I want to test the whole thing,
00:41:46
Speaker
end-to-end, black boxy kind of testing, which led me, after a few detours, into what later became this project called Quickstrom.
00:41:59
Speaker
Which is an excellent name, Oskar Wickström. I was forced to take this name for the project, but I do take responsibility, of course. Well, if there's an open goal, you might as well kick it in. This project is about doing this sort of end-to-end property-style testing, but on stateful systems, or UIs more specifically,
00:42:27
Speaker
which is what we did. Can I just stop you and ask, why not Selenium? Because that's most people's answer to this, Selenium or something like it. Yeah, so mostly because, like any example-based testing, it's limiting. You have to come up with the examples. And especially with Selenium, or whatever scenario testing you automate,
00:42:58
Speaker
There are a bunch of problems, but one is that when you have this stateful system where you do sequences of actions, the combinatorics of that just blows up very quickly. And it's very tricky to come up with enough sequences that are interesting to catch a lot of bad behavior and bugs. And so that is one part.
00:43:25
Speaker
It's also kind of annoying to maintain, because... I haven't experienced that. Yeah, you have timing issues, which have gotten better now with newer frameworks like Cypress; there are a bunch of them, and they have more utilities to wait for stuff rather than fixed sleeps all over the place. But you do have the problem of being very...
00:43:54
Speaker
very tightly tied to the structure of the web app, for instance, which, to be fair, you still have some of that in Quickstrom, but not as much.
00:44:07
Speaker
You don't have so much of the timing problems, and you don't have so much of the coupling to the structure of the web page and so on. So what you do in Quickstrom instead, because I didn't want to write so many Selenium tests, is you write a property, but one that's a bit different, because I had this idea: I want to use a type of logic called linear temporal logic.
00:44:37
Speaker
Right. I feel the rabbit hole has just opened before us. Yeah. So if you've heard about TLA+, this is sort of the common foundation. You have the logic that you know,
00:44:55
Speaker
propositional logic, with and, or, implies, all of that. But then you extend that with some operators that deal with time. So you can say something like x and next y, which means that x must be true in this state, and in the next state, y must be true.
00:45:16
Speaker
Can you give me a concrete example for a web page? Yeah. So maybe clicking a button, let's say. I say that, OK, the button is visible, and that's my x. And for the next part, I could say: if the action is click, then that implies that in the next state,
00:45:40
Speaker
I don't know, a message should be shown, something like that. But maybe another, more interesting one: there is an operator called always, which says that the sub-expression should be true in all states.
00:45:59
Speaker
So I could say something like: there is always either a login or a logout button on every page. Yeah. Or it should always show, I don't know, a link to a support email, or something like that. You can start thinking about these as business rules, or requirements, for your page. Yeah.
00:46:21
Speaker
Then there's an operator called eventually, so you can say: sometime in the future, this must happen. Give me a concrete example. Okay. If you click a button which launches, I don't know, an HTTP request, you might see a little spinner going, and then you eventually want to get some result back, some data shown, or maybe an error message.
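A simplified sketch of these three temporal operators, over a finite trace of page snapshots. Real LTL evaluates operators over suffixes of an infinite trace; this flat finite-trace version is just to show the shape, and the page model is invented:

```python
def always(pred, trace):
    # "always p": p holds in every observed state.
    return all(pred(s) for s in trace)

def eventually(pred, trace):
    # "eventually p": p holds in at least one observed state.
    return any(pred(s) for s in trace)

def next_state(pred, trace):
    # "next p": p holds in the state right after the current one.
    return len(trace) > 1 and pred(trace[1])

# A trace is a sequence of page snapshots; here, just which buttons exist.
trace = [
    {"buttons": {"login"}},
    {"buttons": {"logout"}},
    {"buttons": {"logout", "support"}},
]

# "There is always either a login or a logout button."
assert always(lambda s: s["buttons"] & {"login", "logout"}, trace)
# "Eventually a support link shows up."
assert eventually(lambda s: "support" in s["buttons"], trace)
# "In the next state, the logout button is there."
assert next_state(lambda s: "logout" in s["buttons"], trace)
```

Quickstrom's own proposition language composes these same operators over states queried from the DOM.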
00:46:47
Speaker
Okay, I've written some code this morning, actually, that eventually should show a chart. Every async thing in a UI. That would be one example. And by using this logic with these temporal operators, you can express these requirements.
00:47:06
Speaker
And if you do this in a certain style, you can express your web app as a state machine, which fits some web apps. And you can say, basically, your specification says that this web app always goes from one state to the next in a valid way.
00:47:32
Speaker
That's the state machine kind of definition. And then you say, okay, what does it mean to go from one state to the next in a valid way? Well, you just list all your valid transitions and combine them with OR.
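The "valid transitions combined with OR" idea can be sketched like this; the pages and transitions are invented for illustration:

```python
def valid_step(before, after):
    # TLA+-style spec: a step is valid if ANY listed transition
    # relates the two states -- the transitions combined with OR.
    transitions = [
        lambda b, a: a == b,                                          # stutter
        lambda b, a: b["page"] == "login" and a["page"] == "account",  # log in
        lambda b, a: b["page"] == "account" and a["page"] == "login",  # log out
    ]
    return any(t(before, after) for t in transitions)

def valid_trace(trace):
    # The whole behaviour is valid if every consecutive pair is a valid step.
    return all(valid_step(b, a) for b, a in zip(trace, trace[1:]))

assert valid_trace([{"page": "login"}, {"page": "account"}, {"page": "login"}])
assert not valid_step({"page": "login"}, {"page": "checkout"})
```

Checking a trace is then just "always, the next step satisfies one of these transitions", which is the state-machine reading of the spec.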
00:47:48
Speaker
And this is a very TLA+ way of describing a state machine. So, say I would expect a webpage where there's always a sign-in button, and then I do some stuff and hit submit, and then I get to my account page. Would the software then go and try and find ways to get to the account page without me doing that?
00:48:13
Speaker
It wouldn't be quite that directed. When you have the spec, you can just run Quickstrom, basically, and it just does random things, but random possible things. On a web page, that's the neat trick, because you can inspect the web page and see what buttons are available right now,
00:48:37
Speaker
which are disabled or enabled, which links are there, what can I do, basically. So that sort of reflective capacity is already there, by talking to the browser. And then Quickstrom goes around and does random things. You can constrain that as well, but you can be very open. And while it runs, it checks: does the behavior of the web app agree with your specification?
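That random exploration can be sketched with a toy page model standing in for the browser; each state advertises its own enabled actions, just as a real page advertises its clickable buttons and links:

```python
import random

# Invented toy web-app model: each page maps action names to the next page.
PAGES = {
    "login":    {"log in": "account"},
    "account":  {"log out": "login", "settings": "settings"},
    "settings": {"back": "account"},
}

def explore(start, steps, rng):
    # Random walk over the UI: at every state, pick any action the
    # page currently offers (no hand-written scenarios needed).
    state, trace = start, [start]
    for _ in range(steps):
        actions = list(PAGES[state])
        state = PAGES[state][rng.choice(actions)]
        trace.append(state)
    return trace

trace = explore("login", 50, random.Random(0))
# Check a spec as we go: every visited state is a known page.
assert all(s in PAGES for s in trace)
```

Against a real browser, the "model" is replaced by querying the live DOM for enabled elements, but the exploration loop has this shape.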
00:49:02
Speaker
So this is like your video editor, where you say, okay, I'm now in this state, what are the possible transitions out of this state, which in the case of a web page are buttons. And then you're just clicking random buttons, going along random timelines, and checking that it doesn't break any of the rules as you go.
00:49:19
Speaker
Yeah. Yeah. Okay. Yeah. Exactly. So the nice thing is that you don't have to list all the valid transitions, because the web page already sort of embodies them. Yeah. It's advertising its own state machine, from a certain point of view. Yeah. It's the, what do you call it, hypermedia aspect of web apps. It encodes its own state machine in the HTML output.
00:49:48
Speaker
Yeah, I mean, I've thought of web pages as being like state machines, but I've never thought of them advertising their transition states to you. Of course they do. Yeah, that's neat. Or you could build them not doing that, but that would be kind of bad. Like if you have loads of buttons you can click, and then, no, that was the wrong button, that's not a valid transition.
00:50:08
Speaker
Yeah, we just give you all the buttons in the entire system and then complain when you hit the wrong ones. Yeah. I bet you there's one website out there that does exactly that, just through bad design. Probably some time-reporting thing.
00:50:22
Speaker
Yeah, but that's sort of the principle of Quickstrom. So you write this spec and you say, what is correct behavior? And you can be very detailed, or you can be very loose and abstract. You can just say, I only care about this login/logout button, that's all. Do other things, go around the website, do whatever you want; I just care about this login button being correct, for instance.
00:50:46
Speaker
Or you could be very complete and say, these are exactly the transitions that are valid, and this should be the result after each transition, and so on. Yeah, I think I've got this right: isn't there a law in Germany that says you must have an Impressum, a contact details page, linked from every page? Well, I'm not a lawyer. But we can imagine a rule like that, where the government says this must appear on every page, and you would just write the rule for that.
00:51:16
Speaker
Yeah. And that's it. And then it just goes around doing random stuff and checking that your little contact thing is there. And just to be clear: can I set up multiple properties, and will it test them all as part of the same journey? Yeah.
00:51:32
Speaker
So if I've got a thousand properties I want to check, it doesn't multiply the execution time by a thousand. No, you compose the different logical expressions into one spec. Okay. So this is always a tricky question to ask in a podcast, but we have to try: what does it actually look like when you write it? What does the language have you say?
00:51:57
Speaker
Yeah, so there are two parts to these Quickstrom specs. One is called the proposition, which is the expression that describes what is correct, and which has all the temporal operators and all that.
00:52:18
Speaker
You also have in those expressions you can write selectors, which are like CSS selectors, so you can get access to an element in the DOM.
00:52:28
Speaker
And then you can pull out attributes, properties, styles, stuff like that, from those elements, so you can say: OK, this element on the page should have a text content that is foo bar, and the color is red, or something like that. Or it has an attribute which is x. So you can express all these assertions,
00:52:55
Speaker
logical truth statements about elements in the DOM. And you just compose all that up into this proposition. The other part is that you can declare
00:53:09
Speaker
actions. So out of the box, Quickstrom knows what the valid actions on a page are. But you can also say: OK, don't click all the buttons; click buttons in this part of the page, only when this condition is true in the current state.
00:53:35
Speaker
So you can constrain it, and you can also sequence things, like: if I did this before, then do this afterwards. You can direct it in certain ways. Okay. But mostly, if you have a webpage that very clearly encodes what the desirable next steps are, you don't have to say much. You can just say: click anything, do anything. But if you want to be more detailed, you can be.
00:54:01
Speaker
And it ends up looking very much like code, or... I've maintained Selenium tests, and it always felt a bit like I was poking around inside the back end of a UI that was supposed to record things. But does this look like more of a programming experience, declaring properties? Kind of, yeah. So Quickstrom started out, in the first version, as a PureScript DSL.
00:54:32
Speaker
And it was a lot weirder than you might think, because I actually built my own PureScript interpreter. So I used PureScript to parse PureScript as a sort of front-end language, and wrote this PureScript interpreter in Haskell for it. You don't like to do things the easy way. No, but it was kind of a good fit,
00:55:01
Speaker
a bit tricky as well. But the nice thing there was, at least in principle, you could use PureScript libraries. So PureScript has these kinds of stubs, or native parts, that you have to write in the runtime language, which is JS in the normal
00:55:18
Speaker
usage of PureScript. Those I had to implement for certain libraries, but you could use weird monad stuff or whatever you wanted to do, common PureScript libraries, in your specs. That was nice, because I got a lot of things for free, like string manipulation, that just worked out of the box. But then we rewrote Quickstrom in
00:55:41
Speaker
a different version, because there were some limitations to the PureScript EDSL thing. So we decided to write a custom language for it. Okay, so there is Quickstrom the language as well as Quickstrom the tool. Yeah.
00:56:01
Speaker
And that takes some shortcuts, and it makes some trade-offs, to be able to analyze the language, or the spec, for certain things. So we know statically all the attributes, all the properties, all the styles, all that stuff, that you ask for on certain selectors. We can analyze that statically and optimize the queries, basically.
00:56:32
Speaker
Okay, yeah. So we needed that for a certain reason, and we couldn't really figure out how to do it with PureScript. And just syntactically, it's a bit nicer. It's sort of like a functional language, in a sense; a spec is just one big Boolean expression. But yeah. Okay, so you write these small specs, and...
00:56:55
Speaker
So we have one where I work, and it's like 90 lines, I think, and we have barely touched it in a few years. So it's pretty nice. But of course, the documentation is lacking, and there's barely any editor support. I have some basic editor support, so it works in Emacs and IntelliJ. But yeah.
00:57:21
Speaker
Okay, depending on who you are, that either covers all the ones you want or none of them. Because that's editor wars for you. So what's the license for this? It is BSD-3. Okay, so it is an open-source tool I can just download and use. Yeah. When I started it, I had some ideas of doing a business side of it, with a dual-license thing, but then it
00:57:44
Speaker
took another direction, so that is the current state. Sounds like our gain. Have you ever tried running it on other people's sites and found bugs with it? Because presumably it's doing a lot of testing. Have you ever attempted that? Yeah. So when we did this second version of Quickstrom, and when I say we, it's me and Liam O'Connor, who is an academic, we
00:58:13
Speaker
found each other on Twitter talking about, what was it, F* and proving certain theorems around temporal logic and stuff like that. And we realized, okay, this Quickstrom thing is a nice academic project, if you want. And we started working on it together. And this new language that we did, and the infrastructure and the model of it all,
00:58:44
Speaker
that turned into a paper. Oh, what's the paper called? I'll put a link in the show notes. It's called something like Quickstrom: testing with linear temporal logic, something like that. That sounds like a properly academic title. Yeah, there's a colon in there somewhere. Yes.
00:59:04
Speaker
And then you go to the web page, and it says: Quickstrom, for not tearing your hair out over Selenium. Exactly. You advertise to two completely different sectors. Marketing language does not have a place in academia. So I got into the academic realm through the back door, in a sense. I haven't done much. There's better funding that way, from what I hear. Maybe. But that was really fun. And we did a case study, or a sort of
00:59:34
Speaker
evaluation, in that paper. There's this old project called TodoMVC, which has been running for many years, with hundreds of different implementations of it, as people prove out their ideas. Yeah. So, in short, it's todo lists in like a hundred different front-end frameworks and languages.
01:00:00
Speaker
So I thought that should be a pretty cool testing ground for our thing. So we wrote one spec for all of them and just tested all of them.
01:00:13
Speaker
And that spec was really, really detailed. It was like, if you're in this state, this is a valid transition. For instance, if you change the filter from viewing all to-do items to completed items, only the completed ones should be visible in the next state, and so on. A lot of different transitions. And I didn't even know TodoMVC had this edit mode: you can double-click items, and it
01:00:37
Speaker
turns into an input, then you can edit them, and so on. And we just ran loads of tests on these apps, and we found that more than half of them were broken in some way. So that was fun. That's nice. I mean, it's not nice, it's initially terrible news, but in the long term, it's nice.
01:01:03
Speaker
Yeah. And I mean, some of them were broken in the sense that they don't even load anymore, because someone had brought the server down or something, but most of them could actually be run. And they had all kinds of strange problems; we had this table of 12, 13, 14 different types of problems they had. And some of them were really
01:01:26
Speaker
kind of complex. Like, if you had this filter on, started editing, pressed escape, then changed the filter, then it broke. Something like that. I don't recall all the details, but...
01:01:37
Speaker
it was able to generate something like that without having to specify that set of steps as an example. That's nice. Do you get this property that most property testing tools have? I shouldn't overload the meaning of the word property. Do you get this receipt, this token that says: if you want to reproduce the steps that crashed this app, rerun the test suite with this magic number?
01:02:03
Speaker
We don't with Quickstrom. We don't have that seed. Can we have that in version 3? I mean, we could add that, but there's actually no guarantee that you can reproduce it.
01:02:15
Speaker
And you could end up in that situation with any property-based testing framework, really, if you have non-determinism. But there might be timing aspects and so on, so that even if we apply the same actions in the same sequence... I see. Do I at least get a report of what sequence of steps led to that point?
01:02:35
Speaker
Yeah, there's a textual one in the console, but there's a more usable one, which is an HTML page it spits out, where you see state, action, state, and you can go next, next, next through this trace and see what happened. You see the state of all the elements that you've queried and so on, so you can inspect what went wrong. Okay.
01:02:58
Speaker
And it always ends the test once it's found a problem.

Reflections on Property Testing

01:03:04
Speaker
So it doesn't always run like 100 steps; when the spec fails, whenever that happens, it ends there. So you know that at the end of the trace, there is probably something going on. Okay. And how does it know when to stop if it doesn't find a problem? Do you just say, run for five minutes, or...
01:03:25
Speaker
Yeah, you can specify this in the specs. If you have an always operator somewhere, you can say: always, for at least 100 steps. So you can specify some time constraints. Or if you have an eventually, you can say: wait for at least 10 steps before you give up. And then there's an interesting aspect
01:03:54
Speaker
with eventually: if you have a condition that you expect to eventually hold, and it doesn't, it might not mean that it would never hold; it might just mean that you gave up too soon. So the results of the tests are definitely true or false, or maybe true or false.
01:04:16
Speaker
It's a bit tricky. So if you say that eventually the spinner should turn into a result and it doesn't and the test gives up, then it says, well, maybe that was false.
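That three-valued outcome can be sketched like this; the `Verdict` names and the spinner model are made up for illustration:

```python
from enum import Enum

class Verdict(Enum):
    # Hypothetical names: a finite run can't always decide an "eventually".
    TRUE = "definitely true"
    PROBABLY_FALSE = "gave up waiting; might have held later"

def check_eventually(pred, trace, max_steps):
    # If the condition shows up within the step budget, we know it held.
    # If not, we only know we stopped looking -- maybe too soon.
    for state in trace[:max_steps]:
        if pred(state):
            return Verdict.TRUE
    return Verdict.PROBABLY_FALSE

# A loading page: five spinner states, then the chart appears.
loading = [{"spinner": True}] * 5 + [{"spinner": False, "chart": True}]
assert check_eventually(lambda s: s.get("chart"), loading, 10) == Verdict.TRUE
assert check_eventually(lambda s: s.get("chart"), loading, 3) == Verdict.PROBABLY_FALSE
```

The asymmetry is the point: a satisfied eventually is conclusive, but an unsatisfied one within a finite run is only "maybe false".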
01:04:28
Speaker
Yeah, like eventually Spotify should show that this podcast has a million subscribers, but I might have to let the test run for a bit, right? Yeah. I think that probably gives me all the information I need, then, to at least go and test my code that should eventually show a chart.
01:04:51
Speaker
That'd be a good little property to start with, right? We could do that in Quickstrom, I think. Yeah, OK. I'm going to give it a spin. Yeah. Groovy. Oskar, thank you very much for taking us through it. Yeah, thank you for having me. It was very fun to reminisce about some stuff that I'd forgotten, some details. It's very nice to bring it up.
01:05:13
Speaker
Yeah, we are all both the programmers we are today and the history of the things we've programmed in the years gone by, right? Yeah. Yeah, absolutely. Cheers. Catch you again. Thanks. Bye. Thank you, Oskar. And I have to say, Oskar's my kind of geek. He's got a bit of academia,
01:05:31
Speaker
He's got a bit of business and the thing connecting the two is just someone that wants to build and tinker with stuff and learn. The world needs more people like him, so yeah. In fact, I think the world needs more people like us. We're all that kind of person, aren't we? I think we are around here. By the time we've reached the end of the podcast, I think we're all like that. So if you want to feed those parts of your soul, head to the show notes where you'll find links to papers, tools, sites, software,
01:06:00
Speaker
All the ideas we just discussed. That will give you something to chew on for the coming week. While you're down by the show notes, please take a note of the like, share, rate, subscribe buttons. I was looking recently. Of course, I look at the analytics. I was surprised to see how many people share this podcast. So thank you to those who have done. That's a heck of an endorsement. When you send something to a friend and say, hey, look at this. Thank you. I appreciate that.
01:06:28
Speaker
We'll be back next week with another episode. Of course, I'm actually playing catch-up a bit, because I just spent the week in Montreal at a really excellent tech conference, ConFoo. Would recommend; there'll be another one in 2025, so take a look. I'm full of ideas from that, and there are lots of potential guests, but lots to catch up on, so I'd better get on with it. Until next week, I've been your host, Chris Jenkins. This has been Developer Voices with Oskar Wickström. Thanks for listening.