Recording and Replaying the Browser (with Justin Halsall)

Developer Voices
2.4k Plays · 5 months ago

RRWeb is based on a simple idea: If you capture all the DOM events in a browser session, and when they happened, you could play it back later. Play it back for diagnosing error conditions, for understanding your user’s journey, or for creating demo videos that can be edited element-by-element instead of frame-by-frame.

Unfortunately, the simple idea gets tricky when you try to implement it, thanks to a whole host of browser-specific glitches, differences, and places where the HTML5 spec ran out. It’s exactly the kind of project where you might want to use it, but you want someone else to maintain it!

Joining us this week is Justin Halsall—a chief contributor to rrweb—to teach us about some of the more barren corners of the browser spec, how he’s fought through them, and what the benefits are on the other side…

RRWeb homepage: https://www.rrweb.io/

RRWeb on Github: https://github.com/rrweb-io/rrweb

RecordOnce: https://recordonce.com/

Support Developer Voices on Patreon: https://patreon.com/DeveloperVoices

Support Developer Voices on YouTube: https://www.youtube.com/@developervoices/join

Justin on Twitter: https://x.com/juice10

Kris on Mastodon: http://mastodon.social/@krisajenkins

Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

Kris on Twitter: https://twitter.com/krisajenkins

0:00 Intro

3:10 What is rrweb Doing?

6:12 Beginning With A Naive Implementation

9:49 Supporting Canvas Tags

13:05 Exotic HTML 5 Tags Like Midi

14:31 The Internal Data Format

17:39 How Reliable Can This Be In Practice?

23:04 Cross-Browser Support

24:32 Exploring The Use Cases

30:17 Privacy Issues

33:46 Analyzing User Interactions En-Masse

36:40 Is The Spec Greater Than The Tool?

38:20 The Practical Benefits Of Contributing To Open Source

44:45 Updating Recordings After The Website Changes

49:55 Playing Well (Or Badly) With Popular Frameworks

53:21 The Runtime Burden

54:17 What's Coming In The Future?

1:01:02 Outro

Transcript

Challenges in User Behavior on Registration Pages

00:00:00
Speaker
Every web company I've ever worked for has wished it could capture what was going on in the user's browser for lots of different reasons. The first one always being, we've just launched, people are going to our registration page of our new site, but they aren't signing up. Why? Is it broken? Is it too confusing? Is it too long? Are people clicking around for ages and not being able to submit? Are we asking them for 18 different fields including their date of birth and their gender when all they're trying to do is buy a light bulb? It's probably that actually. But it can be maddening to know that something's going wrong with the formula and not to know what. You wish you could peer over their shoulder as they were doing it.
00:00:45
Speaker
As an extension of that, you really wish you had more context if the page crashes.

Introduction to Justin Halsall and RRWeb

00:00:50
Speaker
If you're using one of those services that sends JavaScript stack traces back to a central server, you're doing better than a lot of people, but you still wish you knew what your user was looking at, what they were clicking on, what their journey was at the point it crashed. You might wish you knew the state of the DOM. The more data you can get about a crash, the better. And in theory that data should be really easy to capture: the DOM is just data, CSS is just data, and there's a mutation API you can tap into. You should be able to record this stuff.
00:01:27
Speaker
In practice, it's a can of worms. It actually sounds quite horrible. And this week's episode is all about how someone is doing that difficult work, and I'm glad it's not me. Joining me to wade through the morass of subtle browser differences is Justin Halsall. He's a major contributor to RRWeb, which is a project that lets you record and replay what's going on on web pages, for any reason: from gathering information for debugging purposes, whether that's technical debugging or usability debugging, to creating things like content-aware demo videos, where you can tap into the DOM as the video plays.

Privacy Concerns with Web Interaction Recording

00:02:07
Speaker
It's a very handy project. It's a potential privacy question, which we'll get into, and it's an implementation that's filled with war stories and interesting browser trivia. So let's get into it. I'm your host, Kris Jenkins. This is Developer Voices, and today's voice is Justin Halsall.
00:02:37
Speaker
I'm joined today by Justin Halsall. Justin, how are you? I'm doing well. How about yourself, Kris? I'm good. I'm good. You're somewhere just over the English Channel, aren't you? Somewhere in northern Europe. Yes. I'm in Amsterdam at the moment. Amsterdam? Yeah. Ah, that explains it, because that's where we met. You didn't have far to come. Yes, that's true. Yeah, no, I came by bicycle, I think. No, that's not true. Normally I do, but that was the exception that night. The one day I caught someone who lives in Amsterdam not using a bicycle. Yes. Awesome.

Cross-Browser Consistency Challenges

00:03:10
Speaker
So you sound far too casual for what it is you actually do, which sounds to me like a real
00:03:19
Speaker
Sisyphean task, which is pretending that all browsers are basically the same just to try and get what you're doing done. So you've created RRWeb. Take me through what it is and how it does what it does. Let's start there.

How RRWeb Records User Interactions

00:03:37
Speaker
Yeah, for sure. So what RRWeb does is it basically records anything a user does in a browser. That's user interaction, like clicking, typing, that sort of stuff, but also all of the HTML and CSS, and all of the mutations to the HTML and CSS. Then on the other end it replays all of that in the browser again. It creates an iframe, and everything that you did and it recorded initially, it will replay. There's a bunch of companies using it for analytics.
00:04:16
Speaker
It started out as kind of a Hotjar clone by one of the other creators of rrweb. And we now use it for creating video tutorials, basically. But ultimately, it's just recording absolutely everything that goes on in the browser, and then it should seem like we're doing it the first time. When I first heard about that, I thought, on the surface, that sounds really easy, right? Sure. There's a DOM event API. You tap into that. You just record what it says and then replay it. And I have a feeling it's one of those things that's simple in theory, and in practice it's an absolute misery of slight browser differences.
00:05:02
Speaker
Oh, for sure. Some browsers will even lie about the state they're in. Oh, how fun. Yes, and then they'll give you back something else. And if you put that back into the same browser, it will break. This specifically has to do with CSS values, where you give Chrome a -webkit- prefixed CSS value, or CSS property, I should say, and then it gives it back to you without the -webkit- piece.
00:05:40
Speaker
And then once you go ahead and play that back, Chrome doesn't actually understand it without the -webkit- prefix. So it's a total mess of all of these weird edge cases. So that's the browser side of things. And then there's someone building a website who has a bunch of different libraries in there, which totally mess with the whole state of the browser. So that's also pretty tricky. I can believe that. We need to unpack all of that. Start me off with a naive implementation. What would I do, and how far would I get before I gave up in despair?
00:06:23
Speaker
So you'd start with the MutationObserver on the recording side. Like you said, with MutationObserver you can ask it to subscribe to all of the different changes to the DOM that happen, and it will give you that. You can start recording that and then start replaying it on the other side. That will give you kind of a naive implementation. And then really quickly, you're going to bump into: okay, well, what happens when there's an iframe there? With a same-origin iframe, we can put another MutationObserver inside of that. And then what happens with shadow DOM? And what happens if it's a cross-origin iframe? You need to run the same script inside of that again. And
00:07:11
Speaker
Yeah, and that's just the HTML changing over time. You need to record the CSS. Is there a similar API for that? No. No, lovely. Well, actually, that's not completely true. You can use the MutationObserver, but only if the CSS has been put in with a style element and the CSS has been added as a text node. And if that gets changed, then you do get
00:07:52
Speaker
the MutationObserver will pick that up, but that's not the only way you can add CSS. There's a whole bunch of different ways to do it, and you need to catch all of those different ways as well. See, if I were building this for something like user testing, I would just say, okay, we're supporting one very specific browser for a very specific set of use cases. And I would just despair of actually trying to support multiple browsers for all possible use cases and all possible libraries that the company is putting in. Take me through your madness.
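The naive starting point described here, subscribing to DOM changes and writing each one down with a timestamp, can be sketched roughly as follows. This is an illustration under stated assumptions, not rrweb's code: NaiveRecorder, RecordedEvent and MutationLike are hypothetical names, and MutationLike only mirrors the few MutationRecord fields the sketch reads so that it also runs outside a browser.

```typescript
// Hypothetical, simplified event shape; not rrweb's actual format.
interface RecordedEvent {
  timestamp: number;
  kind: "attributes" | "characterData" | "childList";
  targetId: number;
  attributeName: string | null;
}

// The subset of the DOM's MutationRecord that this sketch reads.
interface MutationLike {
  type: "attributes" | "characterData" | "childList";
  target: object;
  attributeName: string | null;
}

class NaiveRecorder {
  readonly events: RecordedEvent[] = [];
  private ids = new WeakMap<object, number>();
  private nextId = 1;
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Give every node a stable numeric id so a serialized event can refer to it.
  idOf(node: object): number {
    let id = this.ids.get(node);
    if (id === undefined) {
      id = this.nextId++;
      this.ids.set(node, id);
    }
    return id;
  }

  // Shaped so it can be called from a MutationObserver callback.
  onMutations(records: MutationLike[]): void {
    for (const record of records) {
      this.events.push({
        timestamp: this.now(),
        kind: record.type,
        targetId: this.idOf(record.target),
        attributeName: record.attributeName,
      });
    }
  }
}

// Assumed browser wiring (not executed here):
//   const recorder = new NaiveRecorder();
//   new MutationObserver((records) => recorder.onMutations(records)).observe(document, {
//     subtree: true, childList: true, attributes: true, characterData: true,
//   });
```

Everything the conversation goes on to list (iframes, shadow DOM, CSS added outside a style element's text node) is exactly what this sketch leaves out.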
00:08:35
Speaker
Well, that's also kind of the power of open source.

Enhancement through Open Source Contributions

00:08:39
Speaker
So I'm working on this use case that works for me, and then somebody else is working on that use case that works for them. And then someone else is like, well, let's just run it inside of a test environment like jsdom or something like that, which isn't a browser, but just mimics a browser. And then that will get us all kinds of other weird use cases. And as long as somebody else goes and solves that, I'm more than happy to accept it.
00:09:05
Speaker
So what you're doing is very much community brute force, trying to make things okay. Yes. Yeah. And I was naive enough, because I'm not the initial creator of rrweb, although I've probably added more code than anybody has in the last couple of years. But I was like, oh, let me just build this myself. And then I found out... Luckily, I didn't go ahead and start building it myself because, yeah, it seems doable enough, and then you just start bumping into all kinds of different edge cases and
00:09:45
Speaker
use cases and things. Yeah. I can believe it. I'm thinking I want to test you on some of the other edge cases. So canvas tags. Is that just a black hole? It was, until we started doing kind of a man-in-the-middle attack with regards to canvas elements. Especially for WebGL, it does kind of work. What we do is we monkey patch the whole interface to the canvas element, and then you send it a command. You say, okay, well, I want to paint this and that over there. Then we go ahead and save that command, and on the playback side, we just replay the commands the way you called them, and then you actually get something that looks pretty good. Do you get something that looks pretty good in theory, or are you actually saying it works in practice?
00:10:44
Speaker
It does. It works in practice. The problem with it is people call this every single frame, probably with some pretty big bitmap images or objects. So the file size grows like crazy. That's one issue with it. Another issue is that with WebGL, every single browser can have slight differences. So what you record in one browser might not actually work that well in a different browser. That's definitely a downside to it. So we created another way of recording canvas, which is: you interrogate the canvas object and you say, okay, well, give me back everything you have in pixels. And then we just do that
00:11:32
Speaker
15 times a second or something like that, basically just screenshots. Yeah. Recording a flick book of what it was showing. Exactly. That feels like it would still bump up the file size a lot, but perhaps not quite as much. Yeah. So in certain cases where you don't have constant animation, that's a lot nicer, because we can just check: okay, well, was this the same as the last one? Oh, so you do this as well. Yeah, we do this. And if it's the same, then we don't send it to the server. The downside of that one is, if you have a big canvas element and you call toDataURL, it freezes the whole browser for the moment while it's doing that. So it's like, okay, well, we can't do that. Let's do this in a web worker. And then you have the whole web worker overhead, which means that we can't actually do it at
00:12:31
Speaker
15 frames per second. We can probably do it a little bit less often. So that's the downside of that again. And then there's a third way, with WebRTC. Anyway, like you said, canvas is a black hole, but in this case one which requires the time.
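The "was this the same as the last one?" check from the screenshot approach above can be sketched like this. It is a minimal illustration, assuming frames arrive as data URL strings (for example from canvas.toDataURL()); CanvasSampler and FrameEvent are hypothetical names, and send in the comment is a stand-in for whatever ships events to the server.

```typescript
// Hypothetical event shape for one captured canvas frame.
interface FrameEvent {
  timestamp: number;
  dataUrl: string;
}

class CanvasSampler {
  private lastDataUrl: string | null = null;

  // Returns an event only when the picture changed since the previous sample,
  // so a canvas without constant animation produces very few events.
  sample(dataUrl: string, timestamp: number): FrameEvent | null {
    if (dataUrl === this.lastDataUrl) return null;
    this.lastDataUrl = dataUrl;
    return { timestamp, dataUrl };
  }
}

// Assumed browser wiring, roughly 15 samples per second (not executed here):
//   const sampler = new CanvasSampler();
//   setInterval(() => {
//     const event = sampler.sample(canvas.toDataURL(), Date.now());
//     if (event) send(event);
//   }, 1000 / 15);
```

As noted in the conversation, calling toDataURL on a large canvas blocks the main thread, which is why the real work gets pushed to a web worker; the sketch ignores that cost.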
00:12:56
Speaker
Okay. I'm already instantly glad that you're trying this project and not me. Let me try another one. Push back on me if this is too obscure, but it's on my mind and you just mentioned time and timing. There is a MIDI API in the browser, to pick something exotic, where you can send data to and from musical instruments. And that's simple data. It's low bandwidth, so I believe you can capture that, but music is very timing sensitive. Do you have any special handling for that?

Complexities with MIDI API and Salesforce

00:13:31
Speaker
Have you tried to handle it? We haven't really, but the way that I would do that is the same as in other cases where the browser is like, well, you know, we're not going to help you with that, like canvas. There we would go ahead and monkey patch the interface to it. So in the case of a MIDI controller, we'd probably do that as well. We'd just monkey patch, and then whatever you would send to it, we would save it and grab the timestamp and then send it on. But I'd be curious to see how far we're off every so often. For visual playback, it doesn't really matter that much. But for something like MIDI, it might.
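The "monkey patch, save it, grab the timestamp, send it on" idea can be sketched in a few lines. This is a generic illustration rather than rrweb's canvas or MIDI code: recordCalls, replayCalls and RecordedCall are hypothetical names, and the target can be any object whose methods you want to observe.

```typescript
// Hypothetical record of one intercepted call.
interface RecordedCall {
  timestamp: number;
  method: string;
  args: unknown[];
}

// Wraps the named methods so every call is logged and then passed through unchanged.
// Returns a function that restores the original methods.
function recordCalls(
  target: object,
  methods: string[],
  log: RecordedCall[],
  now: () => number = () => Date.now(),
): () => void {
  const patched = target as Record<string, unknown>;
  const originals: Record<string, unknown> = {};
  for (const name of methods) {
    const original = patched[name];
    if (typeof original !== "function") continue;
    originals[name] = original;
    patched[name] = function (this: unknown, ...args: unknown[]): unknown {
      log.push({ timestamp: now(), method: name, args });
      return original.apply(this, args); // behaviour is not changed, only observed
    };
  }
  return () => {
    for (const name of Object.keys(originals)) patched[name] = originals[name];
  };
}

// Replays the logged calls, in order, against another object with the same methods.
function replayCalls(log: RecordedCall[], target: object): void {
  const receiver = target as Record<string, unknown>;
  for (const call of log) {
    const method = receiver[call.method];
    if (typeof method === "function") method.apply(target, call.args);
  }
}
```

The sketch replays calls in order but ignores the gaps between timestamps; a timing-sensitive replay, as MIDI would need, would have to schedule each call relative to the first one.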
00:14:20
Speaker
Yeah, I guess you've got more wiggle room on visuals. Yeah, exactly. Because the browser is doing its thing anyway. Okay. Maybe since you've talked about file sizes for canvas tags, we should dive a bit below the hood and think about how you're storing this data. Give me the internal file format and we can pick it apart from that. Yeah, for sure. So we're storing this whole thing in JSON. Basically we're turning all the HTML, the whole DOM tree, into a tree of nodes. And then for certain other things, we have certain other ways of saving it. But we also have additional nodes that get added, or attributes that were mutated. And we basically have an ID that tracks the location of all of the different nodes,
00:15:20
Speaker
or not the location, but what that node is. And we have something similar to video, where it's like a keyframe, or what we call a full snapshot. And then the changes happen in incremental snapshots. So it's like a diff on top of the full snapshot. Right. So capture the screen, diff for a bit, and then you can resynchronize if you have to. Yeah, exactly. How often are you resynchronizing it? We don't. I do know some people that use the open source project who'll do it every second or something like that. But I think it has to do with certain use cases where rrweb is just buggy for that use case, and this is their way of working around it, basically.
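The keyframe-plus-diff structure described here can be made concrete with a small sketch. The types below are a deliberately simplified, hypothetical format (rrweb's real event schema is richer and is not reproduced here); the point is only to show how a full snapshot with node ids and a list of incremental changes is enough to rebuild the tree at any moment. The sketch assumes events are sorted by timestamp.

```typescript
// Hypothetical serialized DOM node: every node carries an id that later events refer to.
interface SerializedNode {
  id: number;
  tag: string;
  attributes: Record<string, string>;
  text: string;
  children: SerializedNode[];
}

type Change =
  | { kind: "attribute"; id: number; name: string; value: string }
  | { kind: "text"; id: number; text: string }
  | { kind: "add"; parentId: number; node: SerializedNode }
  | { kind: "remove"; id: number };

type RecordedEvent =
  | { type: "full"; timestamp: number; root: SerializedNode }
  | { type: "incremental"; timestamp: number; change: Change };

function findNode(node: SerializedNode, id: number): SerializedNode | null {
  if (node.id === id) return node;
  for (const child of node.children) {
    const hit = findNode(child, id);
    if (hit !== null) return hit;
  }
  return null;
}

function removeNode(node: SerializedNode, id: number): boolean {
  const index = node.children.findIndex((child) => child.id === id);
  if (index >= 0) {
    node.children.splice(index, 1);
    return true;
  }
  return node.children.some((child) => removeNode(child, id));
}

// Rebuilds the tree as it looked at time `upTo` by replaying events in order.
function rebuild(events: RecordedEvent[], upTo: number): SerializedNode | null {
  let root: SerializedNode | null = null;
  for (const event of events) {
    if (event.timestamp > upTo) break;
    if (event.type === "full") {
      // A full snapshot resets the state; clone so the recording itself is never mutated.
      root = JSON.parse(JSON.stringify(event.root)) as SerializedNode;
      continue;
    }
    if (root === null) continue; // a diff without a preceding keyframe cannot be applied
    const change = event.change;
    if (change.kind === "remove") {
      removeNode(root, change.id);
      continue;
    }
    const target = findNode(root, change.kind === "add" ? change.parentId : change.id);
    if (target === null) continue;
    if (change.kind === "attribute") target.attributes[change.name] = change.value;
    else if (change.kind === "text") target.text = change.text;
    else target.children.push(JSON.parse(JSON.stringify(change.node)) as SerializedNode);
  }
  return root;
}
```

Taking a fresh full snapshot periodically, as some users are said to do, would bound how far back a replay has to go and would paper over any diff that failed to apply.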
00:16:17
Speaker
Okay. That's on the recording side. So you end up with this large JSON file, which must look like an MP4 file under the hood, semantically. Yeah, I'm guessing. And it's turning the DOM into an event system you're capturing. This suddenly is reminding me of one of my friends, Kafka. Yeah, exactly. And the moment that we get the event, we send it through to the server. People can do something else with that, but we try and send it as quickly as we can, to get it out of the browser, because the moment that you, for example, do a page navigation, boom, your context has gone and
00:17:05
Speaker
all the events that you still had, well, if they weren't sent to the server, then they're lost, basically. Okay. That must make it enormously difficult to get reliable recordings of a session.
00:17:19
Speaker
Yeah, it can be. We spend a lot of time creating Puppeteer tests and all that kind of stuff to mimic certain situations, so that we can make it as reliable as possible. Okay, so I have to ask, how reliable is it in practice? I've played around with your web demos, but out in the wild? It really depends on the website. For certain websites, it's flawless, pretty much. But there are certain other ones where it's just
00:18:00
Speaker
just not very reliable. But that has more to do with what technologies people are using. If they're using technology that we haven't anticipated yet, then that's that. For example, Salesforce: they have this thing called LWC, Lightning Web Components. And they basically use, what's it called, shadow DOM. Except they've monkey patched absolutely every API for it, and they're doing something else. So if you interrogate it and ask, okay, how many children do you have? Or who's your parent? Whatever. Then it will give you something else back than what's actually happening in the browser.
00:19:00
Speaker
Oh. Is that because they've monkey patched it badly? Yes. Okay. Fair enough. And I've been going through the code of that project, and there are a lot of comments like "non-standard, but probably won't matter". That's the comment, and it matters to me quite a lot. But that must come up a fair bit, right? I mean, take the Node package library. There are so many packages out there, of quality ranging from superb to ridiculous, right? Yeah, for sure. What do you do? Do you just verify it with the most popular ones? How do you monkey patch without both stepping on other people's toes and having them constantly stepping on yours?
00:19:55
Speaker
yeah so
00:20:00
Speaker
we
00:20:02
Speaker
Well, one is, when we monkey patch, we try not to change any behaviors. We're really just there to read the messages that come in and then send them through to the browser. So if, for example, there are two instances of rrweb running, then we'll do that: we'll read it and then send it through to the next one. Okay. Because that happens as well sometimes. We've become pretty popular right now, so there are multiple tools that are sometimes run on the same website. So for example, Sentry might use us for error tracking, and then it could be
00:20:45
Speaker
PostHog, which is using us for analytics, and they might be running at the same time. Okay. So we try not to change anything. Don't change any behavior. Just grab it and send it on. But in the case of somebody else monkey patching it badly: for example, Date. We mentioned time before. Some people might monkey patch the Date object, so it gives you back something completely different. It's like, yay. All right. So what I recently did is I injected an
00:21:29
Speaker
iframe into the DOM, which is pretty dirty, but it was the only way to do it. From it I got all of the elements and all their getters and all their functions that are part of the Element object or the Node object. I copied a reference to them, and then I removed the iframe. That way, whenever we need to ask, for example, an element about its parent node or parent element, instead of just calling .parentElement on it, we use that reference that I recorded from the iframe.
00:22:16
Speaker
They often haven't gone through the trouble of monkey patching that one, because it'd be a lot of extra work and nobody would get any benefit from it. So that's one way of working around some of these nasty monkey patches. Right. Open a clean room and just grab references to everything you know is untouched. Yes. Oh boy, this is reminding me of the early days of the web, when people used jQuery not because it was a great framework, but because it was your only hope of maintaining a stable, sane interface across different browsers.
00:23:00
Speaker
Do you get that? What width of browsers... width is the wrong word. What spread of browsers do you support? Are you just doing the big three, or are you going into phones? Yeah, we

Browser Support and Practical Applications

00:23:13
Speaker
basically do. Well, we basically use Can I Use, or rather there's an open source library which tracks compatibility, and we're basically on the default version, which is like 96% support or something like that. By the way, that doesn't include Internet Explorer anymore, so we don't support that anymore, but we did until pretty recently. The only one that we exclude from that list is Opera Mini, because with Opera Mini everything's running on the server, so we can't monkey patch. We have no MutationObserver, as far as I know. So there's no way of us doing it.
00:24:09
Speaker
In theory, we do support everything, and even if somebody has some sort of weird use case, some edge case that they're happy to fix themselves and send a pull request for, we'll be happy to accept it, basically. The joy of open source: you can always tell people to fix it themselves. Yes, exactly. Okay. You've hinted at this, but give me some use cases, because the obvious way I would think this would be used is recording user testing sessions and then playing them back and realizing, oh, the sign-up form is ridiculous in practice. Exactly. That's got to be one of them, but how are people using this?
00:24:53
Speaker
Yeah, that's definitely one of them. And I think for a lot of people, that's the main use case. But there's also Sentry, as I said; they're using it for whenever a bug happens. Say an error message pops up: then I believe in their user interface, they're going to show you the last 60 seconds before that bug happened. I think this is also the way that Datadog is using it. There was a project which
00:25:33
Speaker
Oh yeah, so there's a couple of projects which basically are helping the elderly. They get stuck and they're like, help, I need help. And then what it does is it streams their browser live to somebody else, who can then see what's going on and help them in that way. Is that two-way, out of interest? Can they then send back moving the mouse for them? Yes, they can. We didn't know my father-in-law, but that is something that some people did. Okay. Then for customer support, this is also a great use case. It's not just the elderly who could benefit from that. And then the thing that we're using it for is we want to make it really easy for you to create video tutorials.
00:26:26
Speaker
And then what we end up doing is, because we know a lot more about what's going on, we can actually do a lot of editing for you. We can identify that, hey, you made a typo when you were filling in that form, so we can give you a nice paste animation, as if you did it correctly the first time. And we can also change a lot of the DOM after the fact. For example, you're showing your actual email address, and you don't want that in your customer-facing videos. You can just say, okay, I want to replace john@doe.com with something else. You can do that, and then our app will be able to fix that, basically.
00:27:09
Speaker
Okay, so for that to work, the suite of it must be: something that records all the events to your big JSON list; something that plays that back as though it were the original website; and presumably you've got some editor that says, here's all the events, do you want to redact some of them? Yeah, pretty much. We have kind of a two-part editor. One part is exactly that: which text nodes, for example, would we like to replace or edit or translate, because that's also something that pops up. Some of our customers are like, okay, we'd like a video, but in 15 different languages. Okay, well, we can do that.
00:27:55
Speaker
That just replaces all the text nodes, pretty much. That's a simplified version, but that's pretty much what it does. And then the other side of it is editing the video, which is cutting out the pieces where basically nothing was happening, or slowing parts down, or identifying other things which are not very interesting and removing them. So that's the video editing part, but it's really just going through a JSON file and identifying these different situations and stuff like that. Yeah, I can see that: turn what happened into data, and then you can project that data back out in different interesting ways. Exactly, yeah. Doesn't this
00:28:44
Speaker
Doesn't this bring up a problem with what you've mentioned? Say I record the screen, me typing in my email address, and I decide I don't want to use my personal email address in the final thing. So I go to the editor, I change it to bob@bob.com, which is fine for about two seconds, before the snapshotting mechanism that you mentioned reverts the DOM back to exactly what it was a second after I was typing. How are you handling that case? Well, that's fine, because we basically track the string that you wanted to replace, and we just go through all of the events and replace it everywhere. So for example, if you're typing in a form field, and you type it, and maybe you hit Enter, and then it pops down to, maybe it's a to-do list or something like that.
00:29:42
Speaker
Oh, of course, yeah. And then we need to make sure that we don't just replace what was in the form, but we also replace what's in the to-do item, for example. Right. So we've really got to just go through everything; otherwise it doesn't work. So you have to be kind of contextually aware of the way these different forms get used. Yeah, exactly. Okay. So what about within all that? Let me think. Where do we go next on this? The obvious problem that next comes to mind is that this could be a privacy issue, right? I mean, how much can people use this to record their users?
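Because a recording is just JSON, "go through all of the events and replace it everywhere" can be sketched as a recursive walk over every string in the data. This is an illustration only, with replaceEverywhere as a hypothetical helper; it assumes the recording is plain JSON-like data (objects, arrays, strings, numbers) and it returns a copy rather than editing in place.

```typescript
// Returns a deep copy of `value` in which every occurrence of `from`
// inside any string has been replaced by `to`.
function replaceEverywhere<T>(value: T, from: string, to: string): T {
  if (from === "") return value; // nothing sensible to replace
  if (typeof value === "string") {
    return value.split(from).join(to) as unknown as T;
  }
  if (Array.isArray(value)) {
    return value.map((item) => replaceEverywhere(item, from, to)) as unknown as T;
  }
  if (typeof value === "object" && value !== null) {
    const source = value as unknown as Record<string, unknown>;
    const copy: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      copy[key] = replaceEverywhere(source[key], from, to);
    }
    return copy as unknown as T;
  }
  return value; // numbers, booleans, null, undefined are left as they are
}
```

Since every event is rewritten, a later snapshot cannot bring the old string back, which is the concern raised in the question. One limitation of this sketch: it only matches the complete string, so the intermediate values recorded while someone is still typing ("j", "jo", "joh" and so on) would need separate handling.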

Privacy Issues and Implementation Challenges

00:30:28
Speaker
And what's the ethical boundary on that? Yeah, yeah, for sure.
00:30:33
Speaker
So one of the things that we do out of the box is that password fields we just don't record. If you're typing anything into that, we basically anonymize it. And we do have some privacy mechanisms to say, okay, I don't want to record this type of thing. You can add an rr-block or rr-mask class to different form fields. Okay, so you can do it cooperatively. Yeah, exactly. But at the end of the day, it's a tool, and you could use it for whatever you'd like. So yeah, if you have bad intentions, you can totally create a privacy issue with it. The thing that I'm
00:31:23
Speaker
most concerned about is that we haven't had enough time to really delve into getting the best sane defaults. So I'm afraid that people might use it out of the box and not think about it, and record things they shouldn't. That's the thing that I'm most afraid of. That's on my list, but I haven't gotten to that, unfortunately. Yeah. I suppose there are things that are almost impossible for you to guarantee. Like, you record the events, you're sending them back to their server. They really should be putting that over HTTPS, but I guess you can't guarantee that. No. Okay. So caveat emptor, I suppose. Yeah, for sure. And there was a study not too long ago about the top, what was it,
00:32:21
Speaker
100 websites and all of the different tools that they were using, and how many were basically recording information that they shouldn't have. I have no idea how much of this is rrweb, but I'm sure that this can also happen with rrweb. It's something that I do tell people about when they start using it. It's powerful in many ways. I suppose in the end, as a user, you are giving that data to the website you're on anyway. I mean, there's an implicit trust from the very moment you join the website. Yeah. And that's going to be one of the bad things. Yeah, I think it really has to do as well with whether you're paying for a service. This is different per person, but it's something that I've noticed: people are generally okay with a paid service tracking what they're doing and improving on top of that.
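The cooperative privacy rules mentioned earlier in this part of the conversation (password fields are not recorded as typed, and elements can opt out via an rr-block or rr-mask class) boil down to a decision made per field before its value enters the event log. The sketch below is a simplified illustration of that decision, not rrweb's implementation: FieldSnapshot and recordedValue are hypothetical, and the exact behaviour attached to each class here (drop entirely versus replace characters with asterisks) is an assumption.

```typescript
// Hypothetical description of a form field at the moment its value would be recorded.
interface FieldSnapshot {
  inputType: string | null; // e.g. "password", "text", or null for non-input elements
  classNames: string[];
  value: string;
}

// Returns what would be written to the recording:
// null when the field is blocked, asterisks when it is masked, the raw value otherwise.
function recordedValue(field: FieldSnapshot): string | null {
  const has = (name: string): boolean => field.classNames.indexOf(name) >= 0;
  if (has("rr-block")) return null;
  if (field.inputType === "password" || has("rr-mask")) {
    return field.value.replace(/[\s\S]/g, "*");
  }
  return field.value;
}
```

The default branch is the one the conversation worries about: anything not explicitly marked is recorded as typed, so the safety of a recording depends on the site owner tagging sensitive fields.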
00:33:19
Speaker
But with a lot of free services that will be tracking what you're doing, people are just not happy with that, basically. Yeah. I guess it comes back to that thing: if you're paying them, you're buying a product, and if you're not paying them, you're the product. Exactly. Okay. Well, let's think of something equally wide-scale, but hopefully benevolent, right? Because this is a problem I've faced with web startups. You've got, I would like to be able to record people using our registration form, right? And maybe I've got a few thousand of those. I'm not planning to look through them all at once. What I'd like to do is say, give me
00:34:07
Speaker
In theory, I should be able to do this, can I? I say, okay, give me a summary mouse pointer cloud of all the places that people clicked on my registration form, and then split that out by the people that successfully registered, those that clicked the submit button but didn't register, and those that didn't even click it. Because I have a feeling that those three cohorts would teach me something useful. Can I do that? It depends. So with certain tools that use rrweb, you definitely can. It's because we're more like just the recording tool. The data analysis part is kind of up to other people. But something that's very easy to implement is to just track how often someone clicked, and if they click very often
00:35:02
Speaker
in sequence, then probably they're really frustrated and you should check out the recording. Ah, the rage click. Yeah. So that's one thing that a lot of the tools or the companies that use rrweb do. That's the first thing you're going to see, because you can very easily pull that from the data. Okay. So we really are getting into a kind of Kafka space, where you've got this big event log and then you can do all kinds of interesting stream processing on it. It's up to you to find the question you've got to ask.
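The click-frequency check described above can be sketched in a few lines. This is only an illustration: the `ClickEvent` shape, the `findRageBursts` name, and both thresholds are assumptions made up for the example, not rrweb's event format or defaults.

```typescript
// Hypothetical, simplified click record; real rrweb events carry more fields.
interface ClickEvent {
  timestamp: number; // milliseconds
  x: number;
  y: number;
}

interface RageBurst {
  start: number; // timestamp of the first click in the burst
  end: number; // timestamp of the last click in the burst
  clicks: number;
}

// Assumed thresholds, chosen for illustration only.
const MAX_GAP_MS = 500;
const MIN_CLICKS = 3;

// Group clicks that follow each other within MAX_GAP_MS and keep only
// the groups that contain at least MIN_CLICKS clicks.
function findRageBursts(clicks: ClickEvent[]): RageBurst[] {
  const sorted = [...clicks].sort((a, b) => a.timestamp - b.timestamp);
  const bursts: RageBurst[] = [];
  let current: RageBurst | null = null;

  for (const click of sorted) {
    if (current !== null && click.timestamp - current.end <= MAX_GAP_MS) {
      // Close enough to the previous click: extend the current group.
      current.end = click.timestamp;
      current.clicks += 1;
    } else {
      // Gap too large: close the previous group and start a new one.
      if (current !== null && current.clicks >= MIN_CLICKS) bursts.push(current);
      current = { start: click.timestamp, end: click.timestamp, clicks: 1 };
    }
  }
  if (current !== null && current.clicks >= MIN_CLICKS) bursts.push(current);
  return bursts;
}
```

Running the same function separately over each cohort of sessions (registered, clicked submit but failed, never clicked) would be one way to compare them, since the recorder itself leaves that analysis to you.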
00:35:39
Speaker
Exactly. Yeah. Okay. So I could in theory come up with a transformation that went through all those log files. And what would I do, create a synthetic file in the same format that just seemed to have lots of different mouse pointers? Can I even do that? Yeah. What a lot of people do is they will add a custom event with the same timestamp as the first moment when, for example, one of these rage clicks happened, and the last moment. And then in the player, you can kind of highlight these different events. and
00:36:29
Speaker
And then that would be one way of tagging when something happened. But yeah, like you mentioned, there's a lot of different ways of doing it. Okay. So this makes me wonder, in your mind, to what degree is the product the tool, a thing that reliably captures the browser and turns it into events? I mean, you could almost imagine a situation where the data description, the event log, was the real product. Will there be a spec for the data format? Will that be published as the thing that people build on top of, independently of you? That's a good question. At the moment, because we're using TypeScript, the whole event format is just... we have a lot of types for it. So
00:37:24
Speaker
That's kind of my preferred spec. Exactly. You can kind of use that as the spec. So that's what most people do at the moment. But we don't really have an official spec. And so far, we've always added to it. We've never broken backwards compatibility with it. So yeah, that's the depth we've gone into to make it a spec. But yeah, you can turn it into all kinds of different things, and people do. Yeah, I can see potential for that. But that's giving you more work when you've already got plenty of work just capturing it reliably.
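The tagging approach mentioned a little earlier (adding custom events that share a timestamp with the first and last moment of something interesting) fits naturally with the idea of TypeScript types standing in for a spec. The sketch below is a hedged illustration only: `RecordedEvent`, `MarkerEvent`, and `tagSpan` are hypothetical names and shapes invented for this example, not rrweb's actual event definitions.

```typescript
// Hypothetical, simplified event shapes; rrweb's real types are richer.
interface RecordedEvent {
  kind: "recorded";
  timestamp: number;
}

interface MarkerEvent {
  kind: "marker";
  timestamp: number;
  tag: string;
  payload: { edge: "start" | "end" };
}

// The union type doubles as a lightweight description of the log format.
type LogEvent = RecordedEvent | MarkerEvent;

// Return a new log with two marker events added, sharing timestamps with
// the first and last moment of the span being tagged.
function tagSpan(log: LogEvent[], tag: string, start: number, end: number): LogEvent[] {
  const markers: MarkerEvent[] = [
    { kind: "marker", timestamp: start, tag, payload: { edge: "start" } },
    { kind: "marker", timestamp: end, tag, payload: { edge: "end" } },
  ];
  // Array.prototype.sort is stable, so at equal timestamps the original
  // events stay ahead of the markers appended after them.
  return [...log, ...markers].sort((a, b) => a.timestamp - b.timestamp);
}
```

Because the function only ever adds events and leaves existing ones untouched, it follows the same additive pattern described above for the format itself.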
00:38:09
Speaker
So tell me a little bit more, because you're a major contributor to the open source project RRWeb, which is this record and playback.

Justin Halsall's Journey with RRWeb and RecordOnce

00:38:18
Speaker
And then you've got RecordOnce, which is the kind of alternative to doing videos of my website, right? Exactly. Yeah. Tell me a little bit about that. Is that your business, and was it business first, then open source, or is this your attempt to say, hang on, there's a commercial angle here? Well, it was really business first, then open source. I kind of found the project while I was building RecordOnce. I was like, okay, the way that we do these recordings is kind of broken. There's too much kind of like human
00:38:57
Speaker
non-creative, tedious human effort that's going into these videos; we should find a better way. So that's how I started out. While doing RecordOnce, I found rrweb. And then initially I was like, I'm building a startup. I'm not raising any venture capital. I don't have money. I don't have the budget to contribute to open source. That's what I initially thought. And so when I found a bunch of bugs in rrweb, I was like, well, the fastest thing to do is just fork it, fix the bugs, and then that's it.
00:39:36
Speaker
And so that's what I did initially. And then somebody else fixed exactly the same bugs, and somebody else added some features that I ended up needing. So I was like, okay, well, let me merge what I have with upstream. And then I'm trying to figure out who solved the bugs in a better way. And it's nearly never me, because the other side had gone through a pull request process, and more angles were considered, and stuff like that. So the initial bugs that I fixed ended up
00:40:20
Speaker
taking so much more time, just on merge issues, basically. And so I was like, okay, well, this is clearly not a strategy that's working. So instead, for every bug I found, let me just send a pull request for it. And then at a certain point, I got into a conversation with the creator of rrweb, and he's like, you know, quite a few requests coming in there. Do you maybe want to be part of the core team? And I was like, yes, that'd be great. Yeah, so that was basically how I got into
00:41:00
Speaker
and
00:41:03
Speaker
contributing to open source in a major fashion with rrweb. Since then, my belief that, as a small company, we don't have money for that, we don't have time for that, we don't have resources for that, has completely changed. Every single feature I create goes into main first before it ends up in our product. That's how we've really switched it around, basically. That's the first time I've ever heard a kind of reversal of the tragedy of the commons argument. Yeah. You know, where it's actually easier for you to become part of contributing more generally. Oh, that gives me hope. It gives me hope that the universe might not be falling into a fiery heat death anytime soon. Yes. Yeah. But it does take, as they say, it takes a village. So now I'm also reviewing other people's pull requests, and there'll be times when
00:42:02
Speaker
I am too busy and I can't, and similarly for a lot of the core team members, because, you know, life happens. So there are moments where I'm sure people contribute absolutely amazing things, and we just don't get to reviewing it or fixing it. One thing that I will say, if you're looking to fix something in an open source project: the thing that we check first, before we check anything else, is the pull requests. So if you have an issue, it's faster for you, if you can solve it, to solve it and send a pull request for it. That will get accepted long before we actually go into the issues and see, okay, what's something
00:42:44
Speaker
that we should fix. You deal with the solutions first and the problems second. Exactly. Yeah. It's a path of least resistance in life, I think. Makes sense. You know, whenever I want to contribute to an open source project, I usually start with fixing the documentation. Oh, yeah, that's good. It's a good way to get myself known to the developers and to see how they're going to respond to people coming in with changes. Yeah. Plus, I'm great at nitpicking on grammar, so that always happens. You've got to play to your strengths, really. Absolutely, yeah. And no one wants to fix documentation, so that's usually a well-thought-of thing. No, no, no. And we also have, so the creator of rrweb, he's Chinese, his name is Arjan. And so we also have all the documentation in Chinese.
00:43:41
Speaker
But, you know, I'm not great at... Your Chinese isn't as sharp as it used to be? Yeah, exactly. Did retain most of it from primary school. So a lot of my translation goes through, like, ChatGPT or Google Translate and then goes to one of the other members. You know, sometimes it's good enough, and sometimes it's absolutely horrendous, and then they'll tweak it. But it's like, you know, you come with the solutions first and then with the problems. Because if I just come with, like, okay, well, here's a to-do list for you to solve, then it's just going to add to the workload and nobody's going to be happy. Yeah. I think when all the dust settles, one of the great use cases for artificial intelligence would just be the first draft.
00:44:31
Speaker
Not really the polished thing, but the first draft is really helpful. Right. Yeah, for sure. So let me see if I can think of interesting ways to break your video system. So I go to my company's website, or whatever I want to record, and I record a session and I edit out all the mistakes. And then the classic problem I've found with recording websites is you post that on your website as a kind of demo, and then three weeks later you decide to update the design, and now all your videos are out of sync.
00:45:07
Speaker
So can I replay, can I reapply the new CSS over the old DOM event recording with any hope of it actually looking good? Well, you could in theory, not in practice at the moment. OK. I love the honesty. Give it to me. Yes. What we can do very easily is, because... so I'll give a tiny little history lesson. Since 18-something, a video has been just frames and either
00:45:48
Speaker
pieces of image or pixels. And because we are really looking at the events, and the whole thing is JSON, we've kind of had to rethink what the basic building blocks are of the story, of the narrative, of a video tutorial. And so what we've come up with are basically these user actions. So if you click on something, you're going to have to explain to your customers or your users that they're going to have to click on that as well. And if you fill in the form fields, if you type something, you're probably going to have to explain that as well. So these are all these kind of user actions.
00:46:28
Speaker
And we break up the whole story into these different actions. But that also means that if certain actions aren't relevant anymore, or, for example, you have a form and there's a new version of GDPR, which, you know, means you need an extra checkbox or something like that, then you could very easily say, okay, well, this bit here is not relevant anymore, let me just delete that, and then I'll re-record that little block. And then the voiceover that you had attached to it, you can just add to it very easily. So that makes the whole maintenance part of it a lot
00:47:04
Speaker
easier, whereas before, video felt more like something that's set in stone. It's not malleable, it's not clay, it's baked. And with RecordOnce, that's really not the case. You can really tweak it, change it, update it, change the voiceover after the fact. You know, people think it's confusing, let me just change it again, and keep doing that. Whenever you hit save, every place that you've published it will get updated automatically. So that's how we try and tackle the maintenance piece. We actually started with making a system that would auto-update the videos, and we found that
00:47:52
Speaker
So, we have all the code for that, and it's possible, except there's a bunch of setup that you need to do. You need to get your application in a certain state. And we found that recording takes much less time than actually setting up the state. So to force this on people, that they need to have this really rigorous setup and this really rigorous teardown piece, would basically make it a lot harder to record. So we've gotten out of people's way in that regard. And yeah, that's how we've tried to make it as simple as possible. Okay. So what are the limits? What's the point at which I can't make a simple tweak and I have to re-record the session?
00:48:40
Speaker
Yeah, so major UI changes, for example, like you said, you go from color blue to color red or something like that, you change your logo, then you'd have to re-record everything. But you can still keep the voiceovers, which saves a lot of time. We have one customer who has been recording 90 videos over the last 12 years, not with our system. And they were like, okay, well, let's create a five-year plan to re-record everything, because they went through a rebranding.
00:49:17
Speaker
Right. They used our software and then found out that they can basically re-record a video from scratch in 15 minutes. So that's what they've consistently been able to do. And then if it's only 15 minutes, it becomes less of a problem to have to re-record it, basically. Okay. Yeah, I can buy that. Because after one or two re-recordings of a video, your heart sinks every time you think about changing the shape of a button. Yeah, exactly. Yeah, I've been there. Okay, so I'm trying to think of other interesting ways to break this. You've kind of hinted at this, but are there some frameworks which are better supported? Like, if I write my website in React, am I going to have a better time than if I write it in jQuery?
00:50:06
Speaker
React has been doing some evil stuff recently with monkey patching certain things. To be honest, I forget what exactly it was that they monkey patched. It kind of exploded. Oh, I didn't hear that one. Yeah, so I think they reversed it. But if you're using some stuff that's really, really cutting edge, only supported by the one browser or something like that, then there's a good chance we won't have created any support for it yet. If you have a really old-school, non-single-page type of web application, then there's a good chance we're going to record absolutely everything perfectly and that you won't ever bump into an issue. But it's kind of
00:51:05
Speaker
It's kind of like that. And then at the moment, if you're using Salesforce, my pull request isn't merged. So when that gets merged, then it'll be fine. But at the moment, it's not. Angular seems to do a lot of this monkey patching as well. But we have a lot of fixes for that, and then the Salesforce pull request that I've got out there will fix it once and for all, basically. Yeah, I hope you're right, but it feels like you're going to be maintaining small differences and problems introduced by people for years. Oh, yeah.
00:51:48
Speaker
That's part of the job, unfortunately. Yeah, we recently had this thing where certain situations with certain CSS strings would create an infinite loop on playback. And then we solved that. And then that broke certain other situations. So it was this thing where it kind of went around in circles. And we had three different releases in short succession that were all really painful for certain people, in certain specific cases. And that was also one of those moments where we were like, okay, well,
00:52:36
Speaker
we shouldn't be maintaining our own CSS parser. That doesn't make any sense to us. And in this case, we went to PostCSS, and then that's kind of solved once and forever. And then we just need to keep PostCSS in that case. But there are certain things, like hover, where we have to go through style sheets and add a new hover class, because it's a pseudo-class; of course, we can't trigger hover on playback.
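The hover workaround can be illustrated with a deliberately naive sketch. Everything here is an assumption for the sake of the example: the `replayed-hover` class name and the `rewriteHover` function are made up, and a plain text substitution like this would also touch `:hover` inside strings or comments, which is exactly the kind of edge case a real parser such as PostCSS exists to handle.

```typescript
// Hypothetical class name standing in for the :hover pseudo-class on replay.
const HOVER_CLASS = "replayed-hover";

// Rewrite every ":hover" in a stylesheet's text into a real class selector,
// so a player can toggle the class on elements instead of moving the mouse.
// Naive textual substitution, for illustration only.
function rewriteHover(cssText: string): string {
  return cssText.replace(/:hover\b/g, `.${HOVER_CLASS}`);
}
```

On playback, a player would then add or remove that class on whichever element the recording says was hovered at that moment.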
00:53:09
Speaker
Not normally, because we can't commandeer your mouse. So we have to fake it, basically, by adding these hover classes to anything that was hovered upon. Does this introduce a runtime burden on the site being recorded, then? A noticeable one? Not for the hover piece, but in general, yes. Sometimes there are certain websites that add a lot of nodes, like hundreds of thousands of DOM nodes at the same time. And then if you have very advanced
00:53:56
Speaker
filtering and masking and blocking settings, then it will have to check every single one of those, and that can slow it down. Yeah, I can see that. So I guess it's always going to be use case specific, how well something like this is going to work. Let me ask you, are you at all tempted? Is it in the roadmap? Would it be a problem for you to think, well, what if we made this file format and, like, if we went to the browser makers and said, let's have a new standard about how mutation observer works,
00:54:37
Speaker
and all the browsers supported a specific spec, would that be good for you?
00:54:45
Speaker
One thing that would really be helpful for us is the whole style sheet piece. Apparently, for good chunks of it, browser vendors don't have a spec. So they're kind of just doing something. A good example is, in CSS you do margin, and then you say, okay, margin should be this variable name,
00:55:20
Speaker
which you can do nowadays. Yay, that's great. But then you ask the browser, okay, which CSS styles are applied? It will say, okay, well, margin-left is this CSS value, margin-right is this CSS value, margin-top and margin-bottom are this CSS value. But that's not what you told the browser. You told the browser margin is this CSS value, and it kind of just extrapolates it in a correct way. Ah, it unpacks the shorthand versions of the CSS name. Yes. Which you then can't recapture. Exactly. Yeah. So we kind of have to try and figure out what happened and then rebuild it. And I ended up creating some
00:56:10
Speaker
tickets in the bug tracking systems for, like, Chromium and stuff like that. And then I got yelled at a bunch, where it's like, no, it's not a browser bug; there's no spec for this. So I was like, oh, okay. Still broken, though.
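The "figure out what happened and rebuild it" step can be sketched for the margin case. This is a hedged illustration, not rrweb's code: `MarginLonghands` and `collapseMargin` are hypothetical names, and the sketch assumes the four longhand values are actually readable, which (as described above) is not something browsers consistently give you when variables are involved.

```typescript
// The four longhand values a browser might report after expanding `margin`.
interface MarginLonghands {
  top: string;
  right: string;
  bottom: string;
  left: string;
}

// Best-effort reconstruction of a `margin` shorthand from its longhands,
// using the standard 1-, 2-, 3- and 4-value shorthand forms.
function collapseMargin(m: MarginLonghands): string {
  if (m.left === m.right) {
    if (m.top === m.bottom) {
      // All four equal: one value. Otherwise: vertical, then horizontal.
      return m.top === m.right ? m.top : `${m.top} ${m.right}`;
    }
    // Three values: top, horizontal, bottom.
    return `${m.top} ${m.right} ${m.bottom}`;
  }
  // Four values: top, right, bottom, left.
  return `${m.top} ${m.right} ${m.bottom} ${m.left}`;
}
```

Even when this works, the result is only equivalent to what the author wrote, not necessarily identical to it, which is part of why the missing spec hurts.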
00:56:32
Speaker
So yeah, that sort of thing. There are certain cases where, for example, I'm on the MDN documentation site, like a lot of developers are, and I'll end up on these pages where the links are red, which basically means that there's no documentation for that piece. So those are the issues that we're really bumping into. There's no documentation for it. There's no spec for it. Browsers are just kind of doing something. And they're often doing it incorrectly. And that's really painful.
00:57:12
Speaker
The rest, it's not ideal, but we can handle that, basically. Yeah, okay. So, I mean, I feel like compared to 2000 and something, browsers are much more similar and much more stable than they ever have been. Maybe I just don't peer deeply enough into the abyss. Yeah, Safari has been... I know everybody likes to complain about Safari, and it's my main browser and I do really love it, but with certain things, it's like,
00:57:50
Speaker
it didn't support certain things with regards to web workers and stuff like that. There are definitely certain issues where browsers just do it completely differently, and then we kind of have to figure out what we need to do. You guys are just fighting the good fight as your future. Yeah. Do you have anything larger than that to look forward to in the future? What are your plans? Well, it's nice because the project is growing. We can see more and more companies kind of enabled because of the open source project. And that's really nice to see, basically.
00:58:46
Speaker
Yeah, and then growing our own company, that's of course also something that I'm looking forward to. I've been working on it for four years now, so it's nice to make that bigger and stuff like that. And there are certainly some bigger changes with regards to privacy that I would really like to see. And it will happen someday, but not today, unfortunately. Yeah. Do you want to give me one example? Oh, just better standards, better defaults, basically. Right. Yeah. So I would like us to kind of identify this personal information somehow, like an email address, which has a sort of very clear pattern. You know, if it's an email address, I just don't want us to record it, for example. Right. Yeah.
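As a rough illustration of the "very clear pattern" idea, here is a minimal sketch that masks anything resembling an email address before text is stored. The `maskEmails` name and the regular expression are assumptions for this example; the pattern is intentionally simple and will miss some valid addresses, and this is not a description of rrweb's current defaults.

```typescript
// Deliberately simple pattern; it does not cover every valid address.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Replace anything that looks like an email address with asterisks of the
// same length, leaving the rest of the text untouched.
function maskEmails(text: string): string {
  return text.replace(EMAIL_PATTERN, (match) => "*".repeat(match.length));
}
```

A function like this could, in principle, be wired into whatever text-masking hook a recorder exposes, so the address never leaves the browser in the first place.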
00:59:41
Speaker
Maybe the same with IP addresses and stuff like that. So just identifying that sort of personally identifiable information, that'd be great. And we've done some big changes on the replay side to create a whole virtual DOM and then do some virtual DOM diffing so that the browser doesn't have to. Interesting. Yeah. And so one thing that I would like to do is run that on the server and basically be able to recombine, or flatten, some of the full snapshots with the diffs. And then say, okay, well, this is our recording, and I only really care from here to here. So instead of having to have that full snapshot with all the stuff in front, I can just flatten it, and I'll recreate that. And that's something that I need to do,
01:00:36
Speaker
to build in for the project as well. That's going to save people a lot of storage space and stuff like that. Yeah, I can see that. And it's a nice little data processing project. Yeah, for sure. Cool. Yeah, definitely. Well, I'd best leave you to cycle off and get back to that. Yeah, thank you. For now, Justin, thanks very much for joining me. Yeah, my pleasure. Thank you for having me, Kris. Cheers. Thank you, Justin. I'll put links to RRWeb and RecordOnce in the show notes as usual. And if you want to check it out, one of the RecordOnce demos that jumped out at me was a game of Tetris. You can play Tetris and then rewatch the session from the events that you've just spawned by playing it. It's a really nice demonstration of quite how far and how complex it can get.
01:01:27
Speaker
But before you go and play Tetris, if you've liked this episode, please click like, rate it, review it, maybe share it with a friend. It always helps, the feedback is always appreciated, and if you share it, the feed forward is always appreciated too. If this is your first time here, you may not know that Developer Voices is weekly, so make sure you're subscribed and then YouTube or Spotify or wherever you're hearing this will let you know as soon as we're back. If you're a regular listener and you're already subscribed, I will just let you know that there's a Developer Voices Patreon account if you want to support a long and healthy future stream of episodes. And thanks very much to everyone who's already signed up. There is now some prospect of this becoming a real sensible and sustainable job, which is both unbelievable to me and to my wife, and yet something I'm quite ready to believe in.
01:02:20
Speaker
With that dream of the future in mind, it's time for me to go. I've been your host, Kris Jenkins. This has been Developer Voices with Justin Halsall. Thanks for listening.