Jupyter’s Architecture Unpacked (with Afshin Darian & Sylvain Corlay)

Developer Voices
2.8k plays, 8 days ago

Jupyter’s become an incredibly popular programming and data science tool, but how does it actually work? How have they built an interactive language execution engine? And if we understand the architecture, what else could it be used for?

Joining me to look inside the Jupyter toolbox are Afshin Darian and Sylvain Corlay, two of Jupyter’s long-standing contributors and project-steerers. They’re going to take us on a journey that starts with today’s userbase, goes through the execution protocol and ends with a look at what Jupyter will be in the future: an ambitious framework for interactive, collaborative applications and more.

Support Developer Voices on Patreon: https://patreon.com/DeveloperVoices

Support Developer Voices on YouTube: https://www.youtube.com/@developervoices/join

Jupyter Homepage: https://jupyter.org/

Jupyter Xeus: https://github.com/jupyter-xeus/xeus

Jupyter AI: https://github.com/jupyterlab/jupyter-ai

Jupyter CAD: https://github.com/jupytercad/JupyterCAD

Jupyter GIS: https://github.com/geojupyter/jupytergis/

Jupyter GIS Announcement: https://blog.jupyter.org/real-time-collaboration-and-collaborative-editing-for-gis-workflows-with-jupyter-and-qgis-d25dbe2832a6

QGIS: https://qgis.org/

ZeroMQ: https://zeromq.org/

Sylvain on LinkedIn: https://www.linkedin.com/in/sylvaincorlay

Darian on LinkedIn: https://www.linkedin.com/in/afshindarian

Kris on Bluesky: https://bsky.app/profile/krisajenkins.bsky.social

Kris on Mastodon: http://mastodon.social/@krisajenkins

Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

Transcript

Introduction to Jupyter Notebooks

00:00:00
Speaker
I have to start this week with an admission. I first saw a Jupyter Notebook several years ago, and I sized it up and I put it in too small a box in my mind.
00:00:10
Speaker
I looked at it and I thought, this is a very nice tool if you need to learn Python or do data science. And I didn't actually need to do either of those things, so I didn't explore beyond that.
00:00:22
Speaker
And I've realized now that was a mistake. I was vastly underestimating it. Or, if I'm feeling generous to myself, maybe it grew vastly while I wasn't looking. Because whilst it is a nice tool for learning many languages, and it's a great tool for exploring and understanding data, and it's probably the world's most popular literate coding tool, there's still a lot more besides.
00:00:46
Speaker
I know now, as you will know by the end of this episode, that Jupyter is much more like a generalised language execution engine, coupled with an interactive collaborative user interface framework.

Meet the Guests: Sylvain Corlay and Afshin Darian

00:01:02
Speaker
And it's being used for things like data exploration and data sharing, but also some really unexpected stuff like collaborative maps or real-time multi-user CAD applications.
00:01:14
Speaker
It's going to take a whole episode to fully unpack the scope of Jupyter. So joining me are two of Jupyter's contributors. Sylvain Corlay, he's part of the Jupyter Steering Committee, and he's been particularly focused on building out Jupyter's back-end infrastructure.
00:01:30
Speaker
And Afshin Darian, a co-creator of JupyterLab, which is the front-end IDE framework. Between the two of them, we're going to go through the whole stack and how it's built, from Jupyter's protocols to its possibilities.
00:01:44
Speaker
And we'll get a better understanding of how you can use it, how you could extend it, how you could add a whole new language or build entire applications on top of it. There's a lot to get through and a long way to go, so let's head to Jupyter.

Jupyter's Evolution and Multi-language Support

00:01:57
Speaker
I'm your host, Kris Jenkins. This is Developer Voices, and today's voices are Afshin Darian and Sylvain Corlay.
00:02:17
Speaker
Darian and Sylvain, how are you doing? How are things over there? I'm all right. Thanks for inviting us on your show. And yeah, things are good in cloudy London. You're in London and we've got Sylvain coming in from France, right?
00:02:32
Speaker
That's right. Thanks for having me here. Pleasure. Absolute pleasure. Let's see. So we are diving into the world of Jupyter Notebooks. I think we've got to start this with my own ignorance, right?
00:02:43
Speaker
Because I get the sense that I haven't appreciated what Jupyter is properly. I think of Jupyter Notebooks as that thing that people learn Python with.
00:02:54
Speaker
Or that thing that data scientists use to poke around data sets, which I think is true, but possibly a narrow picture of the world. Darian, why don't I pick on you to give me a wider view?
00:03:08
Speaker
Yeah, I think that's a great place to start. One clue as to what Jupyter's doing is actually the name. Jupyter came from the IPython project, and at some point,
00:03:23
Speaker
the fact that they were working on language-agnostic things made them want to not have Python in the name. And they went for the three biggest languages they were targeting at the time, Julia, Python, and R. So Ju, Pyt, er, right?
00:03:41
Speaker
Oh, is that where it comes from? Oh, cool. That is where it comes from. It has turned out that Python is the biggest community for sure. But think about Jupyter as the front end to all of the analytical tools that are out there in this ecosystem, and a way of sharing exploratory analysis, whether that is sending somebody a notebook file, using Jupyter analytical tools to generate a dashboard with low code, or simply firing up a console that lets you immediately get a REPL interface but with much richer output.
00:04:28
Speaker
All of these things are ways people interact with Jupyter.

Diverse User Base of Jupyter

00:04:31
Speaker
So broadly, if you think about that idea you started with, notebooks, data science, all that, but extend it to education, exploratory coding, and multiple programming languages, you're in the right space.
00:04:47
Speaker
Okay. Do you think the user base is still, and I want to say this sensitively, people for whom programming maybe isn't their primary job? What I mean by that is I get the sense that friendliness to the casual user is more of a priority than being, like, an Emacs wizard, for instance.
00:05:14
Speaker
Yeah. Yeah, I think so, maybe. Sylvain, what do you think of that? I think there is a broad range of people whose job is to write code and who are not software developers.
00:05:28
Speaker
And they write code for science. They write code because they teach computing or math, and it's still their main professional activity, but they are not, you know, advanced software developers building applications for the larger public. They are writing scientific code.
00:05:52
Speaker
And this is the main audience of the project. Yeah. So let me take it from that angle then, because if you're saying it's born out of the three target languages of Julia, Python and R, that really does seem to suggest your user base is still largely academic slash data science-y.

Jupyter's Academic Origins and Language Focus

00:06:16
Speaker
Well, that's what the name comes from. Right. But is that where you've come from or is it also where you're going? It was at the time in 2014 when that name was minted.
00:06:31
Speaker
It wasn't obvious which of the commonly used analytical languages was going to have the largest mind share. Nowadays, it really does seem like Python is that language, but we've got this powerful language-agnostic architecture. And so we...
00:06:50
Speaker
are very committed to keeping it that way because it's a really powerful set of tools. What I would say is this. I have worked in environments where a couple hundred people with mathematics or physics or computer science PhDs are doing the bulk of their work inside Jupyter.
00:07:12
Speaker
Now, those people might not be software engineers, and they might not have the habits of a software engineer, you know, Git and software architecture and those things, but they're very good analysts, and Jupyter lowers the overhead of them jumping straight into what they're actually good at.
00:07:34
Speaker
And they can do things that software engineers can't do and vice versa. Jupyter isn't meant to be a software engineering tool. It's a tool that allows you to explore data and come out of it with conclusions, often using code, not always even.

Jupyter for Scientific Communication

00:07:51
Speaker
I don't know, Sylvain, maybe you have a better way of putting this than I do.
00:07:57
Speaker
No, no, this is great. Maybe one area that we haven't identified, besides learning and data analysis, is communication, and Jupyter notebooks as a medium for communication.
00:08:12
Speaker
So, you know, you can send it by email or through large deployments like Binder. It's been used for science communication. And one prime example of it has been the early publication of a Binder deployment of all of the data processing and signal processing work that was done by the LIGO project to detect gravitational waves.
00:08:38
Speaker
And this was done early after the publication of the results, in the form of Jupyter notebooks. So you're actually getting academic-paper-level material, pre-academic-paper material,
00:08:56
Speaker
being shared as Jupyter notebooks instead of as, I don't know, LaTeX? Well, sometimes the resources that they want to share, instead of being a static resource, a PDF of graphs, or even CSVs and stuff, are the notebook itself. If you share the Jupyter notebook that you yourself were using to arrive at your conclusions before you authored your paper, that means that even if I don't have the expertise you have, I can just hit go, run that analysis, see exactly what you did, tweak it, verify it: all those things that we want to happen in science, but we couldn't do with just static artifacts. Now that's starting to change because the artifact being the notebook means the analysis and its answers are in the same place.

Adoption in Financial Institutions

00:09:46
Speaker
Yeah, yeah, that makes sense. And you can also skip over the nightmare of writing LaTeX. Well, I mean, you might actually generate a LaTeX document out of it, right? And Jupyter natively understands snippets of it inside cells where you might just be putting in an equation or something because that's the right way of formatting it. So it's not a competitor to LaTeX, it augments it.
00:10:14
Speaker
Okay, okay. That makes me want to get into the architecture. But before we do, I'm going to ask one more user-space question, maybe of you, Sylvain. This makes me wonder if there's another very large user base that aren't academic, whose primary interface is programming but who aren't software developers.
00:10:34
Speaker
I wonder if there's any aim or hope for Jupyter to tackle the people that are using Excel as their programming and analysis tool, for instance, in banks.
00:10:47
Speaker
Yeah. Yes. So Jupyter is very popular in financial institutions. Is it? For certain, yes. And I've seen banks and funds where, you know, as many people have Jupyter open as have spreadsheet software open. Especially as they get more and more quantitative in their focus, they tend to use Jupyter because of its connection to the scientific Python stack.
00:11:25
Speaker
I suppose that surprises me, but maybe it shouldn't. Well, no, financial firms are often secretive about how they make their decisions. But I worked for years at a hedge fund.
00:11:39
Speaker
And the reason why I was there was because they wanted to support Jupyter. The reason they wanted to support Jupyter is because they had such a large user base doing the actual quantitative work and the analysis that they use to drive their buying and selling decisions.
00:12:00
Speaker
So the large group of people working on these things internally, their backgrounds weren't necessarily software. Their backgrounds were often mathematics or finance or physics or something highly analytical, highly mathematical, but not necessarily software.

Funding Model Evolution

00:12:15
Speaker
And I don't think I was working at a real outlier. I think a lot of financial services firms have a contingent of people who are either sitting in spreadsheets or sitting in Jupyter, depending on maybe what their academic background was or what their own proclivities are or whatever. But yes, there's a huge Jupyter presence in that industry.
00:12:41
Speaker
Ah, maybe I shouldn't be surprised by that. But the one thing that surprises me is change in large banks. That they're changing their habits at all from the old ways. Maybe that's cynical. It doesn't even have to be large banks, right? You could have focused, dedicated, quantitative hedge funds.
00:13:01
Speaker
You could have divisions within a bank whose focus is this, right? Because a lot of finance isn't necessarily analytical, but the pieces of it that are, those folks are very interested in using the best tools available at the time without writing their own. And they're also very interested in recruiting people who have been trained on these tools to begin with. So if your physics department is teaching you inside Jupyter and then you end up working at a hedge fund, you might stay in Jupyter.
00:13:36
Speaker
Yeah, yeah, yeah. Fast forward 10 years from the foundation of the project and suddenly there are banks that have relatively senior people using those tools. That makes sense.
00:13:46
Speaker
Yeah. Does that... This is slightly an aside, but does that change the funding picture for a project like Jupyter? Well, the funding picture for Jupyter has changed a lot in the last... Now, keep in mind, we said Jupyter itself as a name has existed since 2014. The project is about 15 years older than that. That's when IPython begins.
00:14:09
Speaker
And because it initially comes out of academia, a lot of the initial funding was academic, grant-based projects. Then we've had generous support from a few corporate sponsors that were early to support us, the biggest one for the last decade being Bloomberg.
00:14:28
Speaker
But now, in the last year or so, Jupyter has moved to the Linux Foundation, and the Linux Foundation's funding model is to create a Jupyter-specific body, the Jupyter Foundation, with subscriptions that institutions can buy, and those subscriptions are the biggest part of the funding model.
00:14:53
Speaker
And so it is definitely related to who your users are, where your funding comes from. That's true. But also, you know, the project has to operate on its own and make some decisions about how we create sustainable funding.
00:15:09
Speaker
And neither academic funding nor one-off generous gifts were things that we could count on. So now we've swapped to this subscription model, but it's our first year. We'll see how it goes.

Jupyter's Language-Agnostic Architecture

00:15:21
Speaker
Year one is going pretty great.
00:15:24
Speaker
Oh, okay. Okay. We might get back into that a bit more, but I'm curious to go more technical. So again, it's something I heavily associate with Python, but you said it changed, and you've said already that it's language agnostic.
00:15:43
Speaker
So is that a re-architecture from the early days? And if so, what's it look like now? Sylvain, I think it's your turn for me to pick your brains. So if you're a Jupyter Notebook user and you open...
00:15:58
Speaker
a notebook document and start executing code cells. When you hit shift-enter to run a cell, what's really happening is that a message gets sent to the back end, to some server where the kernel, which is the part of the architecture responsible for executing your code, runs your code, and then a message is sent back to the front end for display, right? So that round trip follows a well-defined and specified protocol that is not specific to Python.
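[Editor's illustration] To make the round trip above concrete, here is a minimal sketch of what an `execute_request` message could look like. The field names follow the published Jupyter messaging specification as best understood here; the helper name `make_execute_request`, the username, and the protocol version string are assumptions made for the example, and a real client would also sign and send the message rather than just build it.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code: str, session: str) -> dict:
    """Build the four dictionaries that make up an execute_request message."""
    header = {
        "msg_id": uuid.uuid4().hex,       # unique per message
        "session": session,               # unique per client session
        "username": "example-user",       # assumption: placeholder value
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",                 # assumption: protocol version string
    }
    content = {
        "code": code,                     # the cell source sent on shift-enter
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    # A request that starts a conversation has an empty parent_header;
    # replies copy the request's header into their parent_header.
    return {"header": header, "parent_header": {}, "metadata": {}, "content": content}


msg = make_execute_request("print('hello')", session=uuid.uuid4().hex)
print(json.dumps(msg, indent=2))
```

Nothing in the message names Python: the `code` field is just a string, which is the sense in which the protocol is language agnostic.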
00:16:26
Speaker
And so as soon as the people who were working on IPython decided to enable that kernel-based architecture, it became very obvious that, except in cases where we were leaking the abstraction, the front end was completely agnostic to the fact that the...
00:16:52
Speaker
the programming language executing the code was Python. So it was decided to make the architecture language agnostic. Okay, does that mean that this is why I'm starting to see services that offer notebooks over the web, right?
00:17:10
Speaker
I go to some website, I've got a thin client that is really just talking to a language kernel at the back end. Over HTTP? It goes over WebSockets most of the time. Okay.
00:17:24
Speaker
Yes. Okay. And so once it became obvious that most of it was going to be language agnostic, there was a deliberate decision, even in the naming of the project and in all of the places where we were leaking abstractions from the Python interpreter, that we should not do it, and should progressively make Jupyter agnostic to the programming language.
00:17:51
Speaker
And now we have over 60 language kernels out there for a variety of programming languages that are supported by the... Okay, I'm not going to ask you if my favorite language is in there because, with 60, the answer is probably yes, right? Yeah.
00:18:09
Speaker
Well, that's huge. Okay, so the language-agnostic thing must have been a success in order to get up to 60 languages.
00:18:21
Speaker
Give me an idea of how I would add number 61. Who wants to take me through writing my own Jupyter kernel? I'll give you the abstract answer, and then we'll give you the real fast answer, right? The abstract answer is this. Okay.
00:18:36
Speaker
The protocol is implemented using the open source project ZeroMQ, which is basically magic. And so ZeroMQ means we have a language agnostic way of communicating with a process that isn't the process we're currently in.
00:18:53
Speaker
And it's fast, and the server can be written in whatever language you want. The kernel can be written in whatever language you want. And as long as they're both talking over ZeroMQ channels, all they have to agree on is the protocol.
00:19:07
Speaker
And the protocol is a messaging protocol that's very well defined. So if your language has bindings to talk to 0MQ, or even without bindings, if it has some way of communicating, which eventually ends up in 0MQ and it follows this protocol, your language can exist as a kernel.
00:19:26
Speaker
But I think the much lower hanging fruit is the Xeus project, Xeus spelled with an X, and Sylvain can tell you all about that. Before we go on to that, I'm just going to quickly check I understand. So when I go into a Jupyter notebook and I send my block of code with shift-enter, that goes over WebSockets to the Jupyter back end, which then sticks that message onto 0MQ.
00:19:51
Speaker
And it's the language executor that will then stick a return message on another 0MQ channel and eventually back over the WebSocket. Yeah, that's correct. And that's why I can have multiple languages in the same notebook.
00:20:03
Speaker
Yeah,

Creating New Jupyter Kernels

00:20:04
Speaker
Yes. So you can have multiple languages supported by your environment. You open up JupyterLab, for example, and it'll say, do you want to create a notebook with a Python kernel? Do you want to create a notebook with a JavaScript kernel? Do you want to create a notebook with a C++ kernel? But typically, at least in the default way of using Jupyter, people don't have notebooks with multiple languages in them, although there are projects that have made that happen.
00:20:29
Speaker
Typically, you have a JavaScript thing that you're doing and maybe a Python thing that you're doing because of the APIs you're familiar with or whatever. But those kernels, yes, they're all following this same lifecycle, mediated by some server that launched them, but after that only talking through these ZeroMQ channels.
00:20:51
Speaker
Okay, okay. I've got the architecture in my head then. Sorry, Sylvain, I interrupted you. You were going to go deeper into Xeus, right? Yeah, exactly. So it's another way to misspell Jupiter.
00:21:04
Speaker
The...
00:21:08
Speaker
So if you want to get into the business of writing a kernel or building a new kernel for a programming language, even though there is a well-specified protocol, and even though you may have access to ZeroMQ bindings for that language because it's a very popular library, there is still a lot of work to do.
00:21:26
Speaker
You need to implement a number of message types. You need to do some work around message signing. You need to implement certain requirements in terms of concurrency and how you process the messages. For example, some messages have to be able to be processed while code is running. So I would say that there is a bit of scaffolding to be done, right?
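[Editor's illustration] As one example of the scaffolding mentioned above, here is a minimal sketch of message signing. It assumes the scheme described in the Jupyter messaging specification: an HMAC-SHA256 hex digest computed over the serialized header, parent header, metadata and content, placed after the `<IDS|MSG>` delimiter. The function names and the key value are made up for the example; a real kernel reads the key from its connection file and sends the frames over ZeroMQ.

```python
import hashlib
import hmac
import json

DELIMITER = b"<IDS|MSG>"  # separates routing identities from the message frames


def sign(key: bytes, parts: list[bytes]) -> bytes:
    """Return the hex HMAC-SHA256 digest over the four serialized dicts."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    return mac.hexdigest().encode("ascii")


def serialize(key: bytes, header: dict, parent_header: dict,
              metadata: dict, content: dict) -> list[bytes]:
    """Produce the frames: delimiter, signature, then the four JSON dicts."""
    parts = [json.dumps(d).encode("utf-8")
             for d in (header, parent_header, metadata, content)]
    return [DELIMITER, sign(key, parts), *parts]


def verify(key: bytes, frames: list[bytes]) -> bool:
    """Recompute the signature on receipt and compare in constant time."""
    idx = frames.index(DELIMITER)
    signature = frames[idx + 1]
    parts = frames[idx + 2: idx + 6]
    return hmac.compare_digest(signature, sign(key, parts))


key = b"example-key-from-connection-file"  # assumption: placeholder key
frames = serialize(key, {"msg_type": "execute_request"}, {}, {}, {"code": "1 + 1"})
print(verify(key, frames))
```

A receiver that recomputes a different digest should drop the message, which is how a kernel refuses requests from processes that do not hold the key.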
00:21:53
Speaker
And some people have done it. There are, you know, language kernels that people wrote for OCaml, or that are written in OCaml.
00:22:04
Speaker
But if you want to start from scratch, it's still a lot of work. And for this reason, we've written a library called Xeus, which is a C++ implementation of the protocol that we decided to make so that it would be a reference implementation.
00:22:24
Speaker
And it's not a kernel. It's a library that helps you make kernels. And once you have Xeus in place, you can simply implement the bits that are language-specific by overriding a few methods.
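[Editor's illustration] Xeus itself is a C++ library, so the following Python sketch only mirrors the shape of the idea: a base class owns the protocol plumbing and a subclass overrides the language-specific methods. `KernelBase`, `handle`, `do_execute`, `language_info` and `ReverseKernel` are hypothetical names invented for this example, not the Xeus API.

```python
class KernelBase:
    """Hypothetical base class: routes messages and tracks the execution count."""

    def __init__(self) -> None:
        self.execution_count = 0

    def handle(self, msg_type: str, content: dict) -> dict:
        if msg_type == "execute_request":
            self.execution_count += 1
            result = self.do_execute(content["code"])
            # Simplification: the real protocol publishes results on a
            # separate channel instead of returning them in the reply.
            return {"status": "ok",
                    "execution_count": self.execution_count,
                    "result": result}
        if msg_type == "kernel_info_request":
            return {"status": "ok", "language_info": self.language_info()}
        return {"status": "error", "ename": "UnknownMessage", "evalue": msg_type}

    # The language-specific bits a kernel author would override.
    def do_execute(self, code: str) -> str:
        raise NotImplementedError

    def language_info(self) -> dict:
        raise NotImplementedError


class ReverseKernel(KernelBase):
    """Toy 'language' whose only operation is reversing its input."""

    def do_execute(self, code: str) -> str:
        return code[::-1]

    def language_info(self) -> dict:
        return {"name": "reverse", "file_extension": ".rev"}


kernel = ReverseKernel()
reply = kernel.handle("execute_request", {"code": "abc"})
print(reply)
```

The subclass never touches message routing or counters, which is the division of labour described in the conversation.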
00:22:37
Speaker
How does that work for a language? How would that work for something like Java? Because you're giving me the impression that I'm going to write some C++, but I don't really...

Kernel Protocol and Debugging

00:22:50
Speaker
Yeah.
00:22:50
Speaker
So most typically... There is a Xeus kernel for Python, for example. There is another one for R, and in most of the cases, you would embed the R interpreter or the Python interpreter in your process and use it as a library. Okay, because you can easily call Python from within a C++ program.
00:23:13
Speaker
That's right. In the case of Java, people have written other Xeus-like frameworks that communicate over ZeroMQ and are better integrated into the JVM.
00:23:28
Speaker
Okay. So are you saying I would choose Xeus instead of sending ZeroMQ messages back and forth? So for languages of the JVM, I would probably not use Xeus, because the story for binding JVM languages with native libraries is always a headache, even today. Right, yeah. But for any language that has a native interpreter,
00:23:55
Speaker
I think using Xeus is probably the simplest solution. Okay, yeah, that makes sense. Just give me an idea, because I'm trying to get this architecture in my head without a whiteboard, which is always the fun of this podcast.
00:24:09
Speaker
Give me an idea of some of the messages, because I can see there's going to be a message that says, please run this function. But what are the other ones? There are kernel messages that we call control messages, which relate to the lifecycle of the kernel.
00:24:27
Speaker
Please shut it down. Or please restart the kernel. And over the control channel, we also have debug messages, so that we can attach a debugger and, you know, set a breakpoint, and do it while the code is running. So the control channel has to always run and process messages, even though code may be running in the main channel where we send execution requests.
00:24:57
Speaker
So the kernel has to be multithreaded? The kernel has to be multithreaded or have some kind of concurrency model that allows it to receive a stop message or an interrupt message while it's running code.
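[Editor's illustration] The concurrency requirement above can be sketched with two threads: one standing in for the channel that runs user code and one for the control channel that must stay responsive. This is a toy model under stated assumptions: real kernels receive control messages over ZeroMQ and often interrupt with signals, whereas here a cooperative `threading.Event` flag plays that role, and all names are invented for the example.

```python
import threading
import time

interrupt = threading.Event()  # set by the control side, polled by the running code
log = []


def shell_worker() -> None:
    """Stands in for long-running user code on the execution channel."""
    for step in range(1000):
        if interrupt.is_set():
            log.append(f"interrupted at step {step}")
            return
        time.sleep(0.01)
    log.append("finished")


def control_worker() -> None:
    """Stands in for the control channel handling an interrupt request."""
    time.sleep(0.05)  # the request arrives while code is still running
    interrupt.set()
    log.append("control: interrupt handled")


shell = threading.Thread(target=shell_worker)
control = threading.Thread(target=control_worker)
shell.start()
control.start()
shell.join()
control.join()
print(log)
```

If both roles shared a single thread, the interrupt would only be read after the loop had already finished, which is why the control channel needs its own path.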

Kernel Protocol vs. LSP

00:25:08
Speaker
Right, yeah, okay. And on top of it, there are a number of messages around getting meta information about the code, such as: is the code that I am typing already complete?
00:25:26
Speaker
For example, if you are typing in a shell-like or console-like UI, usually you would hit enter in the middle of a for loop and it would infer that you're not finished typing. So it would ask you for another input to complete the one that you started. So there are messages around implementing a REPL.
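[Editor's illustration] For a Python kernel, the "is this code complete?" check can be sketched with the standard library's `codeop` module, which returns `None` for input that is valid so far but unfinished. The reply statuses (`complete`, `incomplete`, `invalid`) and the `indent` hint follow the Jupyter messaging specification as understood here; the function name `is_complete_reply` and the four-space indent are assumptions for the example.

```python
import codeop


def is_complete_reply(code: str) -> dict:
    """Classify a snippet the way a console front end needs it classified."""
    try:
        compiled = codeop.compile_command(code)
    except (SyntaxError, OverflowError, ValueError):
        # The input can never become valid by typing more lines.
        return {"status": "invalid"}
    if compiled is None:
        # Valid so far, but the statement is not finished yet.
        return {"status": "incomplete", "indent": "    "}
    return {"status": "complete"}


for snippet in ("print(1)", "for i in range(3):", "x = = 1"):
    print(repr(snippet), is_complete_reply(snippet))
```

A console front end would keep prompting for continuation lines while the status is `incomplete` and only send an execution request once it is `complete`.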
00:25:50
Speaker
And finally, there are messages around getting information on classes and such; we call it the inspect request. So getting, you know, docstrings and anything that may be included in an inspect reply, basically, which depends on the language.
00:26:18
Speaker
Okay. There's some obvious overlap there with something like LSP, but with execution as well. Is it basically a superset of LSP-ish? LSP can give you suggestions and help almost...
00:26:39
Speaker
treating your code like it's text, basically, and in a way where it isn't executing things. But the kernel does have the ability to execute things. So LSP is an add-on that you

LLMs and Real-Time Collaboration

00:26:57
Speaker
could have. We have LSP extensions that you can add to Jupyter, where you get more context about the code you're writing and more help filling it in and that kind of thing that is... Because LSP is also a well-defined protocol, right?
00:27:12
Speaker
But then some pieces of it have to come from the kernel. So suppose you install the LSP extension and you're in a Python kernel and you type...
00:27:23
Speaker
"pri" and you hit tab, right? The kernel can give you some choices to put in there; probably the top choice is going to be print. LSP might also give you some choices, and so we can put both of those things inside the little drop-down you get for autocomplete. They complement each other. LSP by itself is totally helpful, but it's not going to give you all of the things that the kernel can, and it can give you some things that the kernel cannot. So making both available to you gives you the fullest, most interactive experience.
00:28:06
Speaker
Similarly, the inspect message that Sylvain was talking about: we have this little window you can open up, contextual help it's called, where you could have just a little side window sitting there. And as you type,
00:28:19
Speaker
say you call os.environ or something like that. The docstring for that will just show up next to you, because that's the inspect reply. That's the kernel telling you, oh, I know what this thing is. And it has a docstring, and you could render that somewhere if you wish or you could ignore it if you wish.
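[Editor's illustration] A minimal sketch of what a Python kernel might do to answer an inspect request: resolve the name under the cursor in the live namespace and return its docstring as a `text/plain` bundle. The reply fields (`status`, `found`, `data`, `metadata`) follow the Jupyter messaging specification as understood here; the function `inspect_reply` and the choice of `os.getenv` as the inspected name are assumptions for the example.

```python
import inspect
import os


def inspect_reply(name: str, namespace: dict) -> dict:
    """Look up a dotted name at runtime and package its docstring."""
    parts = name.split(".")
    try:
        obj = namespace[parts[0]]
        for attr in parts[1:]:
            obj = getattr(obj, attr)
    except (KeyError, AttributeError):
        # Nothing by that name exists in the running session.
        return {"status": "ok", "found": False, "data": {}, "metadata": {}}
    doc = inspect.getdoc(obj) or ""
    return {"status": "ok", "found": True,
            "data": {"text/plain": doc}, "metadata": {}}


reply = inspect_reply("os.getenv", {"os": os})
print(reply["data"]["text/plain"])
```

Because the lookup runs against live objects, it only works while a kernel is running, unlike static analysis of the source text.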
00:28:37
Speaker
So, yeah, these two things are complementary. They're closely related. And it's right that you think of that, but they're not exactly the same thing.
00:28:50
Speaker
Right, yeah. So one is static analysis, LSP, and the kernel protocol is runtime information. So I have to harp on this a bit because I'm wondering about the architecture.
00:29:03
Speaker
If you're getting, in the front end, both the runtime information from the kernel and information from an LSP server, does that mean that in the back end you've set up two ZeroMQ channels, one that talks to an LSP server and one that talks to an interpreter?
00:29:22
Speaker
So LSP servers are not bound to kernels. An interpreter, or a language kernel, is really about execution and what you can get from the runtime and what's going on.
00:29:38
Speaker
And we've really made it so that it doesn't even know whether the currently running code comes from a notebook or a console. It's completely agnostic to the front end.
00:29:50
Speaker
It's really runtime information. While LSP is about your workspace: all of the files that are in the current project, statically analyzed. And so the lifetimes of LSP servers and kernels are orthogonal.
00:30:08
Speaker
You can have an LSP server running in a project without having started a kernel, or the opposite. So are you saying that from my front-end notebook, when it does get LSP-based information, it's not using that WebSocket to 0MQ channel? It's completely independent?
00:30:27
Speaker
Yeah. Okay. So you could have a notebook open, for example, and you go to the kernel menu and tell it to stop the kernel. And so it's no longer going to be able to get any of the meta information the kernel is giving it, because you're literally instructing it to stop.
00:30:44
Speaker
But you'd still have help from the LSP service if you've installed that, because the LSP service, as it is purely doing static analysis on your code, is treating your code like a static artifact that it can give you info on.
00:30:59
Speaker
And it's quite different from the help that the kernel can give you. Okay. The reason I harp on this is there's one question I have to ask you, which feels like a very 2025 question.
00:31:11
Speaker
The other thing you want, code execution, you want LSP these days. You also often now want an LLM looking at your code blocks and suggesting something.
00:31:22
Speaker
And is there a place in the architecture for that kind of messaging? Yeah, absolutely. And that, again, is not the kernel. The kernel is a very focused thing. The kernel is about executing your code and being the interpreter and being the runtime environment.
00:31:36
Speaker
There's a project, Jupyter AI, and it does basically what you're describing. And the way it does is that LLMs are standalone services. They also want you to give them a bunch of text, and they're going to give you back a bunch of text.
00:31:52
Speaker
And that can happen. Yeah. The same way it happens with LSP, right? You can have multiple inputs to this document that you're working on. One of them is the suggestions coming back from the kernel. One of them is suggestions coming back from LSP. And one of them is suggestions coming back from some API you're hitting, whether it's an LLM in the cloud or one you're running on your machine.

Security Concerns and Isolation Strategies

00:32:17
Speaker
That's also just treated like a separate set of endpoints that you hit. And this web application ties all three of the inputs together. Well, actually, I say three, but there's more, right? You can have real-time collaboration, so there might be five other users looking at it with you. And then there's you, the actual author, sitting there as well.
00:32:35
Speaker
And all of these sources are integrated into this one UI, and you're seeing perhaps a suggestion coming from a human user.
00:32:46
Speaker
You're potentially making a request and getting a streaming response from an LLM. You potentially have this inspection window open, so you're seeing the docstring of where your cursor is currently sitting.
00:32:56
Speaker
You know, all of that is happening, and they're not all resident in the kernel. The kernel is a very specific thing, and these are sort of add-ons that can work with it.
00:33:07
Speaker
Yeah, and that makes sense to me. It surprises me. It sounded like you had this architecture for a thin client to send lots of information to the back end and it would worry about executing it for you.
00:33:22
Speaker
And now it sounds like there are three or four different protocols that the front end needs to be aware of. And that architecture surprises me, Sylvain. Yeah, so there are three main protocols. There is the LSP protocol, which is also language agnostic and is really meant to address the analysis of the entire workspace that you're working on, right?
00:33:45
Speaker
And provide warnings inline in your text files and notebooks, and also some autocomplete suggestions. If you have a kernel running, you may have more autocompletion suggestions coming from the kernel, because it has runtime information, and we use both in the UI. And so the front end is aware of both the LSP and kernel protocols.
00:34:13
Speaker
Now, when it comes to LLMs, Jupyter AI currently is more similar to LSP, in that it uses the prompts from the user and the content of the documents that you're using.
00:34:30
Speaker
But there are some proposals out there to have Jupyter AI make requests to the kernel to get runtime information about your variables and whatnot, so that it could improve the suggestions it makes.
00:34:51
Speaker
So there is a pre-proposal at the moment to expand the kernel protocol with a new channel for machines to talk to it.
00:35:11
Speaker
Right. Would it sit as almost like a client to the kernel, or is it a back end to the kernel? There would be a new channel to communicate with the kernel, and the clients to this channel would be the various AI extensions to Jupyter.
00:35:35
Speaker
Would that be... I mean, maybe this is a slight semantic difference, but are you treating it like a front end that's communicating over WebSockets? Or are you treating it like a back end that's communicating over ZeroMQ?
00:35:47
Speaker
Well, it can be both, right? There are ZeroMQ channels that are created. Not every single one of them is exposed as a WebSocket, but they could be, depending on what kind of client you're building.
00:36:03
Speaker
The kernel doesn't care. The kernel's just listening to ZeroMQ in a vacuum. It doesn't know what kind of clients are talking to it, whether they're machines, humans, browsers; it doesn't matter, right? So if you create this thing, it's up to you to decide. My web server manages the lifecycle of instantiating a kernel, then connecting to it, and then giving the user a WebSocket to this specific kernel and all that. That server can make the decision that among the channels it's exposing to the front end is this new LLM AI one, so that the front end could have a WebSocket: it talks over some other API to the cloud, gets something, then puts it into this, and it goes back to the kernel.
00:36:48
Speaker
Or it could be that your connection to the LLM is happening server side. You have some library with some API key. It gets invoked by a message coming in here. It does something, and then it talks over ZeroMQ to the kernel, and the kernel state is modified.
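Whichever side originates the request, what reaches the kernel has the same shape: a signed, multi-part message. The sketch below builds the frames of an `execute_request` the way the Jupyter messaging specification lays them out (delimiter, HMAC-SHA256 signature, then header, parent header, metadata, and content as JSON). It opens no socket and talks to no kernel; the session name, user name, and key are made up for illustration.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

DELIMITER = b"<IDS|MSG>"  # separates routing identities from the message proper


def make_execute_request(code: str, session: str, key: bytes) -> list[bytes]:
    """Build the signed frames of an execute_request (no socket involved)."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session,
        "username": "demo",  # hypothetical user name
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    # Order matters: header, parent_header, metadata, content.
    parts = [json.dumps(d).encode("utf-8") for d in (header, {}, {}, content)]
    signer = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        signer.update(part)
    signature = signer.hexdigest().encode("ascii")
    return [DELIMITER, signature, *parts]


def verify(frames: list[bytes], key: bytes) -> bool:
    """Recompute the signature the way a receiver would."""
    start = frames.index(DELIMITER)
    signature, *parts = frames[start + 1:]
    expected = hmac.new(key, b"".join(parts), hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected.encode("ascii"))


frames = make_execute_request("print(1 + 1)", session="demo-session", key=b"secret")
```

Nothing in these frames says whether a browser, a server-side library, or another machine produced them, which is the point being made: the kernel only sees well-formed, correctly signed messages.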
00:37:07
Speaker
Both are possible. The architecture doesn't require one or the other. Okay. Okay. I think I get a sense of that. So I would like to dive back into implementing the execution side of the kernel. Because the thing that always worries me about services like that is, as soon as you're executing code for people over a network, you've got to worry about the security of arbitrary code execution. Does the protocol, does Xeus, do anything about that? What's the mitigation? Or is it entirely on the person authoring the interpreter?
00:37:45
Speaker
Sylvain, maybe you. So the core feature of Jupyter is arbitrary code execution as a service. Great. All right. So what could possibly go wrong? So we should assume that any code can be run and there is no limit to that, right? Even if you are exposing a Python interpreter and you...
00:38:12
Speaker
want to limit the usage of some packages such as os to prevent some operations on the file system and the like.
00:38:23
Speaker
You can't do that, because it's just as simple for anyone to even write an assembler in Python and send code for execution

Streaming and Concurrent Execution Challenges

00:38:35
Speaker
for the machine. So you cannot assume that there is any limit to the code that will be run once you allow for running code.
00:38:48
Speaker
Okay, so that makes me wonder how the cloud services that are offering Jupyter Notebooks are mitigating that, and how this works with that kind of collaborative model where multiple people are using the same notebook.
00:39:05
Speaker
Yeah. So, in the case...
00:39:11
Speaker
It depends on who we are protecting from whom. The attack scenarios are diverse. But for example, in the case of a cloud deployment of Jupyter, how the cloud provider protects itself from its own users is one thing.
00:39:31
Speaker
If there are some sharing features on the platform, how do we protect users from each other? And if you start allowing people to collaborate on the same document, what do we allow users or collaborators to do at runtime?
00:39:52
Speaker
And so depending on what you want to address, the complexity is different. Yeah, okay. Well, it sounds like if you take that scenario where multiple people are collaborating on an academic paper or a financial model,
00:40:12
Speaker
you can probably get away with a lot because they trust each other. But for a cloud service provider, and I know this happens, do you think they're just running isolated Docker environments and locking it down that way?
00:40:25
Speaker
Or? In terms of protecting themselves from the users, they are certainly considering that the code that the users run is hostile by default. And so they are limiting what the users can do.
00:40:38
Speaker
Yeah. Then, when you start having sharing capabilities, for example, a user creates a notebook and sends a link to another person to execute that notebook.
00:40:54
Speaker
That other person will not only have to trust the code of the notebook UI or the code of the cloud provider deploying the notebook, but also the code of the person who shared that content with them. Because presumably they could have injected content in the notebook that will be run in your browser.
00:41:16
Speaker
So the mitigations for all of these scenarios are of a different nature, I would say. Okay. Maybe I don't want to dive too much into it. We are basically accepting that you are running someone else's program on your computer.
00:41:34
Speaker
And that's both the risk and the whole point. There is the execution in the back end. Then there is the display of rich content in the front end, which may run JavaScript.
00:41:54
Speaker
And we want to isolate that JavaScript. We don't want it to access any secret that may be in your browser session. So yeah, this is another concern. Okay.
00:42:05
Speaker
But if I were to implement a new kernel for Jupyter, are the security concerns entirely mine, or is there support for sandboxing somehow in the Jupyter architecture?
00:42:22
Speaker
There Sorry, so...
00:42:25
Speaker
Yeah, there's a lot of ways... As a kernel author, I don't think you need to do anything about security there. You don't?
00:42:37
Speaker
No. Oh, okay. As long as you provide an execution request... Your job as a kernel author is to take a snippet of valid text in your programming language and execute it. Now,
00:42:55
Speaker
almost every programming language gives you the tools to create lots of havoc on the machine that it's running on. But if the idea is to give people a working interpreter for Lua inside a notebook, well, you've got to faithfully execute Lua code.
00:43:12
Speaker
So really, the process that launched the Lua interpreter, whether that's the terminal on somebody's laptop, where they went into it themselves, or whether that's, you know,
00:43:23
Speaker
an interpreter that was managed by a Jupyter server saying, hey, please launch this: it's still a Lua interpreter. It should do what a Lua interpreter does. The difference is, how do you expose this to the user? So for example, the cloud service that Jupyter itself hosts is called mybinder.org. That does the maximalist case.
00:43:47
Speaker
It will spin up a VM just for you. And then if you destroy that VM, you're the only person who sees the bad effects of that, so it's okay. And if you share that link with others and you destroy that VM, that's still limited to your notebook and that link, right?
00:44:03
Speaker
Now, there's another way you could go about it, which is super exciting and really new, which is to run the whole thing in your browser: compile your Lua interpreter
00:44:14
Speaker
to WebAssembly, and remove even the ZeroMQ part of it, because it's still just an API you can hit. Have the whole thing running in your browser, and then there is no file system for you to destroy.
00:44:28
Speaker
So there's different ways we can give you this capability of a document with code in it that executes when you hit Shift-Enter, and still protect you from destroying the world. But none of them involve the interpreter itself stopping you, because that thing's job is to just run code that's correct in a language.
00:44:47
Speaker
Right, yeah, yeah, I totally get the picture. There's one other question I wanted to ask you about the architecture of this before we go more into user space, which is, and this may be obvious, I don't know, because I don't have enough experience in Jupyter, you can tell me, but...
00:45:06
Speaker
I get the feeling that Jupyter is very much a request-response thing. I press Shift-Enter, it runs some code, the thing comes back. And I wonder if there's support in the protocol or in the world for streaming, right? So I can imagine I have one block which is tailing a log file and gradually rolls and updates while I do some other stuff in another block.
00:45:28
Speaker
Is that possible? Is it happening? Is it not possible because of the architecture?

Expanding Jupyter's Front-End Project

00:45:33
Speaker
Maybe, Darian, you would field that? There's a lot of cases here. So, for example, let's take the most mundane thing. You showed up, you wrote a for loop, and it's going to take a bunch of time to execute, and then you refreshed your page, right?
00:45:55
Speaker
So you lost your connection, that kernel's still doing some work, and you want to eventually see that. So all the Jupyter clients that Jupyter puts out have the ability to reconnect to the kernel that you used to have and to recreate the state that your browser had and all that.
00:46:14
Speaker
And then if there happens to be a bunch of output there, they'll render that. But if the output is coming out streaming, keep in mind it's coming from a WebSocket. So you said it's request-response. That's sort of true.
00:46:25
Speaker
It's a request, but the response comes back over a WebSocket, which is a stream. So it doesn't all come back at once necessarily. It can come back in a streaming way. But some of the work that people do is really long term. Like, it's going to take 27 hours to run this, so I want to see it tomorrow afternoon.
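The "one request, streamed response" shape can be sketched in a few lines. This is a toy, not the Jupyter implementation: it runs the submitted code on a worker thread and yields simplified messages (a busy status, one `stream` message per `print` call, then an idle status) as they become available. The message dictionaries are trimmed-down stand-ins for the real ones, and `execute_streaming` is a hypothetical name.

```python
import queue
import threading
from typing import Iterator


def execute_streaming(code: str) -> Iterator[dict]:
    """Run code on a worker thread and yield messages as output appears."""
    outbox: queue.Queue = queue.Queue()
    done = object()  # sentinel marking the end of the run

    def emit(*args: object, **_ignored: object) -> None:
        # Stand-in for print(): each call becomes one "stream" message.
        outbox.put({"msg_type": "stream", "text": " ".join(map(str, args))})

    def run() -> None:
        try:
            exec(code, {"print": emit})  # the snippet sees emit() as print()
        except Exception as exc:
            outbox.put({"msg_type": "error", "text": repr(exc)})
        finally:
            outbox.put(done)

    yield {"msg_type": "status", "execution_state": "busy"}
    threading.Thread(target=run, daemon=True).start()
    while (message := outbox.get()) is not done:
        yield message
    yield {"msg_type": "status", "execution_state": "idle"}


messages = list(execute_streaming("for i in range(3):\n    print(i)"))
```

A consumer iterating over `execute_streaming(...)` receives each chunk as soon as the running code produces it, rather than one reply at the end. What this toy does not address is the hard case raised here: nobody holds the output if the consumer goes away for a day.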
00:46:43
Speaker
We haven't yet solved that satisfactorily across the board. Like that's a thing where you should be able to do that. Close your laptop, walk away, come back, sign in on a different machine and see it.
00:46:56
Speaker
We're not quite there yet. That's a pretty hard problem, because, okay, suppose the output is really big. Who's supposed to hold it for you? Those are problems we're actively working on, but some subset of what you described already exists.
00:47:10
Speaker
Right, right. There's one other piece to that then. Let me quickly ask, is there concurrent execution? If I've got two blocks which both take a long time to run, can I start them both at the same time? Or is the execution single-threaded?
00:47:25
Speaker
Yeah, well, it's not just that the execution is single-threaded. It is, but that's not really the reason. The reason is you told your code to run this for loop, then that for loop.
00:47:37
Speaker
It has to run them sequentially, even if it's a multi-threaded interpreter. You've basically given it a script to run. If you instead gave it, you know, a Dask instruction or something, like you bring in a library that's a multi-threading library, and then you do 12 concurrent things, of course it can do that.
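Asking for concurrency explicitly might look like the sketch below. It uses the standard library's `concurrent.futures` rather than Dask, purely to stay self-contained; `slow_square` is a made-up stand-in for a long-running computation, and the "cell" comments mark where a notebook author might split the code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n: int, delay: float = 0.2) -> int:
    """Hypothetical long-running task."""
    time.sleep(delay)
    return n * n


executor = ThreadPoolExecutor(max_workers=2)

# "Cell 1": submit() hands back a Future right away, so this cell finishes
# immediately while both tasks keep running in the background.
first = executor.submit(slow_square, 3)
second = executor.submit(slow_square, 4)

# "Cell 2": reconcile later by waiting for the results.
results = [first.result(), second.result()]
executor.shutdown()
```

The kernel still executes the cells one after another; the overlap comes entirely from the code the author chose to write.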
00:47:58
Speaker
But you writing code yourself, you're writing a single thread's worth of code unless you specifically write threaded code. Yeah. So you are always free to start your own threads and, you know, do something smarter in the notebook so that the first cell...
00:48:16
Speaker
returns immediately and the second thing is started, and, you know, have some reconciliation in the end. There are some interesting projects out there where people have done static analysis on notebook code and detected that two cells were completely independent and could be run in parallel.
00:48:38
Speaker
So they would spawn two different processes running the two cells, resolving the tree of dependencies of the cells so that some parts of the notebook could be run in parallel and others not.
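A very rough idea of what such an analysis involves, for Python only, is sketched here with the standard `ast` module. It is not taken from any of the projects alluded to: it only compares the top-level names each cell writes and reads, and it ignores mutation through method calls, attribute writes, files, and every other side effect, so a real tool would need to be far more careful.

```python
import ast


def names_used(cell: str) -> tuple[set[str], set[str]]:
    """Return (names written, names read) for a cell's source code."""
    writes: set[str] = set()
    reads: set[str] = set()
    for node in ast.walk(ast.parse(cell)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                writes.add(node.id)
            else:  # Load or Del
                reads.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            writes.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                writes.add((alias.asname or alias.name).split(".")[0])
    return writes, reads


def independent(cell_a: str, cell_b: str) -> bool:
    """True if neither cell writes a name the other one writes or reads."""
    writes_a, reads_a = names_used(cell_a)
    writes_b, reads_b = names_used(cell_b)
    return not (writes_a & (writes_b | reads_b)) and not (writes_b & reads_a)


parallel_ok = independent("a = 1\nb = a + 1", "c = 2\nprint(c)")
must_wait = independent("a = 1", "b = a + 1")
```

Here `parallel_ok` is `True` because the two cells share no written names, while `must_wait` is `False` because the second cell reads `a`. The check leans entirely on Python's syntax tree, which hints at why this kind of analysis has to be redone per language.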
00:48:51
Speaker
However, this hasn't landed in core Jupyter. And so, yeah, people have been building products around these things. Yeah. Yeah, that sounds like a very hard job to do even for one language, let alone for 60 backends.
00:49:07
Speaker
Yeah, this can't be done in a language-agnostic way. Yeah. Okay, so really, trying to dive in and understand the architecture, logically I should be thinking of this as just REPL as a service, in which I expect there to be one thread of execution unless I put my own threads into it.
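That mental model, one execution thread draining requests in order against shared state, can be sketched as follows. `TinyKernel` is a hypothetical toy and not how any real kernel is written; in particular it does nothing to restrict what the submitted code can do, in line with the earlier point that isolation is the job of whatever launches the interpreter.

```python
import queue
import threading


class TinyKernel:
    """Toy 'REPL as a service': one worker thread, one shared namespace."""

    def __init__(self) -> None:
        self.namespace: dict = {}
        self.results: list[str] = []
        self._requests: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, code: str) -> None:
        """Queue a snippet; snippets run strictly in submission order."""
        self._requests.put(code)

    def _run(self) -> None:
        while (code := self._requests.get()) is not None:
            try:
                exec(code, self.namespace)  # runs whatever it is given
                self.results.append("ok")
            except Exception as exc:
                self.results.append(f"error: {exc!r}")

    def shutdown(self) -> None:
        """Stop after everything already queued has run."""
        self._requests.put(None)
        self._worker.join()


kernel = TinyKernel()
kernel.submit("x = 1")
kernel.submit("y = x + 1")  # sees x because the namespace persists
kernel.submit("1 / 0")      # an error is reported, the loop keeps going
kernel.submit("z = y * 10")
kernel.shutdown()
```

After `shutdown()`, `kernel.namespace["z"]` is `20` and `kernel.results` records three successes and one error, in submission order.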
00:49:29
Speaker
Right. It's a REPL as a service, and there are a few control threads on the side for things like debugging and whatnot. Oh, yeah. So like monitoring the REPL, but there's one execution thread and then there are some...
00:49:42
Speaker
looking-into-what's-happening-in-my-REPL threads. Yeah, okay, okay, I think I get it. So in that case, I want to zoom out, because, and this relates to where we started, with my ignorance in thinking that Jupyter is just about data analysis in Python.
00:50:03
Speaker
You were telling me that the Jupyter front-end project is expanding into other things, like there's some kind of CAD project and things like this? Tell me about the wider world of Jupyter.
00:50:16
Speaker
Sylvain, yeah, you take that. Yeah, so the way Jupyter has evolved over time is, basically, early on it was this IPython project, which was essentially a REPL, like an improved REPL for Python, right? And eventually we moved to this kernel-client architecture that became language agnostic. And in 2013, the notebook came out and then it became very popular.
00:50:46
Speaker
And there was a big ask from users to make this new front end that they loved a lot more featureful and closer to an IDE,
00:50:58
Speaker
hence the JupyterLab project. And so, Darian here is one of the co-founders of the JupyterLab project. And if you want to build an IDE, there are lots of things that you need to resolve that are not specific to IDEs.
00:51:18
Speaker
Basically, IDEs, to me, are one example of a broader category of applications that includes things such as CAD software, GIS software, photo editing software, IDEs, of course, and whatnot.
00:51:33
Speaker
If you take several of these tools and you blur your vision, you squint a little, they all look the same, right?
00:51:45
Speaker
You have a tiled window, a lot of information on the screen. You have, you know, theming. And usually all of these tools have a lot in common.
00:51:56
Speaker
The users who make use of them have very high expectations for the quality, because they spend many hours every day using them.
00:52:08
Speaker
They expect it to be very configurable and customizable, with themes and custom keyboard shortcuts and whatnot. They expect it to be internationalized to the local language.
00:52:21
Speaker
They expect it to be extensible with plugins. So if you want to build a new CAD modeling tool on the web and you want to check all of these boxes and you just start with React, it's going to take a lot of time before you have theming, configurable keyboard shortcuts, an extension system, and all of that in a rich layout.
00:52:45
Speaker
Yeah. Okay. So you're attempting to become like a rich application framework for desktop-ish

Application Development and Collaboration Features

00:52:54
Speaker
experiences. Exactly. That's right. So with what we've built for JupyterLab, a lot of the effort went into building something that is not actually specific to making an IDE.
00:53:07
Speaker
Yeah, there's a bunch of primitives Jupyter offers application authors that help here. JavaScript as a programming language is required to build web apps, but it doesn't come with dependency injection and a plugin system. So we have to roll our own.
00:53:28
Speaker
VS Code has to roll their own. Everyone who has a JavaScript-based app has to create one. Having a scriptable environment, where you can write code inside your application that executes in some language that is appropriate to the work you're doing, is a generic feature, and we have that already with kernels.
00:53:50
Speaker
Having support for rich text editing and a lot of bells and whistles there, like LSP, like autocomplete, all that is a feature that lots of applications can use, not just IDEs.
00:54:04
Speaker
Having real-time collaboration as a thing you can add to your application is either something you have to spend tons of money on and build from the ground up, or the first time you do that, you write it in an agnostic way and you can apply it to lots of different applications.
00:54:22
Speaker
So we can create an application in a whole different space that uses plugins that were initially authored for JupyterLab, because those APIs stay the same; that uses kernels that were initially created for the IPython terminal; that uses LSP, which was initially authored by Microsoft for VS Code; and that uses real-time collaboration that was originally authored to work just in the case of rich editable notebooks, but actually can work generically across the board.
00:54:54
Speaker
And a chat system that was built for one application, but can really be repurposed to also talk to an LLM. These are all plug-and-play pieces of a very rich application toolkit.
00:55:06
Speaker
Whereas, as Sylvain says, if you start with React, you can do it. It's just going to take you quite some time and money. Yeah. Yeah. There are a lot of fundamental foundational pieces you're going to end up reinventing to get there.
00:55:20
Speaker
What kind of applications do you think this is well suited for then? Because it seems like I'm not going to use it for my product website. I might use it if I was trying to compete with Google Docs. I don't know. Give me an idea of the space you're actually aiming for.
00:55:38
Speaker
Let's take an example that we haven't built yet. Let's say that you want to do a web-based collaborative Photoshop competitor. OK, all right. You probably want that to have theming, configurable keyboard shortcuts, a file browser, and a command palette that you can hit with a keyboard shortcut to see all of the things you can do by hand.
00:56:02
Speaker
You will want to have a rich layout system. There is probably 80% of
00:56:10
Speaker
the foundations of JupyterLab that you will want to reuse. And if on top of it, you want your competitor to be scriptable and have a Python API and have real-time collaboration, then I don't think there is anything like JupyterLab out there to build it with at the moment.
00:56:29
Speaker
Okay. Okay. So you really are... We're going after desktop-y experiences that want to be... Presumably want to stay in the browser for... Well, they don't have to.
00:56:43
Speaker
I mean, you can go to the Mac App Store today and download JupyterLab Desktop. I mean, VS Code is written in TypeScript, but it's a desktop application.
00:56:55
Speaker
The real benefit we're bringing isn't that we're bringing desktop application power to the browser. The thing that we're doing that's hard to do is a language-agnostic way to have any scripting language you want built into your app; a fairly painless... keep in mind, this is a hard thing, so a fairly painless way to add real-time collaboration; and a set of...
00:57:25
Speaker
application primitives that, you know, you might have been getting from Qt or something like that before, but now you can get from a web-based system instead of a C++ desktop-based system. So a menu system, a command system, a keyboard shortcut system, a file browser that integrates into your app, a status bar, a windowing toolkit for drag-and-droppable tabs, the ability to accept rich drag-and-droppable things. All that stuff is actually kind of just entry stakes for building an application. It doesn't really differentiate you from your competitors, but the less time you spend on that, the more time you spend on that image manipulation UI that's awesome, that you yourself can build and no one else can do. The other stuff everyone else can do as well. And it will be cool if you get all of that. Like, Photoshop
00:58:23
Speaker
with real-time collaboration is probably a very hard thing to build, but it's easier if you just build the Photoshop piece of it and not everything else.

Collaborative CAD and GIS Applications

00:58:33
Speaker
Yeah, yeah, okay, I'll buy that.
00:58:35
Speaker
So then I would imagine something like that is going to be very popular at first for, like, intranety applications, internal company applications that want something very specific. I don't know.
00:58:48
Speaker
But... I would think an experience like that lives and dies on what it's like to actually build things with it. So give me an example, give me an idea of what this feels like from the developer point of view. Am I writing JavaScript?
00:59:03
Speaker
Is it like a JavaScript framework that I'm using, or what? What's my DX? Sylvain? Yeah, you do it.
00:59:14
Speaker
No, go ahead, Darian. Go ahead, I think. Well, okay, so there's two sides of this. I can tell you a lot more about the front-end architecture. It is likely going to be that you'll choose TypeScript, because you'll get a lot of API support and a lot of compile-time support, and it'll help prevent runtime errors that you might get if you were using JavaScript.
00:59:37
Speaker
But you could use JavaScript. What you get out of the box is a system that gives you the layout and the commands and the keyboard shortcuts and a plug-in system.
00:59:54
Speaker
And you can look at the plug-ins and decide how many of these plug-ins that are out in the wild are actually appropriate to my app, and how many aren't. And you can start implementing the APIs that some of those plugins that already exist out there support. So, for example, there is a plugin already out there that gives you a status bar, right? Well, what does that thing want? That thing just wants an application shell to put itself into.
01:00:25
Speaker
So if you implement a minimal application shell, you can take the status bar plugin of JupyterLab and use it off the shelf in this thing. So you have to learn an API that isn't like a globally famous one, which is the Jupyter API for building frontend apps.
01:00:43
Speaker
But what you get for learning this somewhat niche API is these powerful primitives that you otherwise wouldn't. Now, none of this describes what your backend looks like. Your backend might not exist. You might be literally just giving people a static HTML, JavaScript, CSS app with these things,
01:00:59
Speaker
or your backend might be very complex. It really depends on what you're trying to build here. But at least for what you see on your screen, you're starting with a TypeScript set of tools.
01:01:11
Speaker
And so the first one we built is Jupyter CAD. And Jupyter CAD is meant to be a web-based, FreeCAD-like application with real-time collaboration enabled.
01:01:22
Speaker
And a lot of JupyterLab is reused, like the command palette, the file browser, status bar, top bar, theming. It's really only a JupyterLab extension, a plugin for JupyterLab that knows how to handle CAD files.
01:01:43
Speaker
And that's with collaborative editing enabled. The way we enable collaborative editing is by specializing the base classes and interfaces that define the shared documents in JupyterLab.
01:01:57
Speaker
I want to ask about that then, because it seems to me that the real-time collaboration thing is particularly hard. It's definitely something you wouldn't want to implement yourself.
01:02:08
Speaker
But it's also something where it's sort of-ish a solved problem for text. Like you can get a library that will just let you write rich text documents.
01:02:20
Speaker
Much harder to do live real-time collaboration on the definition of a CAD file. That's right. So first, before we speak about the difficulties of doing it, I think it's worth talking about why it's important to do it. Okay. And so my take on this is that collaborative editing for word processing has collectively made us a lot more productive in our work.
01:02:46
Speaker
If we were to collaborate together on a document, we wouldn't be sending versions of that document with annotations and colored pieces in it back and forth until we have a final and then the final v2 version, right? Yes. I know someone that still does that and it's painful to watch.
01:03:04
Speaker
Yeah. Sometimes because of the corporate boundaries, you still have to do it. But most of the time we are a lot more productive. Still, word processing only really concerns maybe up to a dozen people at most, right?
01:03:20
Speaker
But there are endeavors in engineering that require a lot more people. If you are building a football stadium, it's a collaborative thing that you are doing, which involves lots of areas of expertise and companies, and they need to collaborate on some object that usually has some kind of digital twin on the web, somewhere on a server. And then people would use heavy clients to operate on CAD files and, you know, electrical design and whatnot, right?
01:03:53
Speaker
And I am very convinced that using web-based APIs and the sort of tools and concepts that were developed for word processing, we can, as a society, also become a lot more productive for these bigger collective endeavors.

CRDTs and Non-Text Data Challenges

01:04:11
Speaker
Yeah, yeah. It's certainly easy to see how an architectural firm would want to have architects collaborating on a large building. Yeah. Yeah, and anything complex like a car or a jet engine, these are really complex things. And usually if you have to change the specifications of the boundary of a thing you're responsible for, it has to go a few levels up in your company and then back down, or to the subcontractors who are responsible for the design of that other part.
01:04:43
Speaker
Yeah, these things definitely get too large for one person to own the project file, right? But what's difficult is, really, from what I know, building a CRDT for text is hard enough, but you can go out there and get one.
01:05:01
Speaker
You seem to be implying that you're going to make it easy for me to build a CRDT for 3D objects. So, I mean, CRDT, it's worth spelling out for people: conflict-free replicated data types, plural.
01:05:19
Speaker
So text is a data type, and large text might describe, I don't know, a paragraph in, you know, like a metadata thing of your 3D object. But your 3D object also has other pieces of specification in it that can be serialized. So,
01:05:41
Speaker
It has data points, right, that are being rendered in a certain way, vectors that are being described, all that. So we have a type for number arrays that's also conflict-free replicated and can be injected into. We have a type for...
01:05:57
Speaker
like a matrix, let's say, or, as we said, a text type or a Boolean: all the different things you would use to, let's say, create a JSON representation of your 3D object.
01:06:08
Speaker
Well, whatever it is that you're eventually writing to disk, that can be described as a series of data structures. And so if you take those data structures and map them onto the primitive versions that exist in our CRDT toolkit, which we're broadly calling the shared document, and you specify what this type of shared document is, then it starts working, right?
01:06:34
Speaker
But that part of it is, it's non-trivial, but it's way easier than if you were building it from scratch. And it's built on top of a set of tools called Yjs. And then on top of that, we put the Jupyter concept of a shared document. And then on top of that, you put your concept of your 3D document.
01:06:55
Speaker
So are you saying you are providing me a toolkit for arbitrary CRDTs for JSON, and if I can turn my model into JSON, then we can start to have a party?
01:07:08
Speaker
Is it something like that? Yeah, it's pretty much that, with some limitations compared to JSON. And so that's really why we had to do it that way in JupyterCAD.
01:07:21
Speaker
Most of the time when people devise a data format that is based on JSON, they would also specify a JSON schema for it, right? And JSON Schema allows for all kinds of constraints on what makes a file a valid file or an invalid file, right?
01:07:41
Speaker
And a lot of these were devised without CRDT constraints in mind. So a CRDT is going to be good for dictionaries and lists and whatnot, but it's not going to be very good at implementing some of the constraints you can specify in a schema.
01:07:59
Speaker
And one example, probably the simplest: let's say that we decided that a valid notebook must have at least one cell, even if it's an empty cell.

Version History and Git Integration

01:08:10
Speaker
That's the user experience. If you create a new notebook in JupyterLab, you see one first empty cell.
01:08:16
Speaker
It's not empty, right? So let's say that on top of this user experience, we also said that an empty notebook was invalid. And you have two users displaying the same notebook, right, which has two cells, and each one of them deletes one cell, but not the same one. Yeah. And, you know, the natural resolution of the CRDT is to end up with an empty notebook, and there is no clean way to resolve this, because CRDTs rely on the fact that people are going to resolve
01:08:54
Speaker
and implement eventual consistency in a decentralized way. Yes. So if we say it has to have at least one cell, you need to re-add an empty cell, and both people will be re-adding an empty cell, and then you end up with a new notebook with two empty cells.
01:09:13
Speaker
Because, you know, the messages cross again. So when we devise a new data format for CAD, we need to make sure that we don't require constraints like this.
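The two-users-delete-different-cells scenario described above can be sketched in a few lines of Python. This is a toy model only: the notebook is reduced to a set of cell ids with tombstones (a so-called two-phase set), and the class and cell names are made up for illustration. It is not how Jupyter's collaboration layer is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One user's copy of a notebook, modelled as a two-phase set of cell ids."""

    name: str
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)  # tombstones for deleted cells

    def cells(self) -> set:
        return self.added - self.removed

    def delete_cell(self, cell_id: str) -> None:
        self.removed.add(cell_id)

    def merge(self, other: "Replica") -> None:
        # Set union is commutative, associative and idempotent, so replicas
        # converge no matter in which order the updates arrive.
        self.added |= other.added
        self.removed |= other.removed

    def repair_if_empty(self) -> None:
        # Local, uncoordinated attempt to enforce "at least one cell".
        if not self.cells():
            self.added.add(f"empty-cell-from-{self.name}")


def sync(a: Replica, b: Replica) -> None:
    a.merge(b)
    b.merge(a)


def simulate():
    alice = Replica("alice", added={"cell-1", "cell-2"})
    bob = Replica("bob", added={"cell-1", "cell-2"})

    alice.delete_cell("cell-1")  # each user deletes a different cell
    bob.delete_cell("cell-2")
    sync(alice, bob)
    after_delete = set(alice.cells())  # both replicas agree: no cells left

    alice.repair_if_empty()  # both repair before hearing from the other
    bob.repair_if_empty()
    sync(alice, bob)
    after_repair = set(alice.cells())
    return after_delete, after_repair, alice.cells() == bob.cells()
```

Calling `simulate()` shows the replicas converging both times: first on an empty notebook, then on a notebook with two empty cells, one re-added by each user. The merge itself is well-behaved; it is the "at least one cell" constraint that the merge cannot express.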
01:09:27
Speaker
And there are many such constraints. How? How do you do that? Is it just a lot of thinking about your data set or are there tools to support it? Usually, constraints that are linked to cardinality are not good. Constraints that bind data from different parts of the document together are also not great.
01:09:51
Speaker
So we tend to end up with data formats that are a lot flatter than what people would typically do if they're just coming up with a nice XML or JSON format for CAD.
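To make the point about constraints that bind different parts of a document together more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `parent` field, the object names, and the per-key merge are invented for illustration and are not the JupyterCAD format or its merge logic.

```python
def apply_edits(base: dict, edits: dict) -> dict:
    """Apply per-key edits to a flat document; a value of None deletes the key."""
    doc = dict(base)
    for key, value in edits.items():
        if value is None:
            doc.pop(key, None)
        else:
            doc[key] = value
    return doc


def merge(base: dict, edits_a: dict, edits_b: dict) -> dict:
    # The two users touch different keys, so both edits apply cleanly
    # and the order of application does not matter here.
    return apply_edits(apply_edits(base, edits_a), edits_b)


def dangling_refs(doc: dict) -> list:
    """Return objects whose 'parent' points at an object that no longer exists."""
    return sorted(
        name
        for name, obj in doc.items()
        if obj.get("parent") is not None and obj["parent"] not in doc
    )


base = {"box": {"parent": None}}
edits_a = {"box": None}                # user A deletes the box
edits_b = {"hole": {"parent": "box"}}  # user B adds a hole that refers to the box

only_a = apply_edits(base, edits_a)
only_b = apply_edits(base, edits_b)
merged = merge(base, edits_a, edits_b)
```

Each edit on its own leaves a valid document (`dangling_refs(only_a)` and `dangling_refs(only_b)` are both empty), but `dangling_refs(merged)` reports `hole`, because the merged document contains a reference to an object the other user deleted. A format that avoids such cross-references has fewer ways to end up invalid after a merge.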
01:10:07
Speaker
Okay, okay. I start to get a sense of that. Just one other feature that comes to mind, because you said earlier that users of these application spaces might not necessarily use Git.
01:10:26
Speaker
Is there something in there for things like maintaining version history of stuff? And does that feed into collaboration? Yes. So...
01:10:38
Speaker
You can, and this can get complicated. Just think about it: there's just the three of us, we're inside a document, we're all editing, and one of us hits undo.
01:10:54
Speaker
It's not obvious whether the last thing that was done should be undone or the last thing that the person who hit undo did. Right. And so you have to make a choice. Some of these are actually just user experience choices you have to make.
01:11:07
Speaker
We've decided undo means the last thing I did. It doesn't mean the last thing that happened. It means the last thing that I did. That's a choice. Right. But, yeah, the other thing is, if you install Jupyter Collaboration, in the toolbar there's a little history browser.
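The two possible meanings of undo mentioned here can be contrasted with a small Python sketch. The `SharedDocument` class and the user names are hypothetical and only illustrate the user experience choice; a real collaborative editor would also have to merge concurrent operations.

```python
class SharedDocument:
    """A shared log of (user, text) operations, in the order they were applied."""

    def __init__(self):
        self.ops = []

    def apply(self, user, text):
        self.ops.append((user, text))

    def undo_global(self):
        # "Undo the last thing that happened", whoever did it.
        return self.ops.pop() if self.ops else None

    def undo_for(self, user):
        # "Undo the last thing that I did": walk backwards to this user's latest op.
        for index in range(len(self.ops) - 1, -1, -1):
            if self.ops[index][0] == user:
                return self.ops.pop(index)
        return None


def build_document():
    doc = SharedDocument()
    doc.apply("user_a", "first paragraph")
    doc.apply("user_b", "second paragraph")
    doc.apply("user_c", "third paragraph")
    return doc
```

If `user_a` hits undo on the document from `build_document()`, `undo_for("user_a")` removes the first paragraph and leaves the other two users' work alone, whereas `undo_global()` would remove `user_c`'s paragraph instead. The first behaviour matches the choice described in the conversation.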
01:11:23
Speaker
If you go to Google Docs, in the toolbar there's a little history browser. These histories have to be stored in order for RTC to make sense. That doesn't 100% map onto Git.
01:11:35
Speaker
Depending on the way your company works, you might... Then, after you've solved whatever thing the three of us are trying to solve, like the paragraph is done, whether we then commit it or not, that has to be a different strategy.
01:11:50
Speaker
There are plugins that both Jupyter and third parties have written to support Git, but there isn't yet a unified RTC history plus Git history resolution. And I suspect that's because there isn't a generic answer.
01:12:06
Speaker
But history is there. It is implied by this action. You can actually navigate that history, and it's not the same as Git

Improving Documentation and Deployment Scenarios

01:12:14
Speaker
history. So that's the sort of landscape of it.
01:12:19
Speaker
Okay, then given that that's complicated and you've got tools, I'm going to ask you about the documentation here. I mean, what's the maturity of this project if I decided to use it? And what's the documentation like? Because it sounds like there are a lot of constraints I need to learn about, and a new way of doing things that I need to learn about.
01:12:39
Speaker
Where are you at with all that? It's a past stack.
01:12:45
Speaker
Yeah, so let me give another example from the Jupyter project that is not about real-time collaboration: JupyterHub. JupyterHub is a highly configurable tool that can address use cases ranging from a single bare-metal machine where you install JupyterHub, where users are authenticated using their Unix credentials, and everything runs in the home directory.
01:13:15
Speaker
That's already JupyterHub. All the way to Kubernetes-based deployments where users are authenticated using some kind of third-party single sign-on service and spawn custom environments in Docker images on pods,
01:13:34
Speaker
selecting hardware from a menu when they start their machines. And all of that is still JupyterHub. And even Binder is based on JupyterHub. So deploying JupyterHub from scratch by just looking at all of the possible options makes it very easy for you to shoot yourself in the foot.
01:13:56
Speaker
So what the JupyterHub team has done is define a collection of well-documented deployment scenarios that they call distributions.
01:14:08
Speaker
One is called The Littlest JupyterHub, which addresses the first use case. The other one is called Zero to JupyterHub, which is based on Kubernetes. We're in the same situation for real-time collaboration.
01:14:20
Speaker
We have built the Lego bricks. And they can be assembled in a variety of ways. And I think we need to converge now on well-defined scenarios for which we have a fully documented, consistent story for deploying it.
01:14:42
Speaker
This is very challenging because real-time collaboration in enterprise settings, with user permissioning and whatnot, can be very hard. But I think this is where we will end up: having some kind of distribution story.
01:14:58
Speaker
So this kind of implies that we're at the state not where you would generally expect people to just run off and build their own arbitrary real-time collaboration apps, but where maybe a large company would get some expertise in to support them building it.
01:15:19
Speaker
Well, one engineer wrote JupyterCAD's first version alone. So you can. It's just different parts of the Jupyter stack have different levels of maturity.
01:15:34
Speaker
And there will come a point when you pip install JupyterLab and you'll just get real-time collaboration out of the box. For now, it's isolated in a separate package, which is front-end and back-end extensions.
01:15:50
Speaker
And it's a bit of a harder API to learn. And it's moving faster. Part of the reason to put it in a separate package is to release more versions of it than the clients that use it.
01:16:02
Speaker
So this is probably the hardest thing you can do with Jupyter Tools if you're an application author. But even still, with the hardest thing you can do, one engineer...
01:16:14
Speaker
wrote JupyterCAD alone. So that's not something you can do. I can't sit down and write Google Docs by myself in a reasonable timeframe, no matter how smart I am. But if I'm using the Jupyter tools, I can sit down and write a clone of it because somebody else did the smart part, right? Because lots of people did the smart parts.
01:16:37
Speaker
So yeah, it would be harder to do this than almost anything else in Jupyter because it's one of our newest cutting-edge APIs, and because it's sort of a moving target, and because we've only built a handful of real-time collaboration-enabled apps. So we haven't hit all the edge cases. And as soon as you try to do something novel, I'm sure you'll hit an edge case we didn't.
01:16:58
Speaker
Like the notebook being required to not be empty: we didn't realize that was a problem until it was a problem, 15 years after the format was written. Yes, Sylvain, please. Right, yeah.
01:17:10
Speaker
And so very recently, actually, we released another application like JupyterCAD, which was actually announced on social media this morning, on the day of the recording, called Jupyter GIS.
01:17:27
Speaker
Ah. And Jupyter GIS is meant to be to QGIS what JupyterCAD is to FreeCAD, basically a collaborative JupyterLab-based, web-based interface for editing QGIS files.

Jupyter GIS Project and Future Goals

01:17:43
Speaker
Oh, okay. So is that for people collaborating on maps? I'm not familiar with the inspiration.
01:17:54
Speaker
And keep in mind, our expertise isn't in this either. And yet the toolkit is there for us to build something that is meant to be complementary and recognizable to practitioners.
01:18:06
Speaker
Right. Yeah. No, I mean, it is. It's taking all of these concepts, these primitives we've been talking about, and applying them to this analytical mapping space.
01:18:20
Speaker
Okay. So what's your aim in doing that? Are you building this because that's something you need for something else? Are you building it because you think the world needs it? Are you building it to prove the concept which you're actually working on, which is Jupyter as an environment?
01:18:36
Speaker
So the Jupyter GIS project was built with the funding of a grant from ESA, the European Space Agency, as a demonstrator that this could be built, so that we could build web-based and JupyterLab-based front-ends for QGIS workflows.
01:18:56
Speaker
Okay. So does that mean eventually there'll be people at the European Space Agency collaborating across the internet on where to land the rocket? Or is that going too far? For Earth observation, I would say.
01:19:10
Speaker
So if you have a lot of data from Copernicus satellites, or data from local governments and cities that they want to share and make available to researchers, then this could be a good tool.
01:19:31
Speaker
Okay. Yeah. But what's your angle there? Is it just that, okay, so they've provided the funding, it seems like a good project, you'll do it, or is it helping you build towards something?
01:19:43
Speaker
Well, we applied for that funding with a very specific idea. So it's very aligned with the vision that we have for the tools, which is that JupyterLab as an application framework is a great tool to build desktop-like applications and that GIS is a really good target for that.
01:20:05
Speaker
Okay, yeah, yeah, that makes sense. I can see that working out for large companies, large organizations who need that kind of thing.
01:20:18
Speaker
I'm trying to get a sense of who you're targeting with this as a framework, without saying everybody. Is it more for these larger desktop-esque applications than, say, a plucky young startup?
01:20:37
Speaker
Well, no, I don't think so, because this project that we just announced, you could easily imagine a plucky young startup building it. It didn't require a massive team to do because the toolkit made it easy for a small team to do.
01:20:55
Speaker
And it is, for Jupyter, somewhat low-hanging fruit. For any other set of tools you might use, it's actually quite a tall climb.
01:21:08
Speaker
So if there's a way to take a bunch of analysts who are already writing code to do analysis on data that is publicly available and they're already using tools to render this stuff on a screen,
01:21:23
Speaker
And on top of that, you can add real-time collaboration and arbitrary scripting in any language they like rather than the one they must currently use. That's like a really big win that is good for science. It's good for open source. And it does validate this architecture that we've built. It's aligned in a bunch of ways, I think.
01:21:48
Speaker
Okay. Does that suggest that a particular sweet spot for this is where you're looking to also run a programming language or a scripting language within the application you're building?
01:21:59
Speaker
Yes. This is really an area where, for example, I guess that, you know, FreeCAD allows for scripting already with a Python console.
01:22:11
Speaker
And it's actually really nice, but that Python console is not as featureful as a fully-fledged Jupyter notebook.

Jupyter as a Framework for New Applications

01:22:18
Speaker
So in terms of scripting capabilities, it's very possible that JupyterCAD will offer areas of comfort that FreeCAD doesn't, because we have a more sophisticated front end for scripting.
01:22:34
Speaker
Right. Okay. So this is a very programmer question, but I want to know, because for people doing new programming languages, one thing you want to do is give people a sense of what the programming language is like to write in.
01:22:50
Speaker
And it seems like all the languages that compile to WASM have a very easy answer for that. Are you therefore providing a place for people who've written any new programming language to say, and we can put it straight in your hands and you can try it out?
01:23:07
Speaker
There's not that many new programming languages coming out, so it's not like a community... Oh, you'd be surprised. No, no. I mean, there are plenty, right? But we're talking dozens or hundreds, tops. So it's not like that's the demographic people are after.
01:23:21
Speaker
But what is possible today is you can put basically a live REPL that's just going to work in your browser, on your web page, describing your programming language, and that's compelling.
01:23:34
Speaker
So if someone came forward and said, hey, I wrote this language, it can compile to any target. How do I get a REPL for it on my screen? Probably three hours later, it'll be working.
01:23:48
Speaker
Yeah, that's nice. Okay, so I think I get a sense of where this might be. Would this be a framework that I put on my shortlist for building something? So I guess the final question is, what's the current state of it if I want to go ahead, and where's the best place to go and learn?
01:24:08
Speaker
Maybe, Sylvain, you'd like to tackle that one.
01:24:13
Speaker
It's a very broad question, but... Is it production ready? Is it a good idea to start a new project in this or is it still an experimental thing? Well, you know, Jupyter is a very popular project with a high technical readiness level if you measure it by that standard.
01:24:31
Speaker
So it's used by an entire industry. And within Jupyter, there is a varying degree of maturity. But anything that's been elected as being called Jupyter...
01:24:46
Speaker
is not a prototype. Sometimes we have to be very conservative about what we enable by default, because this is a tool that is used by many people in the world to do very serious...
01:25:03
Speaker
real-world things, and we can't just break the workflows for the latest cool thing that we built. But as a developer, it's a very solid stack to build upon.
01:25:18
Speaker
Okay. And where do I go and learn about it? If I want to build a prototype, where's the right place to start? Well, I just want to say one thing, though.
01:25:31
Speaker
Keep in mind, we love this question. And a group of people we'd definitely love to attract is the application authors.
01:25:44
Speaker
But you have to recognize that we have millions and millions of end users, right? So our primary focus actually is to give them a good user experience to do the work that they're doing.
01:25:57
Speaker
To do that, we've built this rich application framework. So it depends on the application you want to build. If you want to build some new desktop-like application, what I would say is first and foremost,
01:26:10
Speaker
check out JupyterLab and its APIs, right? So you can either go to jupyter.org, find JupyterLab there and follow the thread, or you can go to jupyterlab.readthedocs.io and start there. And that gives you multiple ways as a programmer you might come in. You might just want to write an extension. You might just want to configure your install. You might want to write a whole application using the Jupyter front-end framework, but that would be a good place to start.
01:26:39
Speaker
Or if there's a Jupyter application that already looks like the one you want to write, so for example, the notebook UI is quite stripped down. It's just like, you know, a notebook screen where you begin writing.
01:26:52
Speaker
Maybe that's the application you want to start from. Or, maybe even simpler, you can go inside the JupyterLab repo, where there's a group of example applications. One of them is just the standalone code console.
01:27:06
Speaker
Maybe that's all you want. One of them is a lab-like application that's simplified. So really, I think, if you are coming at it from the angle of, I'm a programmer, I want to build a cool thing, and this is a toolkit I'm considering, yeah, go there. Look at the examples in the JupyterLab GitHub repo.
01:27:26
Speaker
Okay. Okay. That sounds like a good place to get started because I'm curious. I'm curious to get a feel for what the code would be like to put this together. Yeah. So I think that's where I'll go and take a look. Yeah, yeah. Darian, Sylvain, thank you very much for taking me through it.

Reflections and Encouragement to Explore Jupyter

01:27:39
Speaker
That is a larger and more complicated field than I expected, which is always what I want to hear on this podcast.
01:27:46
Speaker
Thanks. Thank you. Thank you. Thank you, gentlemen. You know, one of the thoughts I leave that conversation with is they said Jupyter supports about 60 different languages, but we've covered a lot of new and niche languages in previous episodes. There must be one of those that isn't supported by Jupyter yet.
01:28:06
Speaker
And wouldn't that make an interesting side project? That's a rabbit hole I am tempted to go down. I'm going to have to try and avoid it because I've got too many ideas and never enough time. But, you know, I'd rather have too many ideas than too few. So don't curse your luck.
01:28:21
Speaker
As always, if you need more ideas, you'll find links to everything we've covered in the show notes: Jupyter's homepage and the many moons that orbit it. So go and check there to get started.
01:28:33
Speaker
If you've enjoyed this episode, please take a moment to like it or rate it or share it with a friend or with one of your networks. If you want to support future episodes, please check us out on Patreon or YouTube memberships.
01:28:45
Speaker
And if you want to catch future episodes, make sure you're subscribed because we'll be back soon with another great developer and their voice. Until then, I've been your host, Kris Jenkins. This has been Developer Voices with Sylvain Corlay and Afshin Darian.
01:28:59
Speaker
Thanks for listening.