Will Turso Be The Better SQLite? (with Glauber Costa)

Developer Voices

SQLite is embedded everywhere - phones, browsers, IoT devices. It's reliable, battle-tested, and feature-rich. But what if you want concurrent writes? Or CDC for streaming changes? Or vector indexes for AI workloads? The SQLite codebase isn't accepting new contributors, and the test suite that makes it so reliable is proprietary. So how do you evolve an embedded database that's effectively frozen?

Glauber Costa spent a decade contributing to the Linux kernel at Red Hat, then helped build Scylla, a high-performance rewrite of Cassandra. Now he's applying those lessons to SQLite. After initially forking SQLite (which produced a working business but failed to attract contributors), his team is taking the bolder path: a complete rewrite in Rust called Turso. The project already has features SQLite lacks - vector search, CDC, browser-native async operation - and is using deterministic simulation testing (inspired by TigerBeetle) to match SQLite's legendary reliability without access to its test suite.

The conversation covers why rewrites attract contributors where forks don't, how the Linux kernel maintains quality with thousands of contributors, why Pekka's "pet project" jumped from 32 to 64 contributors in a month, and what it takes to build concurrent writes into an embedded database from scratch.

--

Support Developer Voices on Patreon: https://patreon.com/DeveloperVoices

Support Developer Voices on YouTube: https://www.youtube.com/@DeveloperVoices/join

Turso: https://turso.tech/

Turso GitHub: https://github.com/tursodatabase/turso

libSQL (SQLite fork): https://github.com/tursodatabase/libsql

SQLite: https://www.sqlite.org/

Rust: https://rust-lang.org/

ScyllaDB (Cassandra rewrite): https://www.scylladb.com/

Apache Cassandra: https://cassandra.apache.org/

DuckDB (analytical embedded database): https://duckdb.org/

MotherDuck (DuckDB cloud): https://motherduck.com/

dqlite (Canonical distributed SQLite): https://canonical.com/dqlite

TigerBeetle (deterministic simulation testing): https://tigerbeetle.com/

Redpanda (Kafka alternative): https://www.redpanda.com/

Linux Kernel: https://kernel.org/

Datadog: https://www.datadoghq.com/

Glauber Costa on X: https://x.com/glcst

Glauber Costa on GitHub: https://github.com/glommer

Kris on Bluesky: https://bsky.app/profile/krisajenkins.bsky.social

Kris on Mastodon: http://mastodon.social/@krisajenkins

Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

--

0:00 Intro

3:16 Ten Years Contributing to the Linux Kernel

15:17 From Linux to Startups: OSv and Scylla

26:23 Lessons from Scylla: The Power of Ecosystem Compatibility

33:00 Why SQLite Needs More

37:41 Open Source But Not Open Contribution

48:04 Why a Rewrite Attracted Contributors When a Fork Didn't

57:22 How Deterministic Simulation Testing Works

1:06:17 70% of SQLite in Six Months

1:12:12 Features Beyond SQLite: Vector Search, CDC, and Browser Support

1:19:15 The Challenge of Adding Concurrent Writes

1:25:05 Building a Self-Sustaining Open Source Community

1:30:09 Where Does Turso Fit Against DuckDB?

1:41:00 Could Turso Compete with Postgres?

1:46:21 How Do You Avoid a Toxic Community Culture?

1:50:32 Outro

Transcript

Choosing the Right Database

00:00:00
Speaker
I find databases endlessly interesting, and for good reason. As programmers, we're always going to need more options for storing and reshaping data. And as that data becomes more important to us over longer timescales, your choice of database and the behavior of your database can become perhaps your most pivotal design decisions.
00:00:22
Speaker
That's why there are so many options. We need so many options. We need everything from the huge enterprise distributed cloud databases all the way down to a relational database so tiny you could find it in your thermostat or your watch.

SQLite: Pros and Cons

00:00:39
Speaker
There's definitely a SQL database inside your phone and inside your browser, and it's going to be SQLite. It's the embedder's choice. And SQLite's fantastic.
00:00:50
Speaker
Let's be clear about that. It's small, it's light, it's extremely reliable, it's surprisingly feature-rich. It's a success. But you can always want more, can't you? You can always ask for more.
00:01:03
Speaker
The first of the two features I've wanted from SQLite in the past is concurrent writes. That'd be good. It doesn't need to be wildly performant, but writing from multiple threads would be nice.
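For context on the limitation Kris describes: stock SQLite allows many concurrent readers but only one writer at a time, so a second connection that tries to start a write transaction while another holds the write lock gets a "database is locked" (SQLITE_BUSY) error. The sketch below demonstrates this with Python's built-in `sqlite3` module; the function name and the `demo.db` path are illustrative, and `timeout=0` is chosen so the second writer fails immediately instead of waiting.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(path: str) -> bool:
    """Return True if a second connection cannot begin a write while the first holds the lock."""
    # isolation_level=None puts the connections in autocommit mode so we control BEGIN/COMMIT.
    # timeout=0 disables the busy wait, so a lock conflict raises an error right away.
    writer_a = sqlite3.connect(path, timeout=0, isolation_level=None)
    writer_b = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        writer_a.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        writer_a.execute("BEGIN IMMEDIATE")  # first writer takes the database's single write lock
        writer_a.execute("INSERT INTO t VALUES (1)")
        try:
            writer_b.execute("BEGIN IMMEDIATE")  # second writer asks for the same lock
            return False
        except sqlite3.OperationalError as exc:
            return "locked" in str(exc)
        finally:
            writer_a.execute("COMMIT")  # release the write lock
    finally:
        writer_a.close()
        writer_b.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(second_writer_is_blocked(os.path.join(d, "demo.db")))  # True
```

Writes from multiple threads are still possible in SQLite, but they are serialized through this one lock, which is the gap that "concurrent writes" refers to.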
00:01:15
Speaker
And synchronisations. Wouldn't it be great to be able to write to a local database on an iPhone and have the changes magically synced to the cloud once the internet connection comes back?
00:01:27
Speaker
I'd love that. Is there any chance of seeing it in SQLite? Not really. I can't see it happening. The design is pretty much locked down.
00:01:38
Speaker
And while SQLite is open source, it's not really open to new contributors. So...

Innovations in SQLite by Glauber Costa

00:01:45
Speaker
My guest this week has taken the very bold move of rewriting SQLite in Rust.
00:01:51
Speaker
I'm joined by Glauber Costa to explain how you rewrite SQLite, why it makes sense, and why it might be the only realistic way to get an embedded relational database with a whole new wish list of features, like concurrent writes, like CDC, so you can subscribe to a stream of database changes, And like vector indexes for lightweight, local, agentic databases.
00:02:17
Speaker
It's ambitious. It's potentially a huge project, but Glauber's not doing it alone. It's easy to point at Rust as being the headline difference from SQLite, but it's the open contribution model that Glauber's most enthusiastic about.
00:02:32
Speaker
So let's take a look at how the community is building a new embeddable database called Turso. One quick note before we begin: when we recorded this, Turso was in alpha; it's now in beta. So congratulations to the team on the progress. And if you hear something in this that you're keen to try out, remember beta is a marker that they want you to try it and send your feedback.
00:02:55
Speaker
Link in the show notes. Let's get started. I'm your host, Kris Jenkins. This is Developer Voices, and today's voice is Glauber Costa.

Personal Journey: From Canada to Texas

00:03:16
Speaker
Coming in live from Texas, I believe: Glauber Costa, how are you doing? I'm doing great, Kris. Thanks for having me. I'm very pleased to have you here. Where you are is probably as swelteringly hot as it is over here, so I hope the air conditioning noises don't disturb the podcast. Yeah, and the fun thing is that I moved two months ago, so I'm relatively new to Texas. I feel at home already. It's a great place where everybody's very friendly.
00:03:41
Speaker
But I moved two months ago from Canada, where the temperatures are not the same as here. So I'm looking forward to the winter, because it is very, very hot. I knew this was going to be the case. I'm not complaining about it.
00:03:57
Speaker
But, you know, it's one of my favorite phrases from The Matrix, if you remember the movie: knowing the path and walking the path are two different things. So being here in the heat has been one of the most challenging parts of the move, but I'm enjoying it so far.
00:04:12
Speaker
Well, as long as you've got plenty of air conditioning, you can adapt to anything, right? I believe that's the whole of Dubai. The one thing Americans do very well is AC everywhere. Yeah.
00:04:23
Speaker
So I've got to somehow bridge from that particular piece of technology through to some more intricate technologies. Looking at your background, we could almost do two whole podcasts, and I'm going to get on to the main topic,

Early Contributions to Open Source

00:04:36
Speaker
but first you've got to tell me a bit about your background as a Linux kernel contributor,
00:04:41
Speaker
because that's some interesting stuff that I've never looked at. Yeah. So I contributed to the Linux kernel in various ways, sometimes more, sometimes less, sometimes doing more interesting things, sometimes more boring things, for a grand total of about 10 years. I think my first contribution was around 2003 or 2004, and then I was very active until around 2013. So nine or ten years was the total time I spent. It was actually my first introduction to programming, in a sense. I mean, I learned it in school, but I got
00:05:21
Speaker
fascinated, and I could never explain why. It's one of those things that just happens to you. I could never explain why, but I was always fascinated with the Linux kernel, and with its open source nature.
00:05:34
Speaker
I think the first time I even used Linux was at the end of the year 2000, beginning of 2001. I will never forget it. I was using Windows at the time, as everybody else was. Then I had this friend in college who said, dude, you've got to try this. This is going to blow your mind. The source code is open. You can read the source code. And I still remember that I did not know you had to download the source code. I was assuming that you would install Linux and the source code would just magically be there. So the first thing I did when I installed Linux for the first time was start looking for the source code.
00:06:15
Speaker
And I did not know C at the time. I was in college, so we were learning Pascal, which at the end of the day is a language just for learning. We're not going to go there, but I don't like this idea of a language for learning. That's what we had at the time. So I knew nothing of C. I start looking: where's the source code? I installed Linux, now I want to see the source code. So I find these .h files, which are the header files, and they only had constant definitions, although the Linux kernel does have a bunch of code in its .h files. I start reading them and I don't understand anything, but I'm mesmerized. It was like this mystical experience: I have no idea what's going on here, but wow, I am reading the source code. Am I understanding it? Of course not.
00:07:02
Speaker
But I'm reading the source code. And then I started learning C. My goal was always that I wanted to be able to write code for the Linux kernel. I don't know where this came from. A couple of years later I actually started understanding it. Well, first I learned that you actually had to download the source code. It wasn't there; it was just the header files.
00:07:30
Speaker
And then I started doing small things here and there, and reading books. There were a bunch of very good books at the time about the Linux kernel and its internals, which is something I think not a lot of projects have today. But at the time, Linux being the behemoth that it is, there were many books about the internals of the Linux kernel, so I read them back to back. I don't even remember the names of the books, but I think one of the ones I read was Understanding the Linux Kernel. And circa 2003 or 2004, I sent my first contribution to the Linux kernel mailing list, which was rejected.
00:08:14
Speaker
What was it you were trying to submit? Say that again? What was it you were trying to submit? It was a couple of fixes to the ext2 file system. That's how old it was, because we've had ext4 for a long time now.
00:08:31
Speaker
The maintainer of the ext file systems, and of the virtual file system layer in general: I will never forget his response. He was very loving. The Linux kernel community is known to be a very loving and nurturing place.
00:08:47
Speaker
So his response was that he had never seen code this bad in his life. I managed to introduce three bugs in two lines of code. And people like me should never be allowed close to a keyboard again. So that was my introduction to Linux. Yes. I'm surprised you stuck around after that.
00:09:07
Speaker
As I said, I was

Career Progression and Red Hat Experience

00:09:08
Speaker
very motivated. I don't know why; I could never explain that. But I was very motivated. That was my dream, to land code in the Linux kernel. And then I did, and that became my job. My first job was at Red Hat.
00:09:24
Speaker
And I was in the virtualization group at the time. Virtualization was the AI of the day, the hot technology. Yeah, yeah, yeah. I remember the whole VMware explosion. You had this interesting company... Yeah, there was this very interesting company. They were a bookstore:
00:09:36
Speaker
you could buy books from them online, called Amazon. And they were using virtualization to try this very crazy project to have something called the cloud. So those were the new things at the time. Red Hat was a provider for Amazon at the time.
00:09:55
Speaker
And then my career started, working professionally. In the beginning, I was just this guy who did some cool stuff. But then I had the opportunity through my job at Red Hat to work on many things. The bulk of it was around virtualization. I later worked on containers, in the cgroups subsystem, which today underpins container technologies.
00:10:20
Speaker
So I've done a lot of very low-level x86 code for the boot sequence as part of the virtualization work. Oh, wow. Okay. You've gone all the way down to the chipset. Yeah, memory management, scheduling as well. This is actually why I'm very grateful that I worked in virtualization. Virtualization is essentially this idea that you have a virtual machine running,
00:10:44
Speaker
so there's a Linux on top of Linux, and you end up having to do everything. You have problems like... One of my biggest contributions was in the timekeeping code, because you had these virtual machines that couldn't keep time right.
00:11:01
Speaker
In the early days, timekeeping in the Linux kernel was based on interrupts. A timer interrupt would arrive at a fixed rate, say every 10 milliseconds (100 times a second), and you would count how many interrupts you had received, and from that you know how much time has passed.
00:11:18
Speaker
Real hardware is very precise at this, but virtualized hardware is absolutely not precise, because the virtual machine might not even be running. So we had a lot of issues around timekeeping, and I worked a lot on that. And then: how do we efficiently schedule processes inside the virtual machine, knowing that the scheduler may or may not be running?
00:11:39
Speaker
That's another assumption of the Linux kernel, that the scheduler will always be running. But again, in a virtual machine, that's not the case, because the virtual machine itself may not be running. And then how do you manage memory across virtual machines? So I ended up working with essentially the entire kernel. Right. Yeah, yeah.
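The tick-counting problem Glauber describes can be shown with a small model: if elapsed time is inferred only from the number of timer interrupts received, any interval during which the guest is descheduled delivers no interrupts, and the guest clock falls behind. This is a hypothetical Python illustration, not kernel code; the function names and the 100-ticks-per-second rate are assumptions chosen for readability.

```python
# Hypothetical model of tick-based timekeeping; not actual kernel code.
TICK_HZ = 100  # assumption: 100 timer interrupts per second, i.e. one every 10 ms


def elapsed_from_ticks(ticks: int, hz: int = TICK_HZ) -> float:
    """Infer elapsed seconds purely from the number of timer interrupts seen."""
    return ticks / hz


def guest_elapsed(real_seconds: float, descheduled_seconds: float, hz: int = TICK_HZ) -> float:
    """A guest that is not running receives no interrupts, so those ticks are simply lost."""
    ticks_delivered = round((real_seconds - descheduled_seconds) * hz)
    return elapsed_from_ticks(ticks_delivered, hz)


if __name__ == "__main__":
    print(guest_elapsed(10.0, 0.0))  # 10.0
    print(guest_elapsed(10.0, 0.5))  # 9.5: the guest clock is now half a second behind
```

The error accumulates every time the host deschedules the guest, which is why modern kernels read elapsed time from a clock source rather than deriving it from the tick count alone.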
00:12:02
Speaker
You must have known the stack inside out by then, if that's even possible with the Linux kernel. It's not possible, because there's always something. But one of the last things I did was around networking. And that's around the time I had the sense that I'd had enough. I think the rudeness and the brutality of the Linux kernel eventually got to me, I'm not ashamed to say. At some point, I was just tired. I was like, look, man,
00:12:32
Speaker
it's just people who scream at each other. Is it? Because I know Linus has a reputation for it, but does it percolate down to the whole community, do you think? It's not the whole community, but it's a large part of the community, because Linus becomes a role model, right? And people emulate this behavior. So it's a very common thing to see people screaming.
00:12:56
Speaker
And I think I was in need of a change. I was already feeling like, hey, maybe I want to do something else. I'm not afraid to say it, and I think it's nice for me to talk about this openly. It might feel like, oh, you didn't stick around. You know, I did, for 10 years. I had very thick skin.
00:13:18
Speaker
But I think at some point I just got tired of it. It was not the factor, but it was one of the factors that led me to go work on other

Introduction to Databases and Startup Life

00:13:27
Speaker
things. Not everybody was like that, mind you.
00:13:31
Speaker
Today I have a co-founder, Pekka, who is one of my best friends. Pekka at the time was one of the maintainers. I was never a maintainer in the Linux kernel. It's not even about rising to that position; I don't think that's the right term. It just didn't match my personality to be maintaining things. I was more the IC type: hey, I want to code, I want to contribute. The maintainer job is very different.
00:13:54
Speaker
And Pekka was a tremendously good maintainer. He was one of the maintainers of the memory management subsystem. Memory management in Linux is a large thing, so there were many maintainers dealing with specific parts of the stack.
00:14:08
Speaker
Pekka was one of them. And we became very good friends at the time, and he was always a very gentle, very nice person. Now, he's Finnish, like Linus, so the way a Finnish person is nice is different. There's some room there for, how can I best put it, extreme honesty.
00:14:32
Speaker
But you can do extreme honesty in many ways, and I like the way Pekka did it at the time and still does it today. Andrew Morton, at the time the number two in the Linux kernel, was famously
00:14:45
Speaker
a very nice person. Greg as well, who was maintaining a lot of the stable releases. So there were very nice people in the Linux kernel, but this environment of people fighting was pervasive. Linus, of course, that's just his style. It is what it is.
00:15:03
Speaker
But it trickles down, I think, to a large part of the community. Yeah, I can totally understand that. Anyway, 10 years on any one project is a long time, right? Yeah. But where do you go after that? What's meaty enough to sink your teeth into after the Linux kernel?
00:15:22
Speaker
Yeah, so I joined a startup in 2013, founded by the creators of the KVM hypervisor. KVM was a hypervisor I worked with; a hypervisor is the technology that underpins virtualization. I had known the founders personally for a very long time, Avi and Dor, and they started a company.
00:15:49
Speaker
And I decided to join them. Honestly, and again, they know this, so I'm not talking behind their backs or anything like that: I did not believe in the project they were doing.
00:15:59
Speaker
I didn't think it was a great idea. Maybe it was just an idea that was ahead of its time. It was a unikernel. A unikernel is a specialized kernel that runs a single process.
00:16:11
Speaker
There is a very good unikernel that is growing a lot today called Unikraft; I chat every now and then with their founder. But at the time, they had a project called OSv. The idea was: hey, cloud computing is now a reality, whereas five years ago it was just a promise. But we end up in this situation where you have a heavy Linux kernel as the hypervisor running a heavy Linux kernel as the guest. So can we come up with a new kernel that only runs one process and is much lighter and way more efficient? I didn't quite believe in that, to be honest, and they know it. But I wanted to join the company. I wanted to join them, right?
00:16:54
Speaker
Yeah. Do you mean you didn't believe it was possible to do well, or you didn't think it was a good business idea? I didn't think it was a good business idea. Mm-hmm. And I also believed that if we were going to do it technically, the best way was likely to turn Linux into a unikernel, which is a much larger project and very hard to do with the constraints of a startup. Now, I had a friend, also from the Linux kernel, since most of my friends at the time were working on the kernel. Alex was his name, and he told me, look, Glauber... I knew very little about startups at the time, so I took his advice. He said: when you have a team of very
00:17:45
Speaker
fascinating, bright people together, if they don't get it right the first time, they get it right the second time, the third time, however many it takes.

Founding a Company with Pekka

00:17:55
Speaker
Because I was seeking advice. I mean, look, I would love to join Avi and Dor, because I like them and they're brilliant. And I think I want to work at something small, a startup thing.
00:18:07
Speaker
I don't know, but I don't quite believe in this. And I took Alex's advice to heart: man, if this is not the thing, I'm sure they will come up with something. And that's exactly what happened, because after two years this project failed.
00:18:25
Speaker
But the company pivoted into a database company, and we ended up writing a database called Scylla. And then I spent almost another 10 years with them. I'm of the belief that if you don't spend time with something, you never actually get to know it deeply enough, right? A lot of people today just jump from thing to thing. And I think there is something that time gives you that you can't replace.
00:18:53
Speaker
But it was around eight to nine years, and it wasn't all Scylla, because for the first two years it was the same group of people doing something else. So I spent, let's say, another eight to nine years at the company. And that was my introduction to databases, because we ended up writing a database from scratch. Right. With a team of how many people?
00:19:16
Speaker
I think we were around 12 to 15 people at the time. That was a large team already from the start, right? Yeah, it's funny. A database is a sufficiently complex project that you can't do it with a small number of people, but it's a sufficiently intricate project that you can't do it with a large number of people either.
00:19:35
Speaker
Once it reaches a certain level of velocity, I think you can absolutely do it with a large number of people, but not in the beginning. I bet Oracle has a lot of people working on the database, right? As I understand it, very few people actually touch the core anymore.
00:19:52
Speaker
It might very well be, especially at Oracle. But you used, I think, a word that is critical: the core. Because a database, much like the kernel, is a very complex system. The Linux kernel has thousands and thousands of contributors at any given moment; a while back I heard the number 10,000 active contributors.
00:20:15
Speaker
If you look at lifetime contributors, it's probably an absurd number. Yeah, yeah. But there is the core of the Linux kernel; the Linux kernel has layers.
00:20:26
Speaker
They're all contributing to the Linux kernel, but only very few people contribute to the core of the Linux kernel, like I did: to the schedulers, to memory management, to networking. It's a group of people who all knew each other, and we were all friends, because it was a very, very small group. But when you count the number of people working on drivers, working on attachments to the Linux kernel, this number grows very much.
00:20:48
Speaker
And I bet that with a database like Oracle, it's the same thing. You have this core, like the query engine, but then you have all sorts of other things a database does, and it's easier to put people to work in those places. Yeah, all the way out to teams of people just selling rebranded open source projects as consultancy. That happens. Okay, at the risk of getting ahead of ourselves, I begin to see a picture emerging that brings you to your current project. You've got a taste for large open source contribution projects. You've got knowledge of how an operating system can work and not work, if I can hint at that.
00:21:29
Speaker
And you've got a taste for databases. Mm-hmm. So where does that leave you when you get to the end of your involvement in Scylla? Yeah. I'll just mention briefly that after Scylla, I had a one-year stint at Datadog. Again, I don't like short stints like that. And I love Datadog; it's one of the most fascinating companies I've seen. They have a lot of very interesting data problems.
00:21:52
Speaker
But by nature, I'm a person who likes to build stuff from the ground up. Right. And what I always wanted to do after Scylla was to start my own company.
00:22:04
Speaker
But I got to that state, and it's the same malaise that hit me in the Linux kernel: I've been doing this for nine years, I need a change, I want something else. And then other problems compound on top of that over time. So I was looking for something else to do.
00:22:25
Speaker
What I really wanted to do was to start my own company. But by the time I was ready to leave Scylla, it was 2020. And in 2020, if you remember, something happened and the world was very crazy. So I thought that maybe it wasn't the right time to start a company.
00:22:46
Speaker
I was wrong, by the way, because money was flowing freely in the startup market, which is why I ended up staying only a year at Datadog. Yeah, in hindsight, it was probably flowing more freely back then than it is now. It's very fascinating. It's the interest rates, I guess: the same phenomenon that sends housing prices up, etc. The VC community just had a lot of money floating around. So in 2020, I decided to join Datadog.
00:23:16
Speaker
And in 2021, I think it became clear to me that if I wanted to start a company, I could. I had the opportunity, and Pekka was willing to do it. That was my one...
00:23:29
Speaker
That was the one thing I wanted: I said, I will start a company, but only when Pekka is willing to do it with me. Him specifically? Specifically. I love it. Okay. And we have this thing on Twitter where we pretend that we hate each other, and it's great engagement bait.
00:23:49
Speaker
So we're always fighting and calling each other names. Our friendship is like that, by the way. It's not that we act like that just for Twitter;
00:24:01
Speaker
we are like that to each other. We act like we hate each other. But on Twitter, people don't know us, right? They don't know the context, so they're like, what's with those guys? So it's not that we are putting out a persona or anything like that. It's quite the opposite: it feels more like engagement bait than it is. And we milk it a little bit, of course. But at the end of the day, we're just being ourselves. But I love him,
00:24:30
Speaker
dearly. He's probably one of my best friends, if not the best friend I have. I've been to his place in Finland a couple of times. He hasn't been here to the United States yet to visit me, but again, I'm new here; I'm hoping he can come. And I knew that if I ever started a company, I wanted to start it with him. We had this list of ideas for one day, including more crazy ones like special mattresses that warm you up. These days you actually have ones that cool you down. We had a bunch of things we could do. So we started a company in 2021,
00:25:06
Speaker
and it's funny, because our story ended up being very similar to Scylla's. We were not trying to do that. In fact, we were quite explicitly trying to do the opposite, right?

Development of Turso: From ChiselStrike to Cloud Service

00:25:19
Speaker
Because we'd seen everything. I was employee number three at the company that became Scylla, back when it was doing the unikernel thing, and Pekka was employee number four.
00:25:31
Speaker
Because when he saw that I joined, he said, hey, I want to join too. We always liked to work together, ever since the kernel. He was looking for a job, so he started working for the company. It was called Cloudius Systems at the time.
00:25:45
Speaker
So we both worked together at Scylla; he had an extra year ahead of me. And then one day I said, hey, Pekka, let's just start a company. And at Scylla, because we were there so early, we saw everything they did right.
00:26:00
Speaker
And we also saw everything that, in our minds, they did wrong. So we wanted to do everything differently. We'd try to keep the things we thought they did right, but the things we thought they did wrong, we explicitly wanted to do differently.
00:26:16
Speaker
And we did everything differently, and as fate would have it, we landed in about the same place. Right. Be specific, though: what did they do well and badly that you wanted to learn from? Well, I think what they did very well, when pivoting the company to Scylla, was not trying to... they picked... Because writing a new database is a temptation.
00:26:44
Speaker
It's your new database, and then you start thinking about all the things you could fix in databases. And it's always: this thing is not great, I'm going to do better; that thing is not great, I'm going to do better. But what I think the folks at Scylla understood very, very well is that something being bad doesn't mean it should be replaced, because some things acquire giant inertia.
00:27:12
Speaker
And this is one thing we're keeping at Turso, for example. There are many embedded databases out there. We'll talk about this more later, but we're writing an embedded database that's fully compatible with SQLite.
00:27:24
Speaker
Scylla was a rewrite of Apache Cassandra in C++, with a different architecture. There are many things wrong with Cassandra. The query language is not the best. I think there were many, many things in Cassandra that were bad.
00:27:42
Speaker
Some of them are fixable, and then you fix them: they're architectural, in the way the code is structured, in the way the engine works. But some of them are things users interact with, and everything users interact with acquires giant inertia. So you don't change those.
00:28:02
Speaker
So Scylla was a fairly compatible rewrite of Cassandra. And sometimes we would look at something and say, hey man, if this feature did not exist, and nobody uses it anyway, we could make this other thing 10 times faster. It doesn't matter. It has to be there. You can disable it; there are ways to do that.
00:28:20
Speaker
But it has to exist, because drivers depend on it, and there are integrations that might depend on it. So understanding the power of the ecosystem was one thing Scylla did very well, and we did the same. We said, okay, there are many people who try to write embedded databases.
00:28:39
Speaker
But we understood, and we understand, that it's not enough to be an embedded database. It has to be fully compatible with SQLite. So understanding the power of the ecosystem is, I think, something they did very well.
00:28:51
Speaker
And it's something we're trying to emulate. The other thing, the thing that in our minds they didn't do very well, was validating the market for the unikernel more aggressively, and they took too long to pivot. We wrote the unikernel for two years, and it didn't work. And then we ended up writing Scylla for another two years.
00:29:15
Speaker
So by the time the company put something out there, it was almost a four-year-old company, right? We wanted to move a lot faster. We said, hey, look...
00:29:27
Speaker
We don't want to pivot, so we want to have the right idea right away, which is easier said than done. Yeah. So we're going to put a lot of effort into validating that this idea is actually good.
00:29:42
Speaker
And if it doesn't work, then we have to pivot way more aggressively. We have to move a lot faster. But the funny thing, and this is an interesting thing, is that we did all of that. We validated things. We spent almost two months just talking to people to understand whether the thing we were building was the right thing.
00:30:03
Speaker
And it ended up not working anyway. Yeah. And then we pivoted, but we did pivot faster. We pivoted to a fork of SQLite, which we maintained for a while, and then we built the cloud service on top of that.
00:30:19
Speaker
Okay, hang on. I think we've skipped a step. What was the first thing you built? I only mention it occasionally, because I never want to dwell too deeply on it, but it was called ChiselStrike.
00:30:33
Speaker
ChiselStrike. ChiselStrike was an edge system for data persistence. Our idea was: people don't want to write SQL, but SQL is pervasive anyway. So we were going to merge a database, SQLite, with a TypeScript runtime, Deno.
00:30:54
Speaker
So we had Deno with embedded SQLite, packaged so that you could deploy it in the cloud, plus a query compiler. That would allow you to write your code in pure TypeScript, and the compiler would automatically generate SQL-like queries. You would just write TypeScript code, mark some classes as persistent, and you wouldn't even be aware that there was a database there.
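To make the ChiselStrike idea concrete, here is a minimal Python sketch of "mark a class as persistent and let the tooling generate the SQL". ChiselStrike itself used TypeScript on Deno with a query compiler; the `persistent` decorator and the `save` and `find` helpers below are hypothetical names for illustration only, not its actual API, and the sketch only maps `int`, `float`, and `str` fields.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

# Hypothetical mapping from Python field types to SQLite column types.
_SQL_TYPES = {"int": "INTEGER", "float": "REAL", "str": "TEXT"}


def persistent(cls):
    """Hypothetical decorator: turns a class into a dataclass and derives a
    CREATE TABLE statement from its fields. Not ChiselStrike's real API."""
    cls = dataclass(cls)
    cols = ", ".join(
        f"{f.name} {_SQL_TYPES[getattr(f.type, '__name__', f.type)]}" for f in fields(cls)
    )
    cls._create_table_sql = f"CREATE TABLE IF NOT EXISTS {cls.__name__} ({cols})"
    return cls


def save(conn, obj):
    """Generate and run an INSERT for a persistent object."""
    cls = type(obj)
    names = [f.name for f in fields(cls)]
    placeholders = ", ".join(["?"] * len(names))
    with conn:  # commit on success
        conn.execute(cls._create_table_sql)
        conn.execute(
            f"INSERT INTO {cls.__name__} ({', '.join(names)}) VALUES ({placeholders})",
            astuple(obj),
        )


def find(conn, cls, **where):
    """Generate a SELECT from keyword filters, e.g. find(conn, User, name="ada")."""
    allowed = {f.name for f in fields(cls)}
    if not set(where) <= allowed:  # column names are interpolated, so whitelist them
        raise ValueError(f"unknown fields: {set(where) - allowed}")
    conn.execute(cls._create_table_sql)
    clause = " AND ".join(f"{k} = ?" for k in where) or "1"
    rows = conn.execute(f"SELECT * FROM {cls.__name__} WHERE {clause}", tuple(where.values()))
    return [cls(*row) for row in rows]


@persistent
class User:
    name: str
    age: int


conn = sqlite3.connect(":memory:")
save(conn, User("ada", 36))
save(conn, User("linus", 21))
print(find(conn, User, name="ada"))  # [User(name='ada', age=36)]
```

The point being illustrated is that application code never contains SQL strings: the schema and the queries are derived from the class definition.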
00:31:21
Speaker
Now, it wasn't a great idea. In fact, we learned a lot about the process of validation, because at the time it was, hey, let's do a lot of validation. But the way we did validation was just asking people if they would use it, and they all said yes. But then they never do. That's the difficult thing. And then you go away for two years, build it, and come back, and they say, well, I didn't mean I'd pay for it.
00:31:41
Speaker
Yeah, or: I mean, sure, I would use it, but I don't feel like it right now. Or whatever. So it became clear to us after a year. But the second part we did right: like I always said, if we needed to move away from this, we were going to do it super fast.
00:31:57
Speaker
So after a year, it became clear to us that this wasn't working. We were using SQLite. I love SQLite. SQLite was the first database I used, in 2003 or something like that. I was mesmerized that the database was just there.
00:32:13
Speaker
You type sqlite3, and there's a database right there. There are no servers, there's nothing. The database just works. It's everywhere. Yeah, just a standalone process that magically turns into a database. Yeah. I say SQLite is the most used database in the universe, not just the planet, because as soon as it gets used in a Mars rover, it's the most used database in the universe. Okay, okay. I'll allow you the solar system.
00:32:38
Speaker
Well, fair enough. Aliens must have a database. They must. We don't know what they're doing or what planet they're on. But in our solar system, for sure. Maybe let's dare to say the galaxy. So in the galaxy, it's the most used database.
00:32:56
Speaker
And we were very fascinated with that. It started to become clear to us, though, that SQLite had a lot of limitations, which I'll be happy to expand on. But just to finish how we got to Turso: Turso was our attempt to build a fork of SQLite in which we thought we were going to fix a lot of SQLite's problems. Then we built the cloud service on top of that. The idea was that you would use SQLite over the wire and have replication and edge replication.
00:33:29
Speaker
Now, ChiselStrike didn't work. Turso worked. Turso's there. We call it Turso Cloud now, but it works just fine.
00:33:39
Speaker
Right. But do you mean as a technical project or as a business model? Both. Both. Okay. Yeah. But it started to become clear to us that... again, the analogy I used when I published my first blog post about what Turso is today, because Turso today is a rewrite of SQLite, a full rewrite of SQLite from scratch.
00:34:06
Speaker
And what Turso was before, we renamed to Turso Cloud. We want to use the same branding everywhere. Turso Cloud will just be a way to deploy SQLite databases today, and Turso databases in the future, to the cloud.
00:34:20
Speaker
But the analogy I use is this. You've probably heard the phrase: shoot for the stars, and if you miss, you may hit the moon. It's not a guarantee. We felt a little bit like that, because when we started this fork of SQLite, we had this grand vision and these grand dreams and ideas: now we fork SQLite, and this is going to happen, and that's going to happen. And none of those things happened. So we were a little bit sad, but we built a good business out of it.
00:34:55
Speaker
Right. Okay, you're going to have to take me through the motivation for this, because I look at SQLite and I don't see many problems with it, except maybe that it's too small to run a large service on, but that's by design.

Community Response to SQLite Rewrite

00:35:11
Speaker
And I believe it's multiple reader, single writer, which seems like a limitation. That was one of the big limitations for us. So what did you see when you looked at SQLite that you thought desperately needed fixing? Yeah.
00:35:23
Speaker
So it's great to dwell on that, because that didn't change from Turso Cloud to Turso today. The main motivation, what we think is wrong with SQLite, didn't change. What changed is that we once thought we could fix it with a fork.
00:35:39
Speaker
And the fork took us up to a certain point. Then later we tried the rewrite. And just to mention: when we announced the rewrite of SQLite, we had 9,000 GitHub stars in less than a month, and it wasn't working. Turso still doesn't work. Turso is alpha at the moment, right?
00:35:57
Speaker
But we had 9,000 GitHub stars in less than a month, and 60 contributors in less than a month. People coming in and contributing. So there's that much pent-up demand for a rewrite of SQLite? That's right. Why? What am I missing? What's not good enough? I'm going to get there. This, for me, is so critical and crucial to understand, because...
00:36:18
Speaker
What did not happen with our fork was exactly that: this community embrace, with people coming from all sorts of places, with all sorts of motivations and their own agendas.
00:36:32
Speaker
But now let me go to your question: what was wrong with SQLite? There is one thing. And the word is funny, because "wrong" is in the eye of the beholder. What does it mean to be wrong? Sometimes when we say it, people understand us to be saying that SQLite is objectively wrong.
00:36:50
Speaker
Look, some things are by design, some things are trade-offs. So "wrong"... I admire the work a lot, and I always try to make that very clear. Okay, but what different decisions would you make?
00:37:03
Speaker
I think SQLite is the best database in the world, right? Even with the limitations. But there's one thing... In startups you sometimes hear: are you the right person at the right time? And I think this applies to us, because there's one thing from which I think everything flows. I can spend hours telling you the things we wish SQLite would do better. Multi-writer is the thing a lot of people ask for.
00:37:37
Speaker
But I think they all flow from one thing. They all flow from the fact that SQLite is, and this is their wording, not mine, it's on the SQLite website, "open source, not open contribution".
00:37:51
Speaker
And what they mean is that you can see the source, you can download the source. The source is in the public domain; it doesn't even have a software license. But nobody contributes to SQLite, with some exceptions here and there. SQLite is not an open-contribution project. I think two or three people work on it.
00:38:09
Speaker
And they say very explicitly: if you want something changed, you're welcome to send us code. We will read your code. If we like it, we'll implement it our own way. And most of the time, we're not going to like it. So it's an open-source, not open-contribution project. In my mind and in Pekka's mind, all the limitations and all the glass ceilings around SQLite come from this fact:
00:38:37
Speaker
that you don't have a large community of contributors helping push this thing forward. Yeah. And here's where I think the right person at the right time comes along.
00:38:48
Speaker
A lot of people think that SQLite is so great exactly because it's run by a single individual, or three individuals.
00:39:00
Speaker
Because there's so much slop. You see a lot of projects that go in crazy directions and take contributions from everybody. And then it's the old argument: what is best, democracy, or what kind of... Benevolent dictator for life. And some systems are better for this or for that. And then there's China. Software projects are completely different from political projects to begin with. But I think there are a lot of people in our industry today who see a lot of slop being generated, who see a lot of open source software
00:39:35
Speaker
being influenced by money in a not-so-healthy way, and so on. And we've heard this many, many times, people arguing over it on Hacker News:
00:39:49
Speaker
The reason SQLite is great is that only a few people contribute to it, so they can have this gatekeeping. They don't need other people. Now, yeah.
00:40:03
Speaker
Linux is the counterpoint. Yeah, I can totally see how you would come from a perspective that says this can actually work, right? Exactly. Having worked on Linux for so long, as Pekka and I did, very closely with each other, we give the devil his due.
00:40:22
Speaker
Because this thing people say, that slop takes over the project and the project starts becoming bloated: the forces are there. Working in the Linux kernel, we admit that. Part of the work of a Linux kernel maintainer (and by the way, maybe that's why I never became one, because I'm not very good at putting the brakes on things; Pekka is fantastic at that)
00:40:53
Speaker
is exactly to work as a counter-entropy, counter-slop force: no, this is the quick way to get there, but we need to do this the right way. Yeah. And the culture in the Linux kernel was very, very good at enforcing this, in a very brutal way, as you know. And I think the brutality becomes part of it, because there are only so many ways I can tell somebody who's going to start arguing with me: forget about it, your code's not getting in. Then people start arguing, and so on.
00:41:29
Speaker
Yeah, people take it very personally too, and that can be difficult. People can take it very personally, and one of the ways is just saying: go away, I don't want to see you here anymore. Now again, is it the best way? I don't know, but it's part of this enforcement culture in the Linux kernel. So we start by giving the devil his due. Yes, it's real.
00:41:46
Speaker
The forces that turn software into slop are very real, and we know them, and we dealt with them for 10 years. We understand that they exist. And those forces have a new ally in AI, which is accelerating that side of the equation. Yeah, it's a brave new world out there. And I'm very curious to see how AI is being handled in the Linux kernel, because I'm sure people are sending AI-generated work there. It's a multiplier for good things and bad, in my opinion. I'll say that.
00:42:16
Speaker
Yeah. We use AI a lot in the company. But we knew that, yes, you're right: the forces are real. There's this entropic tendency to turn anything into slop.
00:42:32
Speaker
But we also question the idea that the only way to do this is with a community of two or three people. I mean, Linux is a community of thousands of people, and it works. It can work. And the other thing we always question is: SQLite is designed for this,
00:42:55
Speaker
therefore this other thing doesn't fit here. Like you said yourself, it's not very good for big systems, but that's probably fine, because that's the trade-off. We question that.
00:43:07
Speaker
Because famously, as you probably know, though maybe kids these days don't, when Linus Torvalds started Linux, he wrote a now very famous message to the comp.os.minix Usenet newsgroup saying: this is just a hobby project, it will never work on anything other than x86 processors, it will never be big like GNU, and this and that. Linux started as something very niche,
00:43:37
Speaker
very small, very focused. And now it powers both the Mars rover and the biggest supercomputers in the world. Yeah. So it is entirely possible to have a system that works great across the spectrum. It is entirely possible.
00:43:53
Speaker
And again, it's possible because you have different people with different agendas and different motivations who come together and find a compromise.

Open Source Benefits in Turso Project

00:44:04
Speaker
That's what we wanted to recreate. What I need to know is: are you saying there's a whole bunch of things you wanted to change about SQLite and you thought you could build a community around them? Or are you saying that if you just opened up the community, things would happen? Did you have an agenda of features? Yeah.
00:44:25
Speaker
It's a little bit of both, because we do have things we would like to see in SQLite. But most importantly, we kept seeing, very obviously, that those things would have been fixed by now
00:44:39
Speaker
if only there were a larger community. So we do have things we believe SQLite should do better, and we can come up with a list; it's a big list. But I think the main one everybody's been asking us for is concurrent writes, because SQLite doesn't do concurrent writes.
00:44:59
Speaker
Yeah. And it's funny, because it's actually not the one we wanted to do. The one we wanted at the time was just replication. We wanted to be able to replicate the SQLite WAL, the write-ahead log, so we could build a distributed database. And we'd seen other people doing it. There was a project from Canonical called dqlite. They wanted to do exactly what we wanted to do in our company. They wrote the code, and the code was not very complicated. It was actually fairly simple.
00:45:26
Speaker
But again, you cannot merge that code back into SQLite.
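As a rough illustration of log-shipping replication, here is a Python sketch of a primary that appends every committed write to a log and a replica that replays only the entries it has not applied yet. It swaps in a simpler technique than the one described: a logical log of SQL statements rather than SQLite's physical write-ahead log, which means every replicated statement must be deterministic (no `random()` or current-time functions). The `Primary` and `Replica` classes are hypothetical and keep everything in memory.

```python
import sqlite3


class Primary:
    """Hypothetical primary: applies each write, then appends it to a log."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.log = []  # entries (lsn, sql, params); lsn = log sequence number, from 1

    def write(self, sql, params=()):
        with self.db:  # commit locally before the entry becomes visible in the log
            self.db.execute(sql, params)
        self.log.append((len(self.log) + 1, sql, params))

    def entries_after(self, lsn):
        return self.log[lsn:]  # log[i] holds lsn i + 1


class Replica:
    """Hypothetical replica: replays log entries it has not applied yet."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.applied_lsn = 0

    def catch_up(self, primary):
        for lsn, sql, params in primary.entries_after(self.applied_lsn):
            with self.db:
                self.db.execute(sql, params)
            self.applied_lsn = lsn


primary, replica = Primary(), Replica()
primary.write("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
primary.write("INSERT INTO kv VALUES (?, ?)", ("a", "1"))
replica.catch_up(primary)
primary.write("INSERT INTO kv VALUES (?, ?)", ("b", "2"))
replica.catch_up(primary)  # only the new entry is replayed

query = "SELECT k, v FROM kv ORDER BY k"
print(replica.db.execute(query).fetchall())  # [('a', '1'), ('b', '2')]
```

Tracking `applied_lsn` is what lets a replica that falls behind resume from where it stopped rather than copying the whole database again.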
00:45:31
Speaker
Later, as we started running the cloud service, it became clear that concurrent writes are the first ceiling almost everybody hits on SQLite. Yeah, that makes sense. So that was the one thing I think we wanted to fix. Turso has support for CDC,
00:45:47
Speaker
which SQLite doesn't have: change data capture. So you can have a stream of the changes that happen to the database. It's a feature that almost every database has. That makes sense. If you're cracking open the write-ahead log, that wouldn't be too hard to add. Yeah. And to be clear about the names: the fork was not called Turso, it was called libSQL. Turso was the cloud service built on top of it. Turso, the database we're writing now, has CDC.
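Stock SQLite has no built-in CDC, but a common workaround gives a feel for what a change stream looks like: triggers copy every insert, update, and delete into a log table, and a consumer polls it from its last-seen sequence number. This is a substitute technique, not Turso's built-in CDC or its API; the `cdc_log` table and the `changes_since` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);

-- Hypothetical change log: one row per change, in the order changes happen.
CREATE TABLE cdc_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT NOT NULL,
    row_id INTEGER NOT NULL,
    name TEXT
);

CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
    INSERT INTO cdc_log (op, row_id, name) VALUES ('insert', NEW.id, NEW.name);
END;
CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
    INSERT INTO cdc_log (op, row_id, name) VALUES ('update', NEW.id, NEW.name);
END;
CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
    INSERT INTO cdc_log (op, row_id, name) VALUES ('delete', OLD.id, NULL);
END;
""")


def changes_since(conn, last_seq):
    """Return changes recorded after last_seq, so a consumer can resume."""
    return conn.execute(
        "SELECT seq, op, row_id, name FROM cdc_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()


with conn:  # the triggers fire inside the same transaction as the writes
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'ada')")
    conn.execute("UPDATE users SET name = 'ada l.' WHERE id = 1")
    conn.execute("DELETE FROM users WHERE id = 1")

for change in changes_since(conn, 0):
    print(change)
# (1, 'insert', 1, 'ada')
# (2, 'update', 1, 'ada l.')
# (3, 'delete', 1, None)
```

Each consumer only needs to remember the last `seq` it processed to resume the stream; the cost is an extra write per change, which is one reason a native implementation is attractive.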
00:46:16
Speaker
We want to add encryption. We already have vector search, because vector search is becoming quite important today. So there are a lot of features where we very specifically say, hey, SQLite could do this.
00:46:30
Speaker
Again, if you look at what the community asks for, at what people using our cloud service are asking for, for example: CDC and concurrent writes. Those are the top two asks.
00:46:44
Speaker
But for me, and maybe we should talk more about features, it was always a social issue first and foremost. If we fix the community, if we have an open-contribution community, then those problems, things we don't even see as problems today because they're not use cases we care about,
00:47:09
Speaker
somebody will care about. There's this large amount of oxygen that comes into the project. And then we're going to have to resist the slop, we're going to have to do this and that, and so on. That's why we forked; those reasons didn't change. It's just that at the time we decided to fork instead of rewriting.
00:47:30
Speaker
Makes sense. And we had reasons that made sense at the time. I think they were wrong, but only in hindsight; nothing was obviously wrong. We were just proven wrong.
00:47:43
Speaker
But we had this dream: we're going to turn SQLite into an open-contribution project. We wrote a manifesto. We said, this is the open-contribution fork of SQLite; come build with us. And very few people came. To the core, nobody came. A couple of people came around the margins, and we had a couple of contributions here and there.
00:48:02
Speaker
But to the core, nobody came. And I don't understand why a fork wouldn't get any attention but a complete rewrite would. I do understand now.
00:48:14
Speaker
I do understand now. But again, I'm with you, because that's why we decided to fork, right? We said, hey, we'll fork, because then we're compatible from day one. It's working from day one, right? Only later did we see the things that kept the fork from taking off.
00:48:34
Speaker
So I want to clarify something here: we built a successful business on top of this. When I use the word "successful", I just want to be careful, right? Because our endeavor was successful.
00:48:49
Speaker
But the dream we had, that we were going to build this open community of contributors around SQLite and people were going to start using our thing instead of SQLite everywhere: those things did not happen.
00:49:06
Speaker
Right. So it was that analogy of shooting for the stars, and look, we were happy with that. We were happy with that. Yeah. I mean, if you've got startups paying the bills, that is quite a lot in itself, right?
00:49:19
Speaker
That's right. And just a quick bit of context on how this came to be. It's not that we were actively thinking, the way we were when we pivoted the company, oh man, this is not working, what do we do?
00:49:36
Speaker
For us, it was: this is working great, right? Now, we were aware of the limitations, and they were very hard to fix in the fork. But then Pekka decided to play. As I said, Turso today, in alpha, is a full rewrite of SQLite. We announced it in December.
00:49:58
Speaker
When we announced it, it did not even support transactions. You couldn't write to the database. It only supported part of the SELECT statement, so you could read some data from the database.
00:50:12
Speaker
So again, it was really just a pet project. Not a weekend project, because he had been working on it for a couple of months, but it was Pekka's pet project. Yeah. But we started to see, and to your point, I didn't quite understand why at first,
00:50:28
Speaker
that maybe this is what people want. It shows me, by the way, that the pent-up demand for SQLite is not only a technical pent-up demand. It's a seat-at-the-table pent-up demand, because people came. People flocked to it.
00:50:44
Speaker
So Pekka started working on this more seriously. Pekka is always doing things like that, writing small toy systems to test hypotheses. So he had some code along the lines of: hey, what would a new bytecode interpreter that handles the two most-used SQLite instructions look like? Boom.
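For context on what "bytecode interpreter" means here: SQLite compiles each SQL statement into a program for its virtual machine, and prefixing a statement with `EXPLAIN` returns that program instead of running it. This Python sketch prints the instructions for a small query. The exact opcodes and operands vary between SQLite versions, and which two instructions Pekka prototyped is not stated in the conversation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")

# EXPLAIN compiles the statement without running it and returns the bytecode
# program, one row per instruction: addr, opcode, p1, p2, p3, p4, p5, comment.
program = conn.execute("EXPLAIN SELECT x FROM t WHERE x > 10").fetchall()
for row in program:
    print(row[0], row[1], row[2:5])  # address, opcode name, first three operands

opcodes = [row[1] for row in program]
```

Any engine that wants to stay compatible with SQLite at this level has to interpret programs like this one, which is why a toy interpreter for a handful of opcodes is a reasonable first experiment.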
00:51:06
Speaker
So he had a toy he would do prototypes on. He started working on this project a bit more seriously, I think around July of last year, something like that. It wasn't a Turso project. It was not on our GitHub account.
00:51:26
Speaker
It was not anywhere. For six months, we were not talking about it. We were not talking about it on Twitter. It was just Pekka pushing code to GitHub, just him.
00:51:41
Speaker
That project had a thousand GitHub stars. It wasn't us talking about it; it was just people finding it on GitHub.
00:51:54
Speaker
How? And why do they care? Why do they care more? Here's the most important part, because remember the dream: building the open-contribution version of SQLite.
00:52:06
Speaker
Pekka's cute little side project had 32 people contributing to it
00:52:15
Speaker
without anybody talking about it. And he was merging their contributions. That's right. So the feedback loop had begun. Yeah. And two of those people were so good... because, again, there's this thing about the power law, right? Just like in the Linux kernel, there are many contributors doing easy things here and there, and some people become part of the core.
00:52:36
Speaker
Two of those people were so good that we decided to hire them to work on Turso, the cloud service. Right? Yeah. And so we thought, hey, Pekka, there's something here, man. There's something here. So on December 16th, we decided to do nothing extra: no more code, nothing. We were just going to move this from your GitHub repository, just the transfer operation.
00:53:09
Speaker
Move it from your GitHub repository to the Turso GitHub repository, so there's a seal of approval, and write a blog post about it.
00:53:22
Speaker
That's it. It took us maybe four hours to do all of that. I wrote a blog post. In the blog post, we said, hey, this is what is happening. What is the announcement?
00:53:33
Speaker
Pekka's pet project is becoming an official Turso research project. What is the end goal? I do not know. Hopefully over time (I can give you the link later to put in the description) more people will come and contribute to it. I don't know. And then we went off for the holidays.
00:53:57
Speaker
Right. When we came back at the beginning of January, the project had gone from a thousand stars (when you transfer a repository, it keeps its stars) to 9,000,
00:54:09
Speaker
in the space of less than a month, quite honestly. And from 32 contributors, it jumped to 64 contributors. So it doubled in around a month.
00:54:23
Speaker
And this was mind-blowing. And now, to finally answer your question, I started talking to those people. Yeah. Why are you here? Right. Some of them we had already hired, and their answers, which is another hallmark of things that work, were very consistent.
00:54:44
Speaker
Very consistent. Were they aware of our fork? Yes. But when you fork SQLite, you are beholden to the SQLite architecture.
00:54:57
Speaker
You don't have a blank canvas to create something anew. Because of the nature of being a fork, most of the fork's code was in C, which is a language that a lot of people today find very hard to work with. Now, the engineers we're talking about are super good. They can work in C, but that skill is becoming a rarity, even among top-notch programmers. The rewrite was in Rust.
00:55:26
Speaker
Yeah, that's going to be very significant. The language is fairly significant. And now for what I think is the most significant thing, even for us: we built vector search into the fork,
00:55:41
Speaker
just because it was a feature a lot of people were demanding, and vector search is fairly easy to build on the margins. We built it, but (and again, it's not that we didn't know this was a potential trade-off) here's another thing a lot of people do not know.
00:56:01
Speaker
A lot of people like SQLite because SQLite is the most reliable database on the planet, or at least in the galaxy. And that's because its test suite is incredible. SQLite has 10 times more lines of code in the test suite than in the database itself.
00:56:16
Speaker
It is incredibly reliable. It's battle-tested. Everybody loves that. But that test suite, lo and behold, is proprietary. You don't have access to it. The test suite is not open in any way; it's not even downloadable. You cannot run the SQLite test suite.
00:56:34
Speaker
So in the fork, making more aggressive changes was very hard, because you can't really test them. Well, you can test the changes themselves, because you're the one making them, but it's very hard to test the effect they have on other parts of the system, because you don't have... Yeah, you can't run the whole test suite of existing stuff, yeah. And we kind of brushed it off over time; we said, we'll figure this out as we go. We knew this was going to be a problem for the fork, but we said, hey, we're just going to have to write a bunch of tests.
00:57:06
Speaker
But over time, that turned out to be a very hard thing to do. So what ended up happening is that even our own contributions naturally became relegated to the margins. Now,
00:57:21
Speaker
with Turso, the rewrite, we've used deterministic simulation testing from day one, because when you're writing something from scratch, you can write it in a way that lends itself to being tested easily.
00:57:33
Speaker
I'm sure you've talked to Joran from TigerBeetle in the past. Yeah. We were very inspired by the work he was doing at TigerBeetle with deterministic simulation testing. And we knew we could use deterministic simulation testing to prove to ourselves and to the world that whatever we were writing had the same level of quality and reliability as SQLite.
00:57:54
Speaker
So this is the thing where you set up almost a virtual operating system for the test suite, where you can simulate the network failing or the disk failing. Yeah, yeah. Which makes sense, given your background. Yeah, but it's not just failures. Deterministic simulation testing is an interesting combination of fuzzing, property testing, and fault injection. I think the term is used loosely, and if you talk to an expert, they will say, no, this is just fault injection, or no, this is just fuzzing, because you can have those things separately.
00:58:29
Speaker
You can do one without the other, technically, but a good test harness, like the one we have, will do all of them. So what happens is that you randomly generate a large... You don't write unit tests like `SELECT * FROM table`. You write a query generator, you write a test generator, and then, because you don't know in advance what the query is going to be, instead of testing that this query has this result, you encode properties of the system.
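To make "encode properties instead of expected results" concrete, here is a minimal sketch. It uses Python's standard-library `sqlite3` as a stand-in engine; it is not Turso's harness, and the function names are hypothetical. A seeded generator builds random `WHERE` predicates. We never know what any single predicate should return, but one property holds for every predicate under SQL's three-valued logic: the rows where it is true, false, or NULL partition the table.

```python
import random
import sqlite3

COLUMNS = ["a", "b"]
OPS = ["<", "<=", "=", "<>", ">", ">="]


def random_predicate(rng: random.Random) -> str:
    """Build a random comparison such as 'a < 3', 'b = a' or 'a >= NULL'."""
    left = rng.choice(COLUMNS)
    right = rng.choice(COLUMNS + [str(rng.randint(-5, 5)), "NULL"])
    return f"{left} {rng.choice(OPS)} {right}"


def check_partition_property(seed: int, rounds: int = 200) -> None:
    """Hypothetical generator-driven test: raises AssertionError if the property breaks."""
    rng = random.Random(seed)  # every random choice flows from this one seed
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    for _ in range(50):
        # About 15% of values are NULL, to exercise three-valued logic.
        row = [None if rng.random() < 0.15 else rng.randint(-5, 5) for _ in COLUMNS]
        db.execute("INSERT INTO t VALUES (?, ?)", row)
    total = db.execute("SELECT count(*) FROM t").fetchone()[0]
    for _ in range(rounds):
        p = random_predicate(rng)
        # True, false and NULL rows for p must add back up to the whole table.
        parts = sum(
            db.execute(f"SELECT count(*) FROM t WHERE {cond}").fetchone()[0]
            for cond in (f"({p})", f"NOT ({p})", f"({p}) IS NULL")
        )
        assert parts == total, f"seed={seed}: partition broken for {p!r}"
    db.close()
```

Calling `check_partition_property(42)` either returns quietly or fails with the seed and the offending predicate. That pairing is what makes a failure reproducible.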
00:58:59
Speaker
An example of a property that is very easy to understand is that after you write a lot to the database, the database cannot be

Advanced Testing for Database Reliability

00:59:12
Speaker
corrupted.
00:59:12
Speaker
So if you write, write, write, and do a lot of random operations, and then you run an integrity check on the database, the database file has to be sane and not corrupted. That is one property of the database. But you have other properties, like you have to read what you wrote: you cannot write a value and then, in the same transaction, read it back and not see what you wrote. So you encode all of those properties of the system. And then you let the simulator...
00:59:41
Speaker
run wild with that, just generating queries randomly. We do fault injection: every now and then, with some probability, you inject a fault. In the simulator, it's a page from disk that didn't come back fully, something that failed, memory that blew up, an allocation that failed. Something faulty happens, and the system has to behave well on top of that. And the deterministic part is the most important, because you have to virtualize all of the I/O that you do. You have to virtualize every operation that interfaces with anything in the outside world, so that if you run the simulator again with the same seed, you get the same result.
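Below is a minimal sketch of seeded fault injection, under stated assumptions. The in-memory `SimulatedDisk`, its 4 KiB page size, and the short-read fault are all hypothetical, not Turso's simulator. Every fault decision comes from one seeded RNG, and the disk records a trace of what happened. Running the same seed twice therefore reproduces the same faults in the same order.

```python
import random
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for this sketch


class ShortReadError(IOError):
    """Raised by the simulated disk when a page comes back incomplete."""


@dataclass
class SimulatedDisk:
    """Hypothetical in-memory disk whose faults are driven by a seeded RNG."""
    rng: random.Random
    fault_probability: float = 0.01
    pages: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)  # what happened, for comparing runs

    def write_page(self, page_no: int, data: bytes) -> None:
        assert len(data) == PAGE_SIZE
        self.pages[page_no] = data
        self.trace.append(("write", page_no))

    def read_page(self, page_no: int) -> bytes:
        data = self.pages.get(page_no, bytes(PAGE_SIZE))
        if self.rng.random() < self.fault_probability:
            cut = self.rng.randrange(PAGE_SIZE)  # pretend only `cut` bytes arrived
            self.trace.append(("short_read", page_no, cut))
            raise ShortReadError(f"page {page_no}: got {cut} of {PAGE_SIZE} bytes")
        self.trace.append(("read", page_no))
        return data


def simulate(seed: int, steps: int = 1000) -> list:
    """Drive random reads and writes; return the trace of events."""
    rng = random.Random(seed)
    disk = SimulatedDisk(rng, fault_probability=0.05)
    for _ in range(steps):
        page_no = rng.randrange(16)
        if rng.random() < 0.5:
            disk.write_page(page_no, bytes([page_no]) * PAGE_SIZE)
        else:
            try:
                disk.read_page(page_no)
            except ShortReadError:
                pass  # a real system under test must recover; here we just continue
    return disk.trace
```

Because nothing reads real hardware or real time, `simulate(7) == simulate(7)` always holds, including the exact positions of the injected short reads.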
01:00:28
Speaker
Because now you let the simulator run for hours, and in those hours it sometimes simulates decades of real, complicated execution.
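The random-workload loop described above can be sketched as follows. This assumes Python's `sqlite3` as the engine under test, a toy key/value table, and a hypothetical `run_random_workload` helper; it is not Turso's simulator. It checks the two properties mentioned: read-your-writes inside a transaction, and a clean `PRAGMA integrity_check` after the churn.

```python
import random
import sqlite3


def run_random_workload(seed: int, ops: int = 500) -> int:
    """Random upserts/deletes with property checks; returns the final row count."""
    rng = random.Random(seed)
    db = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT
    db.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
    db.execute("BEGIN")
    for i in range(ops):
        k = rng.randint(0, 50)
        if rng.random() < 0.7:
            v = f"v{rng.randint(0, 10**6)}"
            db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (k, v))
            got = db.execute("SELECT v FROM kv WHERE k = ?", (k,)).fetchone()
            # Property: read-your-writes inside the same transaction.
            assert got == (v,), f"seed={seed} op={i}: wrote {v!r}, read {got!r}"
        else:
            db.execute("DELETE FROM kv WHERE k = ?", (k,))
            gone = db.execute("SELECT 1 FROM kv WHERE k = ?", (k,)).fetchone()
            assert gone is None, f"seed={seed} op={i}: deleted key {k} still visible"
        if rng.random() < 0.1:  # occasionally commit and open a new transaction
            db.execute("COMMIT")
            db.execute("BEGIN")
    db.execute("COMMIT")
    # Property: after all that churn, the database must pass its integrity check.
    assert db.execute("PRAGMA integrity_check").fetchall() == [("ok",)]
    rows = db.execute("SELECT count(*) FROM kv").fetchone()[0]
    db.close()
    return rows
```

A real harness would use a file-backed database and inject faults underneath it. The in-memory database here only keeps the sketch self-contained.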
01:00:40
Speaker
And if it fails, it gives you a seed. If you run the simulator with that seed, it fails in the same way. That's where the deterministic part comes from. So it looks like it's behaving randomly, but it's randomness based on a formula and a seed value? Exactly, so you can make the exact same randomness happen again. For example, one of the things that you virtualize is reading timestamps.
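The seed-then-replay workflow can be shown with a toy system that has a planted bug. Everything here is hypothetical: the counter, the "four resets in a row" bug, and the helper names. The loop searches seeds until the property fails, and then re-running that single seed reproduces exactly the same failure.

```python
import random
from typing import Optional


def run_counter_sim(seed: int, steps: int = 200) -> None:
    """Toy system under test with a planted bug on a rare path (hypothetical)."""
    rng = random.Random(seed)
    value, history = 0, []
    for step in range(steps):
        op = rng.choice(["inc", "dec", "reset"])
        history.append(op)
        if op == "inc":
            value += 1
        elif op == "dec" and value > 0:
            value -= 1
        elif op == "reset":
            value = 0
        if history[-4:] == ["reset"] * 4:  # planted bug: four resets in a row
            value = -1
        # Property: the counter is never negative.
        assert value >= 0, f"negative counter at step {step}"


def failure(seed: int) -> Optional[str]:
    """Run one simulation; return its failure message, or None if it passed."""
    try:
        run_counter_sim(seed)
    except AssertionError as err:
        return str(err)
    return None


def find_failing_seed(max_seeds: int = 10_000) -> Optional[int]:
    """Search seeds in order and return the first one that breaks the property."""
    return next((s for s in range(max_seeds) if failure(s) is not None), None)
```

Once `find_failing_seed()` reports a seed, `failure(seed)` returns the same message on every run. That lets a developer attach a debugger to a bug the simulator found hours into a run.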
01:01:01
Speaker
Right, because there are many bugs that will only show up under particular conditions. For example, you can have a bug that only manifests on the day the Unix... I forget the name for that. We had the Year 2000 bug, and there's a name people use for the upcoming Unix pandemonium. It's the end of the epoch, isn't it?
01:01:22
Speaker
Yeah, because Unix traditionally uses 32 bits for timestamps, and that runs out in 2038. So that's an example of a bug that only happens with some timestamps. Every time you do something that is going to be different across executions, you do it through the simulator. You don't go and read the time from the operating system; you read the time from the simulator. The simulator will always give you the same time for the same execution.
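Here is a sketch of a virtualized clock catching a Year 2038 bug. The `SimClock` class and the deliberately buggy serializer are hypothetical. The fact being relied on is that a signed 32-bit seconds counter tops out at 2^31 - 1, which is 2038-01-19 03:14:07 UTC. Because the simulator chooses the time, it can start a run one second before that edge instead of waiting for the calendar.

```python
import struct
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # last second a signed 32-bit time_t can hold
EDGE = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)  # 2038-01-19 03:14:07 UTC


class SimClock:
    """Hypothetical virtual clock: code under test asks this, never the OS."""

    def __init__(self, start: int) -> None:
        self.now = start  # seconds since the Unix epoch, chosen by the simulator

    def time(self) -> int:
        return self.now

    def advance(self, seconds: int) -> None:
        self.now += seconds


def encode_timestamp_i32(ts: int) -> bytes:
    """Deliberately buggy serializer: packs time into a signed 32-bit field."""
    return struct.pack("<i", ts)  # raises struct.error once ts exceeds INT32_MAX


def serializer_survives(clock: SimClock, seconds: int) -> bool:
    """Advance simulated time, then report whether the serializer still copes."""
    clock.advance(seconds)
    try:
        encode_timestamp_i32(clock.time())
        return True
    except struct.error:
        return False
```

`serializer_survives(SimClock(INT32_MAX - 1), 1)` is `True`, while advancing by 2 seconds crosses the edge and returns `False`. The same seed and start time give the same answer on every run.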
01:01:56
Speaker
You run this, you let the thing run, and then if it fails, it gives you the seed. Because of all that, we finally released our alpha a couple of weeks ago.
01:02:10
Speaker
It's alpha. It's been in development for six months, if you don't count the six months before that, which was just Pekka, and now it's very active, because (we're not going to talk too much about it today) we completely reoriented our company. The cloud business is there, but it's on hold, it's on pause. We're not adding new features to the cloud.
01:02:32
Speaker
Our entire company is now working on rewriting SQLite. It's been only six months that we've been developing this at full force. And if you go to turso.algora.io, you will see the Turso Challenge. The Turso Challenge is the following: you find a bug that leads to data corruption and show us where the simulator missed it. The simulator is just software; it has to explore this space of possibilities, and there are, of course, places the simulator doesn't go yet. So we say, if you find a data corruption bug and show us
01:03:11
Speaker
why the simulator missed it: for example, maybe it was never mixing indexes with views, and so it missed this one data corruption bug. Or the fault injection for reading from disk was not random enough, and so there was this one bug.
01:03:33
Speaker
If you do that, we're going to pay you $1,000. Okay, so you've got a bug bounty on data corruption. Yes. And of course, because there are always some people (the internet is my favorite place in the world), some said: you must not be that confident if you're only paying $1,000, right? Obviously the people who said that didn't contribute anything and didn't find anything, but it's also fair criticism. But it's alpha.
01:04:03
Speaker
It's been around for six months, right? Our plan, and we say this very explicitly in the challenge, is to increase the bounty. For the beta, which is coming in a month, I want to increase that to $5,000.
01:04:14
Speaker
Later, maybe next year, we want to get this to $10,000. And over time, we want to keep increasing the bounty. We also want to widen the scope of the bugs. It's not going to be just data corruption. It's going to be, for example, ACID violations, right?
01:04:28
Speaker
Over time, we want to be confident. We say Turso is not ready for production. It's alpha software. But we're confident enough that if you find a data corruption bug, we're willing to pay you, because we know there are very few of them.
01:04:43
Speaker
Let me nitpick there. Surely violating ACID guarantees counts as data corruption? No, no, data corruption is a very well-defined thing. Data corruption is the database losing or corrupting your data in a way that is not recoverable by a future patch.
01:05:02
Speaker
Now, you're right that some classes of ACID violation can lead to data corruption. For example, if you write a value and can't read it back because the database lost it, that is data corruption. But a situation where you wrote a value and then have a problem reading it, while the value is still there and recoverable, is not data corruption.
01:05:26
Speaker
Right, because you can patch the database and retrieve your data. Okay, yeah, I kind of see what you're saying. Yeah. Data corruption is like: I was supposed to write this value and we just didn't. There is no way to recover it. Your data is gone.
01:05:39
Speaker
So you're saying if it's somewhere in the write-ahead log, for instance, then that doesn't count? Or it doesn't have to be in the write-ahead log; it can be in the main database file. You could be like, I was supposed to write the value 4.
01:05:55
Speaker
Then there was a problem with the serializer, because the serializer depended on reading a timestamp or something like that, and instead of 4, I wrote 40. You are never getting your 4 back, because you don't know what was supposed to be there. So that's data corruption.
01:06:10
Speaker
So in six months, how much of SQLite can you

Future Goals: Surpassing SQLite

01:06:15
Speaker
rewrite? What's the current feature set? Because it sounds like you've taken on a very large project. We've done 70% of SQLite now. What does that mean? Is it a project where 80% is under the surface and the last 20% is features for...? In all fairness, it's one of those cases in which the other 20% is going to take 80% of the time. But I still think it's quite impressive. And one of the things, and some...
01:06:46
Speaker
Some contributors, because again, as of today we have 137 contributors to Turso. Remember, it started with 30, which were just some random people, some of them very good. In a month, it was 64. Out of the 32 new ones who came in, some are still with us.
01:07:10
Speaker
Right. So some of them came and stayed, which is what you want to see in a healthy open source community. We already have people who found jobs because of their contributions to Turso, which is another great sign.
01:07:22
Speaker
That's not all. We have people who use Turso in their PhD theses. Another great sign. Because you start having people with different agendas and different motivations.
01:07:34
Speaker
We wanted to recreate the Linux community without the screaming. That's essentially our... with SQLite, because SQLite lacked that. And this is happening. This makes us so happy. But...
01:07:46
Speaker
You talk to some of those early people, and what they say is that one of the things that made it easier for them to contribute is something Pekka did when he started the project, and this is his merit.
01:07:57
Speaker
He wrote a file called COMPAT.md, a Markdown file with the entire compatibility matrix of SQLite. Every single pragma, every single part of the expressions, like the SELECTs, and so on through the AST, the abstract syntax tree; every single bytecode instruction is listed in that file.
01:08:21
Speaker
And it's organized, first of all, into categories: bytecode, pragmas, functions, expressions. For each entry: what is the expression, or what is the thing?
01:08:33
Speaker
Its status, comments. In the beginning, the status was all No, no, no, no. And now if you count the number of things marked Yes, that's 70 to 80% of the stuff. So that's where you're getting your figure?
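The "count the Yes rows" arithmetic behind a figure like 70 to 80% can be sketched in a few lines. The table layout assumed here (`| Item | Status | Comment |`) is a hypothetical simplification; the real COMPAT.md columns may differ. Header and separator rows are skipped, and the share of rows marked Yes is returned.

```python
def compat_coverage(markdown: str) -> float:
    """Share of table rows whose Status column is 'Yes' (hypothetical layout)."""
    statuses = []
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip prose lines, header rows and '|---|---|' separator rows.
        if len(cells) < 2 or cells[1] == "Status" or set(cells[1]) <= set("-: "):
            continue
        statuses.append(cells[1].lower())
    return sum(s == "yes" for s in statuses) / len(statuses) if statuses else 0.0


SAMPLE = """
## Bytecode
| Opcode | Status  | Comment |
|--------|---------|---------|
| Add    | Yes     |         |
| Vacuum | No      |         |
| Insert | Yes     |         |
| Sort   | Partial |         |
"""
```

For `SAMPLE`, two of four rows are Yes, so `compat_coverage(SAMPLE)` is `0.5`. Whether "Partial" should count is a judgment call the real figure would have to make explicit.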
01:08:49
Speaker
Yeah. What does that mean for me, though? If I go and download Turso and I'm used to SQLite, what am I seeing? What makes it alpha? What am I missing? So what makes it alpha is, first of all, that we believe it needs more time.
01:09:04
Speaker
The alpha did not support indexes. One of the things we believe is needed for this to be ready is working indexes, plus some other features. I think the main features we're missing now are materialized views, triggers, and VACUUM. So those are a couple of things that are missing. We're going to go into beta with that.
01:09:31
Speaker
The main reason I think we have this as alpha is that, as I said, the simulator is great. The simulator helps us find all sorts of things. But we know our simulator is not perfect.
01:09:45
Speaker
So we know there will be some problems that the simulator just hasn't found yet. And we believe that, in the spirit of not putting slop out there,
01:09:58
Speaker
it just takes time. Part of it is that it needs time. But we are confident we can get to the point of saying, hey, this is production ready, within the year. So counting from December to December, we think we're going to have the rewrite of SQLite, missing a couple of things here and there, but usable by most people within the year. There are two ways to consume Turso. One of them is through the embedded drivers, just like SQLite: you use your programming language of choice, Rust, Go, TypeScript, Python. Python is one of the communities embracing Turso the most, as we see from the download statistics. With TypeScript, you just do an npm install of the Turso package,
01:10:39
Speaker
and you have the Turso driver, and then you can create a database, run SELECTs on a file, all of that. SQLite also comes with a shell, which is how a lot of people interact with SQLite. Turso also has the Turso shell. So you download Turso, point it at an existing SQLite file, an in-memory database, or a new file, and you'll see a message saying this is alpha software, but by and large, it's there.
01:11:05
Speaker
Let me just clarify, because indexes seem like a big missing feature. There was obviously more, but that was missing from the alpha. So it's going into the beta?
01:11:16
Speaker
It's already done. By the time we released the alpha, we already had indexes. They were just full of problems. Our goal had been to go straight to beta,
01:11:30
Speaker
but indexes were still one of the most problematic features. So we said, let's disable them and make it alpha, which we don't regret, because there is also this component of: yes, we trust the simulator, but it needs time. There's some level of trust that only comes with time.
01:11:50
Speaker
Yeah, yeah, that makes sense. You've got to bed that in. But, okay, so...
01:11:59
Speaker
The current status, what you think you have, is something very close to the present state of SQLite, but it needs time to prove itself. Is that what you're saying?
01:12:12
Speaker
That's one way to look at it. But if we're going to be precise, first of all, it already has things SQLite doesn't have. For example, vector search and CDC.
01:12:23
Speaker
So the main features we already have, which are going to be in the beta and which SQLite doesn't have: vector search, so you can do vector search in one tool.
01:12:35
Speaker
CDC, so you can see the stream of changes. Turso will automatically keep a table with the changes made to your tables, with an index, so you can consume the changes since the last time you looked at the stream. Okay, so you can get a real-time stream of what's new? Yeah.
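To illustrate the "read changes since my last index" pattern, here is a sketch that emulates a CDC table in plain SQLite using triggers, with Python's `sqlite3`. This is an emulation under stated assumptions, not Turso's CDC feature or its table layout. The `changes` and `cdc_cursor` tables and the `consume` helper are hypothetical. Reading a batch and saving the consumer's new position happen in one transaction, so a crash cannot record a position past changes that were never delivered.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
-- Hypothetical change log; Turso maintains its own CDC table, this only emulates one.
CREATE TABLE changes (change_id INTEGER PRIMARY KEY AUTOINCREMENT, op TEXT, row_id INTEGER);
CREATE TABLE cdc_cursor (consumer TEXT PRIMARY KEY, last_id INTEGER NOT NULL);
CREATE TRIGGER users_ins AFTER INSERT ON users
  BEGIN INSERT INTO changes (op, row_id) VALUES ('insert', NEW.id); END;
CREATE TRIGGER users_upd AFTER UPDATE ON users
  BEGIN INSERT INTO changes (op, row_id) VALUES ('update', NEW.id); END;
CREATE TRIGGER users_del AFTER DELETE ON users
  BEGIN INSERT INTO changes (op, row_id) VALUES ('delete', OLD.id); END;
"""


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:", isolation_level=None)  # we manage transactions
    db.executescript(SCHEMA)
    return db


def consume(db: sqlite3.Connection, consumer: str) -> list:
    """Return changes since this consumer's last read, advancing its cursor atomically."""
    db.execute("BEGIN IMMEDIATE")
    try:
        row = db.execute(
            "SELECT last_id FROM cdc_cursor WHERE consumer = ?", (consumer,)
        ).fetchone()
        last_id = row[0] if row else 0
        batch = db.execute(
            "SELECT change_id, op, row_id FROM changes WHERE change_id > ? ORDER BY change_id",
            (last_id,),
        ).fetchall()
        if batch:
            db.execute(
                "INSERT OR REPLACE INTO cdc_cursor (consumer, last_id) VALUES (?, ?)",
                (consumer, batch[-1][0]),
            )
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return batch
```

After an insert and an update, the first `consume(db, "app")` returns both changes, and the second returns `[]` until something new is written.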
01:12:52
Speaker
That's nice. More or less. Because SQLite is not a server that you can just put a listener on in the process, it's mapped as a SQLite table, and you have to go and read it, but it gives you an index since the last read. You say, let me read what's in this table, and then you know what the index in this table is.
01:13:16
Speaker
Next time you read from that index, you see only the changes since the last time you read. Because of the nature of SQLite, it's a little different from CDC on a server-based database, but it's there. Out of curiosity, is that transactional? Can I say, okay, I will commit how far I've read?
01:13:36
Speaker
Yeah, it's all transactional. We also already have, for the beta, the ability to run this in the browser. Some people managed to run SQLite in the browser, but it's very hard, because the browser is an asynchronous environment and SQLite is fully synchronous. The one thing we changed from the beginning is that Turso is asynchronous, so it can run in all of those complex environments, including the browser. So Turso runs natively in the browser.
01:14:06
Speaker
And then there are the WAL changes we made. Again, we made them in the fork, but we're slowly bringing them into Turso. So you can add a foreign page from a different WAL into your WAL.
01:14:19
Speaker
What that means is that you can now do syncing. You can have a database that gets synced to a cloud-based database, which is the nature of our business, right? But in Turso, it's even more powerful, because you don't even need the server. The server will obviously help, but you can manually extract the stream of changes from one database and push it to another database.
01:14:43
Speaker
And then you essentially see those databases syncing with each other in the browser. So we have that already, and Turso already does some things that SQLite doesn't do.
01:14:55
Speaker
Right. And I think this is important to mention, because it's not just a subset; it's more of a disjoint set at the moment. Turso will become a superset of what SQLite is.
01:15:08
Speaker
But because there are things in SQLite that we don't implement yet, you have this disjoint set: SQLite has some things we don't do, and Turso has some things SQLite doesn't do. Our plan by the end of the year is to turn that into a superset. We will do everything SQLite does, plus extra features, and time, and the simulator, and the reliability.
01:15:29
Speaker
So let me see if I can guess how some of this works. If you've cracked open the write-ahead log, it becomes a lot easier to bring in other systems' new changes. Yeah, the way the changes work today is that it uses a combination of the write-ahead log and the CDC table.
01:15:48
Speaker
So it's already happening that we built a feature and then built another feature on top of it, which is another great sign of a healthy collection of abstractions in an open source project.
01:16:02
Speaker
Because what the CDC table allows you to do is have a logical collection of changes that you can use for conflict resolution. Imagine you are writing to a database here and you're now importing the WAL from another database. What you do is import the WAL, but if there are conflicts, you can use the CDC table to resolve them.
01:16:28
Speaker
So you use those two things together, the ability to ingest a foreign WAL and the CDC table, to get a syncing mechanism that syncs straight into your browser.
01:16:40
Speaker
Not only in the browser, but the browser is one of the environments where a lot of people want to see this. Right. Do you end up in that situation with some kind of merge conflict process? Or are you just using timestamps and last writer wins?
01:16:52
Speaker
Yes. And again, what you have now in Turso is very simple; over time it will get more sophisticated. What you have today is essentially: if you can't merge, you can't merge. If you wrote to different rows or different columns, it's fine, you can merge those. If you wrote to the same thing, you just error out, and you cannot import that
01:17:15
Speaker
new information. But over time, our vision is that you'll be able to specify conflict resolution strategies: last writer wins, this and that, or just calling a function you provide that decides how to deal with those things. So we have all of these features, and we are now working on incremental computation for views, which is a very interesting feature. The idea is that if you have a view, or a materialized view, databases usually have to recompute it: when something changes in the base table, you have to recompute the view.
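The merge behaviour described a moment ago can be modelled in a few lines. Changes are keyed by cell, meaning table, row and column. Edits to different cells merge cleanly, and a clash on the same cell raises an error unless a resolver function is supplied, such as last-writer-wins. The change format, the `merge` function and `last_writer_wins` are hypothetical; they are not Turso's WAL or CDC representation.

```python
from typing import Callable, Dict, Optional, Tuple

Cell = Tuple[str, int, str]    # (table, rowid, column)
Change = Tuple[object, int]    # (new value, logical timestamp)
Resolver = Callable[[Cell, Change, Change], Change]


class MergeConflict(Exception):
    """Both sides changed the same cell and no resolver was given."""


def merge(local: Dict[Cell, Change], remote: Dict[Cell, Change],
          resolve: Optional[Resolver] = None) -> Dict[Cell, Change]:
    merged = dict(local)
    for cell, theirs in remote.items():
        if cell not in merged or merged[cell] == theirs:
            merged[cell] = theirs  # different row/column, or an identical edit
        elif resolve is None:
            raise MergeConflict(f"both sides wrote {cell}")  # simple behaviour: error out
        else:
            merged[cell] = resolve(cell, merged[cell], theirs)  # pluggable strategy
    return merged


def last_writer_wins(cell: Cell, ours: Change, theirs: Change) -> Change:
    """Example strategy: keep whichever change carries the later timestamp."""
    return max(ours, theirs, key=lambda change: change[1])
```

In this model, `merge(local, remote)` with no resolver corresponds to today's error-out behaviour. Passing `last_writer_wins`, or any user function with the same shape, corresponds to the planned strategies.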
01:17:55
Speaker
Yes. More recently, maybe a couple of years ago, there has been new research in the industry into something called incremental view maintenance.
01:18:06
Speaker
Incremental view maintenance is something that pairs views with streaming systems like Kafka, Redpanda, and others, so that to recompute a view, you only need to look at the new changes. You don't have to look at the whole history of changes. So updating a view becomes incredibly efficient, and you can have real-time querying on streams.
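Here is a toy model of incremental view maintenance for a grouped aggregate, `SELECT key, count(*), sum(amount) FROM t GROUP BY key`. Each base-table change arrives as a signed delta (+1 for an inserted row, -1 for a deleted one, and an update is a delete plus an insert). The cost of applying it is proportional to the size of the delta, not to the size of the table. The class is hypothetical and not Turso's implementation.

```python
from collections import defaultdict


class IncrementalSumView:
    """Maintains count(*) and sum(amount) per key from row deltas (toy model)."""

    def __init__(self) -> None:
        self.count = defaultdict(int)
        self.total = defaultdict(int)

    def apply(self, delta: list) -> None:
        # Each entry is (sign, key, amount): +1 for an inserted row, -1 for a deleted one.
        for sign, key, amount in delta:
            self.count[key] += sign
            self.total[key] += sign * amount
            if self.count[key] == 0:  # last row of the group went away
                del self.count[key], self.total[key]

    def rows(self) -> dict:
        return {k: (self.count[k], self.total[k]) for k in sorted(self.count)}
```

Applying `[(+1, "eu", 10), (+1, "us", 5), (+1, "eu", 7)]` gives `{"eu": (2, 17), "us": (1, 5)}`. Deleting the `us` row and updating one `eu` row from 10 to 12 then gives `{"eu": (2, 19)}`, without rescanning anything.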

Incremental View Management and Testing Innovations

01:18:33
Speaker
So this is something we're adding to Turso as well. Work has also already started on encryption. So we're not taking the approach of first achieving feature parity with SQLite and then moving on to new features. We're doing this in parallel, in the sense of: hey, this is a good feature to have, let's do it.
01:18:53
Speaker
Right. And over time, naturally, I think by the end of this year or the beginning of next year, we will have a superset. But we're working on features we believe make sense. Some of those features SQLite has, and some SQLite doesn't. The majority of them SQLite has, like indexes.
01:19:11
Speaker
But we're peppering in, from the beginning, things SQLite doesn't have, and the most important feature in the beta will be concurrent writes. Which is the thing that everybody wants. How hard was that to add? Because that does seem like something you've got to architect from the very beginning. Concurrent writes, first of all, are still being added. It's in progress and it's very hard. It's one of the hardest things in the current batch of features we're doing. Pekka is working on it himself.
01:19:46
Speaker
How are you that confident, then, that you'll get it done in what looks like four months to me? Because I've known Pekka for 15 years. And it might not be done in the sense that the feature is fully finished; we might release it saying this is experimental. There will be something. The beta will not yet be production ready; it's still a beta. So first of all, as I said, I've known Pekka for a long time and I know what he's capable of. It's also not in the early days: it's in development, it's hard, but it's closer to done than not done. And we have the flexibility of the beta. It doesn't have to be rock solid for us to release it.
01:20:27
Speaker
We'll still release the beta with the warning that the software is not ready for production. And, I think most importantly, we have the simulator. The simulator is catching a lot of those bugs now.
01:20:42
Speaker
Concurrent writes are one of those things where you have to improve the simulator to catch the bugs, because the simulator we have now doesn't generate concurrent writes, since SQLite doesn't support them. So that's one example: okay, you support concurrent writes, so now you have to improve the simulator to generate the combinations that weren't there before.
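Teaching a simulator to produce concurrent workloads starts with generating interleavings. Here is a minimal sketch of a hypothetical generator. It merges the step lists of several transactions into one seeded, reproducible schedule while keeping each transaction's own steps in order. Picking uniformly among the still-running transactions does not sample all interleavings uniformly, which is acceptable for a sketch.

```python
import random
from typing import List, Tuple


def random_interleaving(rng: random.Random, *txns: List[str]) -> List[Tuple[int, str]]:
    """Merge transactions' steps into one schedule of (txn index, step) pairs."""
    cursors = [0] * len(txns)  # next step to run for each transaction
    schedule = []
    while True:
        ready = [i for i, t in enumerate(txns) if cursors[i] < len(t)]
        if not ready:
            return schedule
        i = rng.choice(ready)  # seeded choice, so the same seed gives the same schedule
        schedule.append((i, txns[i][cursors[i]]))
        cursors[i] += 1
```

A harness would then execute each `(txn, step)` pair on that transaction's connection and check properties such as "no committed write is lost" afterwards.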
01:21:05
Speaker
So there will be bugs, there will be issues. But I think it will be good enough for people to try in non-production systems. Don't you run into the problem that the simulator has to be almost an operating system?
01:21:19
Speaker
Yes, which is why we use two simulators. One of them is the simulator we wrote. The other is a tool called Antithesis. What Antithesis did is fantastic.
01:21:37
Speaker
They wrote a hypervisor. I love that, because it aligns with my background so well. A hypervisor is essentially a virtual machine manager that runs operating systems on top of it and virtualizes the hardware, all the abstractions the operating system needs. So Antithesis built a deterministic hypervisor: if you run a process in this hypervisor twice, the results are always going to be the same.
01:22:08
Speaker
Right, so every time you... because it's a seedable operating system? Pretty much. I think it's one of the most interesting companies in the industry today.
01:22:20
Speaker
I'm going to put a bookmark in that, because I'm actually recording a podcast with Antithesis tomorrow. Will? Yeah, with Will. Well, send him my best, and you can freely say that I said openly that his company is one of the most interesting companies.
01:22:36
Speaker
And the only reason I can't say the most interesting company is because there's our company. Oh, of course. And there's also TigerBeetle, which I think is fantastic as well. Those are all my favorite companies in terms of the amount of engineering that goes into them. But Antithesis is great. It's fantastic. It's amazing. And what we do is that we run ours...
01:22:59
Speaker
The way I like to explain it is that our simulator plays the role of unit tests in a project. You write some code and you run the simulator, with the difference that it runs for hours: we run it for four hours every day. And it's very fast. You change something, you run it, and in four hours you run ten years of simulations, so the iterations are very fast.
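"Ten years of simulations in four hours" is possible because simulated time is not wall-clock time. Below is a minimal sketch of a discrete-event loop, a common way to get this; it is not necessarily how Turso's simulator is built, and `VirtualScheduler` and `simulate_years` are hypothetical names. The clock jumps straight to the next scheduled event, so idle periods such as timers, retries, or hourly checkpoints cost nothing.

```python
import heapq
import itertools

HOUR = 3600  # seconds


class VirtualScheduler:
    """Discrete-event loop: simulated time jumps straight to the next event."""

    def __init__(self) -> None:
        self.now = 0.0
        self._queue = []
        self._seq = itertools.count()  # tie-breaker keeps equal-time events deterministic

    def call_later(self, delay: float, fn, *args) -> None:
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), fn, args))

    def run(self, until: float) -> None:
        while self._queue and self._queue[0][0] <= until:
            self.now, _, fn, args = heapq.heappop(self._queue)
            fn(*args)
        self.now = max(self.now, until)


def simulate_years(years: int) -> int:
    """Run an hourly checkpoint task for `years` simulated 365-day years."""
    sched = VirtualScheduler()
    ticks = []

    def checkpoint() -> None:
        ticks.append(sched.now)
        sched.call_later(HOUR, checkpoint)  # reschedule itself an hour later

    sched.call_later(HOUR, checkpoint)
    sched.run(until=years * 365 * 24 * HOUR)
    return len(ticks)
```

`simulate_years(10)` executes 87,600 hourly checkpoints in well under a second of real time, because no step ever waits on a real clock.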
01:23:28
Speaker
Yeah. But it's very focused, and you have to write the simulator itself. Antithesis plays the role of integration tests. So what we do is that once a week we run Antithesis for something like a week.
01:23:44
Speaker
I don't know the exact number, but it runs for a lot longer. It doesn't depend on us writing specific tests, so it's like the integration tests in this analogy. And it finds bugs even in our simulator. So there are those two layers of simulation. First we run our simulator.
01:24:03
Speaker
And then once a week, with the code that was recently merged, we run the Antithesis batch of testing. And whatever escapes our simulator, Antithesis usually finds.
01:24:15
Speaker
But you still need the workloads. For example, generating workloads with concurrent writes is something that needs to be done both for Antithesis and for our simulator.
01:24:25
Speaker
Because otherwise, the system will not generate them. It's just not part of the system's specification. This seems huge, right? It almost seems like you're saying we're building one and a half operating systems plus a database from scratch.
01:24:40
Speaker
How much of this is being done...? Sorry? Yeah, but it's a simple operating system. There are only a few people in the world who can say it's just a simple operating system. Right, I mean, you can always say that in comparison. I don't mean to say that it's easy. But in terms of scope and size, if you compare it with Linux, it's a simple one.
01:25:01
Speaker
It's small and the surface area is small. It doesn't take that much time to get it. Okay. But how much of this is happening in-house versus in the community you're building? So today, obviously, the main contributors to Turso are, just by virtue of... well, I'm a firm believer in... our dream, as I said, is to recreate the Linux community minus the brutality. And I think we're on track to do this.
01:25:27
Speaker
And one of the things I always understood is that to have a healthy, self-sustaining community, you have to have people with different motivations coming to it. That has to happen. And again, Linux famously had many, many businesses.
01:25:42
Speaker
Some of them, like Red Hat, selling Linux directly. Some of them, like Google, not reselling Linux, but just needing Linux to work really, really well on their hardware to run their systems. Then you had IBM, which was selling hardware and knew people were going to run Linux on it, so they wanted to make sure that Linux
01:26:05
Speaker
ran well there. IBM is not even that big of a name anymore, but those are the companies we remember from the early days of Linux: IBM, Google, Red Hat, Novell. And this is starting to happen for us. Most people contributing to Turso today are obviously our employees
01:26:29
Speaker
and people who work with us. We have also hired some of

Inclusive Community and Diverse Contributions

01:26:34
Speaker
them. When we saw more of those people distinguishing themselves, and since it's in our best interest to get this done sooner rather than later, we said, hey, look, we have an open position. We have the budget.
01:26:47
Speaker
Do you want to come work with us? And in fact, there's this amazing story of a person who is currently incarcerated. He's not an ex-prisoner; he is in prison, and he is now our employee. We hired him. I'm giving interviews about this left and right, so I'll point interested parties to those interviews and we won't spend too much time on it here, but it's fascinating.
01:27:12
Speaker
But among the contributors (and I want, of course, to be fair to all the others, who are very welcome and whom we love dearly), we have a lot of people who come back and keep contributing. There are two I'll highlight, only because I'm speaking about this narrow view: for this community to be self-sustaining, people need to be deriving value from the project.
01:27:39
Speaker
Real value. One of them is a Turkish man. We call him the Turkish delight because he's a very sweet guy. Alp is a researcher at the University of Maryland in the United States, and his research is in deterministic simulation testing. So when he saw Turso, he said, look, this is my research.
01:28:02
Speaker
He poured a tremendous amount of work into making our simulator better, because he is using it in his PhD thesis. So he has a self-interest in doing that. And the other contributor is an Indian man named Krishna.
01:28:18
Speaker
And Krishna recently found a job, largely due to his contributions to Turso. That's what he told me; I'm not making this up. He told me it's mostly because of the work he's been doing on Turso.
01:28:35
Speaker
And it's very likely that in his new job they will use Turso as part of the solution, in which case he can keep working on it as part of his main job. Okay. So this is the status of our community. Six months in, I think it's a pretty good result. Linux, of course, got to the point of being Linux after many years.
01:28:54
Speaker
I'm certain it didn't get there within six months, right? Yeah. If you go to GitHub, it shows you a graph of the top contributors. GitHub is amazing because it has all of that information.
01:29:10
Speaker
I think I'm number 14 or something like that. I'm not even near the top, first of all because I'm managing the company and I don't have that much time. But I still write code because it's fun and I want to help the team. There are 14 people ahead of me, and my company doesn't have 14 people. We have 10 people at the moment, right? So those others are people who come with their own agendas, essentially.
01:29:34
Speaker
Okay, so let's pull back and look at where you think it's going to be a year from now. I assume you'll have got to your first release version.
01:29:47
Speaker
Oh yeah, yeah. The first release that's not beta, not alpha, etc. I want to do it in December because then it will be like...
01:29:58
Speaker
The anniversary is the same day. Pekka thinks it's crazy and that it's not going to happen. But if it doesn't happen in December, then it's going to be early next year, like January or February.
01:30:10
Speaker
But where do you think it's going to be positioned? It seems to me there are three players in this market. There's SQLite, the established one. There's you, going for the new SQLite.
01:30:22
Speaker
And there's also DuckDB, which you must consider production-ready but analytics-only. Yes. So I love DuckDB. In fact, the CEO of MotherDuck is one of the investors in our company. Oh, really? Yeah. And I wish them all the best. I wish them a lot of success, and I think what they're doing is fantastic.
01:30:44
Speaker
I have a thesis, though. And again, you can't change the past; things happen the way they happen. This thesis is unfalsifiable and unprovable because I'll never be able to prove it. But here you go. I believe that if SQLite had an open contribution model, DuckDB would not exist, because their first intuition would have been, "Let's extend SQLite to be able to run analytical queries."
01:31:10
Speaker
When you look at Postgres, for example, Postgres has a large number of extensions that change it into, say, a time series database, which is one of the keys to the success of Postgres, by the way. You can run Postgres as an analytical database. You can run Postgres as a time series database with extensions. Now, there are many ways to extend a database. You can make the database natively able to do analytics, or you can have a very robust extension system, which is what Postgres does. But one way or another, you create this large ecosystem.
01:31:45
Speaker
SQLite has neither. Well, it has one extension system, but it's very hard to use. It doesn't allow you to register indexes. It doesn't allow you to register new types. And you have to write the extensions in C.
01:32:02
Speaker
It doesn't have any isolation, so if you write a bad extension, it's going to crash the whole thing. Right. So there are many ways to extend a database.
01:32:13
Speaker
It doesn't have to be just making the database natively do this. It can be: here's an extension system. We might go one way or another. We want, of course, Turso to be extensible.
01:32:24
Speaker
But my thesis is that if SQLite had a way to be extended, either through contributions to the code base in a healthy community or through a very robust and reliable extension system, DuckDB would not even have existed. You'd just go in, just like people keep turning Postgres into those kinds of databases. Yeah, yeah, yeah. And now that they exist, they're great and they have huge momentum. It's fine. I think they will...
01:32:55
Speaker
It reduces our motivation to do any analytical work in SQLite, or in Turso more specifically, because DuckDB is there. But I think they would not have existed in a world in which SQLite was open to contribution.
01:33:11
Speaker
So you're saying that even though you're not going to go after that now, you can see a future in which maybe contributors to Turso try to make it more DuckDB-ish.
01:33:22
Speaker
I don't necessarily see that future, because the future depends on the past, right? And we live in a world in which DuckDB already happened. So, like, why would somebody contribute that?
01:33:35
Speaker
This is a large piece of work. It's not a weekend project. For somebody to do this, they have to have the motivation. They have to have the vision, and they have to be solving a real problem for a real business.
01:33:51
Speaker
And because DuckDB exists, your motivation to do this is a lot less, and we might never see this contribution come in. But it remains within the realm of logical possibilities. Right. Okay. Yeah. Let me throw in this feature, then, which isn't a direct competitor to anything. If I were to look at something like Turso, something like SQLite, I can see I would want vector indexes these days. The other kind of index I'd want to add is geospatial.
01:34:18
Speaker
And if I came along and said, "Hey, I want geospatial indexes in Turso," what would that look like today? And what do you hope it will look like two years from now? Yeah, so first of all, great example, because geospatial is also a feature that we hear about a lot in our community.
01:34:36
Speaker
We don't have anybody among our partners for whom this is very actionable. We do have partners that work with us, and that's why, by the way, we don't follow the strict sequence of getting SQLite compatibility first and then adding more features: we have people who are interested in running Turso in production as soon as possible, and they want this disjoint set. They say, "I want SQLite with this." Okay, but do you use...
01:35:02
Speaker
...triggers? "No, I don't." So no, we're not going to do triggers. It's healthier to get those people to production faster, right? Because then you have real people using this database in real life.
01:35:13
Speaker
And if we don't do anything that SQLite doesn't do, then they're just going to keep using SQLite. But they want SQLite to do more, and that's the "more" that we're adding. Yeah.
01:35:26
Speaker
We don't have any of those big partners that want geospatial features, but over the years a lot of people came to the Turso Cloud asking for geospatial capabilities.
01:35:38
Speaker
SQLite has an extension called SpatiaLite for geospatial data.
01:35:48
Speaker
It is a terrible extension. It is a terrible extension. And I say that having never used it; I can say it's terrible because I tried to use it
01:35:59
Speaker
and I couldn't, because the extension is not actively maintained. You have to compile it from source. I tried a couple of times and I failed.
01:36:10
Speaker
For a random person, it will be even harder. And even if I did get it compiled, you know, I found some binaries here and there, and they were like 30-megabyte binaries that you're just not going to add to the core of the database.
01:36:24
Speaker
So I would love to see somebody, if somebody wants to get on our good side in our community, come up with a geospatial extension. I would love to see a geospatial extension done right, integrated with the database, exposing geospatial types. It's not a weekend project, but for somebody who wants to do it, it will be a very, very nice project. And it would still be something that is optional.
01:36:57
Speaker
Right now, we still do extensions in a way very similar to SQLite because, again, we have not had time to extend how that works, but we don't do loadable extensions. What happens is that the extension lives in the Turso source tree, and you can enable or disable it when you create the database, but that's it. So we would love to see this done that way. You come, you contribute the geospatial code to our repository, and Turso databases can now handle geospatial data.
01:37:29
Speaker
Is there support for, I mean, there must be support for more than one kind of index, because you've already got vector indexes? Well, this is one of the reasons a geospatial index will not be a weekend project, though conceivably it could be if we were much further along the road. Vector indexes were one of the things that led Pekka to experiment with the rewrite, which, as we said, was very successful, because adding vector indexes to our fork of SQLite was very, very hard.
01:38:06
Speaker
When you query vectors on our fork, you query them naturally, which is what we wanted. You can write a query in which you mix and match normal columns with vector columns, use normal SQLite syntax like LIMIT 10, and only see the top 10 results. Then you add a WHERE clause: you first filter by date, which is a normal column, and you only search the vectors from that date. So it becomes part of the language.
01:38:47
Speaker
But with indexes, it's not really like that. With indexes, you have to query an auxiliary table: the index is a separate table, and then you have to join it with the normal table. It's very cumbersome.
01:39:00
Speaker
We did it that way precisely because adding the ability to have transparent new types of index was a very core change; it was very hard to do. Yeah, I can imagine. Our vector search in Turso doesn't use indexes yet. We just have brute-force search, which is good for small datasets.
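To make "brute-force search" concrete, here is a minimal Python sketch of the work a query like "filter by date, order by vector distance, LIMIT 10" has to do when there is no vector index. It is not Turso's API or implementation: the `Row` type, the use of cosine distance, and the `brute_force_search` helper are illustrative assumptions. The point is that every row passing the ordinary WHERE filter gets scored, which is fine for small datasets and is exactly the cost a vector index would avoid.

```python
import heapq
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Row:
    """One table row: an ordinary column plus a vector column (illustrative)."""
    id: int
    created: date
    embedding: list[float]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def brute_force_search(rows: list[Row], query: list[float], since: date, k: int = 10) -> list[Row]:
    """Hypothetical helper: filter on a normal column, then rank by vector distance."""
    # WHERE created >= since: only rows passing the ordinary filter are candidates.
    candidates = [r for r in rows if r.created >= since]
    # ORDER BY distance LIMIT k: with no index, every candidate is scored.
    return heapq.nsmallest(k, candidates, key=lambda r: cosine_distance(r.embedding, query))


if __name__ == "__main__":
    rows = [
        Row(1, date(2024, 1, 1), [1.0, 0.0]),
        Row(2, date(2025, 1, 1), [0.9, 0.1]),
        Row(3, date(2025, 2, 1), [0.0, 1.0]),
        Row(4, date(2025, 3, 1), [1.0, 0.0]),
    ]
    top = brute_force_search(rows, [1.0, 0.0], since=date(2025, 1, 1), k=2)
    print([r.id for r in top])  # [4, 2]: row 1 matches exactly but fails the date filter
```

The cost grows linearly with the number of rows that survive the filter, which is why a filter on an ordinary column first helps, and why an index becomes necessary as datasets grow.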
01:39:22
Speaker
And after we do it, making a geospatial extension will become easier because you will already have some kind of support for... Now, the second index type is still harder than the third, because the first implementation will still have a lot of kinks
01:39:39
Speaker
that you didn't even notice, but they've become just the way vector indexes work, and they're not that general. Then the second one generalizes the implementation, and then the third and the fourth, and then it becomes trivial to add a new index. Yes. Adding index type number two is always going to be harder than number three. It's just like having kids. A month and a half ago, we had our fourth kid. Having the first kid is very hard. Having a second kid is still quite hard; you have some experience, but it's still hard. The fourth kid is like, "Yeah, whatever, man, just one more baby." I reckon the moment they outnumber the parents, all bets are off. But what happens now is that my oldest kid helps out with the youngest, right? So there's power in numbers. The way it translates to the index example is that it generalizes better and better, so at some point adding a new index will just become calling a function with some definition.
01:40:37
Speaker
Okay. Again, I see a parallel with your desire to build large communities. It sounds like you're doing it the hard way there. Yeah, I'm doing it the SQLite way: serial, essentially a single writer. Elon Musk is doing it the concurrent-writer way. We're both trying to repopulate the planet. Yeah.
01:40:56
Speaker
Okay, quickly pulling away from that difficult topic, let me ask you another competitor question. It seems like you're already going after SQLite in the small: SQLite in the browser, the embedded story.
01:41:18
Speaker
But with concurrent writes, are you also trying to get onto a larger canvas? Do you think one day you'll be competing with Postgres? To some extent, in the same way that Postgres competes with MySQL. They're both databases, and both of those systems at the same time compete and don't, because they end up being...
01:41:38
Speaker
They end up being good, and this is just the nature of how things evolve. They end up being good at different things. When you look at Postgres and MySQL, they have a large community of followers on both sides.
01:41:52
Speaker
And the Postgres people say MySQL sucks, and the MySQL people say Postgres sucks. And when you go look at those people, they're both right. It just so happens that they run very different workloads. It's all a database, but they do very different things. You can end up with one of them grabbing all the momentum, but if you look at the early days of Linux, for example, a lot of the early routers and things like that would run FreeBSD,
01:42:27
Speaker
because a lot of people just thought it was better. And in all fairness, it was: the FreeBSD network stack was much better than the Linux network stack for a certain period of time. Linux got a huge community, and at some point you cannot beat that momentum. With that momentum, you end up becoming better at many things, right? And I think the Postgres community has that over MySQL. MySQL doesn't have such a large community. So if I were to bet between those two, looking 10 years into the future, I would bet on Postgres just eating MySQL completely.
01:43:01
Speaker
But at this point in time, there are certain classes of problems that MySQL is just better at, and certain classes of problems that Postgres seems to be better at. Both of those databases do compete with SQLite, because a lot of people use SQLite for normal web workloads. This is why, by the way, we came up with the idea of Turso, which is now the Turso Cloud, right? The Turso Cloud,
01:43:32
Speaker
back then just named Turso, was a way for you to run SQLite for web server workloads. So you're on Cloudflare Workers, you call into our server, and then you use SQLite.

SQLite's Popularity and Limitations

01:43:44
Speaker
So there are many people who reach for SQLite for that, and SQLite blows everybody else out of the water in terms of simplicity. SQLite is now the default database, as far as I know, in the Laravel
01:43:58
Speaker
framework and in the Ruby on Rails framework. So it is a database that is fairly widely used on the web. The problem with SQLite is that it's a great database to get started with, but the people who get started with it already know that the day they scale, they're going to have to replace this component. It's just a problem for their future self. So some people don't even start with SQLite because they know they're going to have to replace it.
01:44:22
Speaker
Some people start with SQLite knowing that they're going to have to replace it. Our goal is to take this "I have to replace it" out of the equation. MySQL and Postgres will keep being better. There is a world here in which you can imagine MySQL, Postgres, and SQLite, or Turso more specifically, all
01:44:45
Speaker
fighting for parts of the web server market, and they will naturally get better at, and specialize in, certain kinds of workloads. So this is a possible world. It's also, in all fairness, possible that Postgres just eats the whole thing.
01:45:01
Speaker
I think it's almost impossible that MySQL eats the whole thing, but it might be possible for Postgres, again because of the size of its community. But I think there are a lot of server workloads that we're going to do very well.
01:45:13
Speaker
Yeah, yeah. At some point, it becomes less about the technical details of the database and more about the design space you're trying to go after: what kinds of use cases you want to do really, really well.
01:45:28
Speaker
Exactly. Yeah, exactly. And this is just the natural way things evolve, I think. Once more, the example I like, and it might not be an example that resonates with an audience younger than 35,
01:45:42
Speaker
is Linux and FreeBSD. When I started using Linux, there was Linux, FreeBSD, NetBSD, and obviously the proprietary Unixes. They all had their niches. And if it weren't for the fact that Linux had this tremendously big community, that would still be the situation today: they would all have the niches they specialize in.

Inspiration from Linux Community Growth

01:46:05
Speaker
And what happened with Linux is that you cannot compete with a community that has 10,000 active contributors. You just can't. Give it enough time and it will become better than anything else. Yeah. Yeah. OK.
01:46:18
Speaker
Clearly, that's your central thesis for the whole project. So let me ask you the last big question about community, then. How are you going to stop it from becoming somewhat toxic, the same way Linux became toxic? It's all about the role model, right? Linux is toxic because Linus Torvalds is toxic. I mean, there are a lot of people who are not, but what you clearly see is that success breeds imitation, right? People see Linus, they see the way he handles the community, they see that it's working very well, and naturally they want to emulate that. And I think the way you do this without becoming toxic is the same way: people will emulate Pekka. And Pekka, I think, is a person that more people should emulate in general, right? Yeah. So it's the role of the leader. I think Turso is under very good leadership, and that's how you avoid this. And again, with the people who get, quote unquote, promoted to maintainership, a lot of the role of the leader is the example, but it's also in picking the people, right? "Okay, so you're now going to be one of the people selected to be a maintainer of this thing.
01:47:37
Speaker
You have to be a good role model. You have to have certain characteristics. You have to act a certain way." So it's just the way leadership works. You're making me think of our very first guest, Louis Pilfold, who I'm sure has said to me in the past, "Show me the CEO and I'll show you the company."
01:47:56
Speaker
And he does a very good job of leading by example. And yeah, well, in that case, I'll have to wish you luck that your growing Rust community and your growing SQLite competitor
01:48:08
Speaker
not only become a good set of features, but a good community to be in. There's nothing we want more than that. And again, building our community is a very intentional endeavor. It is not an afterthought. It is true that a lot of open source projects these days, especially ones coming from companies,
01:48:29
Speaker
are just a growth hack for the company to grow faster, or to put people at ease that if the company goes under, they still have their data, yada, yada, yada. For us, it was never about that. For us, it's really about trying to recreate the good parts of what we saw in the Linux community.
01:48:50
Speaker
SQLite was the perfect match for that because it had no developer community. We tried to do it once. We succeeded in some things, but didn't succeed in the community building. The rewrite is just going fantastically. There's another reason, by the way, that I want to mention, and this one is interesting because it becomes less relevant over time.
01:49:13
Speaker
But one of the reasons a lot of people came to Turso in the beginning is that they said... I don't know if you can hear my baby or not. Yeah, I can hear part of your community in the background calling for tech support. Great timing.
01:49:27
Speaker
But we had a lot of people saying, "I wanted to learn databases and I wanted to contribute to a database, and Turso is just so early
01:49:39
Speaker
that I could essentially come and write the query planner of a working, real-life database." Now, I think this reason becomes less relevant over time, because Turso is getting more and more ready. But we still have people showing up almost every day. We still have people who come with small features, and some of those features grow over time. Some of them keep contributing; some of them come and go away. That's normal, too.
01:50:04
Speaker
It's a power law. Most things in the world end up being a power law. Yeah, yeah. And we're so excited to see the growth of the community. We ran an interview on our blog with community contributor number 128. A power of two. It's a beautiful number.
01:50:21
Speaker
I want to run another one with contributor number 256, and we think it might be sooner than we realize.
01:50:32
Speaker
I hope it keeps accelerating. Glauber, thank you very much for taking me through it. And I'll leave you to go for the more important tech support call. There you go. I will. Cheers. Bye for now.
01:50:43
Speaker
Thank you, Glauber. As I said at the start, Turso is now in beta, so do give it a try. And if you do give it a try, do send them some feedback. Links in the show notes as always.
01:50:54
Speaker
Before you go, if you've enjoyed this episode, please do take a moment to like it, rate it, share it with a friend, share it on social media, and make sure you're subscribed, because we'll be back soon with, actually, another database episode of a very different kind, I believe. Until then, I've been your host, Chris Jenkins. This has been Developer Voices with Glauber Costa.
01:51:15
Speaker
Thanks for listening.