
#73 - Paula Gearon

defn
In this epi(c)sode we had a blast chatting with Paula, graph database and Clojure expert, and creator of Asami: https://github.com/threatgrid/asami Check it out!
Transcript

Introduction and Episode Overview

00:00:16
Speaker
Yes. Episode number 73. So yeah, as we've been chatting about all the fancy stuff regarding triathlon and everything. Let's get started a little bit about Clojure and then let's get back to the triathlon thing again. What he really means is let's start with Clojure and get back to Emacs. Exactly. Well, I don't know enough about Clojure and I know even less about Emacs.
00:00:43
Speaker
That's good. That is good. That is good. Yeah. That's how we get started

Journey into Clojure

00:00:47
Speaker
then. Yeah. So I think the biggest question, not biggest, I think the first question I think I'm interested in is like, when did you start with Clojure? Because you've been writing software for ages now, right? Yes. I started in 2010.
00:01:06
Speaker
In Clojure. Okay. So almost nearing 21 years... sorry, 11 years. Yeah. The pandemic has changed my... I'm a data scientist, you know.
00:01:22
Speaker
I need to run it through my ML program to realize what... Never let any of your employers listen to this. I mean, nobody listens to this. That's okay. Oh, you'd be surprised. I mentioned at work that I was going to be on a podcast and people said, oh, which one? I said defn, and eyebrows went up, and, oh, that's really great. Cool. We bribed them
00:01:52
Speaker
to fake the enthusiasm. But yeah, so before coming to Clojure, what were you working on? Where did you start in the software thing? Oh, started. I was writing C when I first started. Yay.
00:02:10
Speaker
Clojure always begins with C. And then moves to L. Well, I was doing a little bit of assembler, but it was mostly C, and I was learning to write Windows programs at the time. And then around about that time, the Win32s
00:02:33
Speaker
framework came into Windows 3.11 for Workgroups. So I was learning the 32-bit system as well, but also learning about undocumented DOS, because there were still a lot of things you couldn't do in Windows without talking to DOS underneath, but DOS didn't actually do these things for you. Or it did, but they didn't document it. However, lots of people had figured this stuff out and written books on it, like Andrew Schulman's books. So I was doing a lot of that to start with.
00:03:02
Speaker
Can I just ask you a quick question, Paula, before we go there? It's like, I mean, 2010 seems late to be starting with C. I mean, you know, I started in 1943, you know, and that, you know, eventually got into C. 2010 was when I started with Clojure.
00:03:18
Speaker
Ah, right. Okay. Okay. Okay. Cool. Yeah. Pay attention, pay attention, that kombucha is getting to you already. Right. Okay. Cool. Right, now I'm with you. Yeah. I mean, Windows for Workgroups wasn't around in 2010. I know. I was thinking, what the fuck?
00:03:37
Speaker
some legacy systems. Why is she starting like this? Okay, sorry. When did I start? Okay, good. Yeah, I think it was probably me misunderstanding. No, no, it's fine. Definitely me mishearing. It's, it's right. Anyway, come on. Let's do it. Right. So, Windows for Workgroups. So, MS-DOS.
00:04:02
Speaker
But then did you continue doing Windows stuff for a long time? Or did you move into Java and then slowly other languages as well? Yeah, well, I mean, let's think.
00:04:17
Speaker
I went through C, mostly C for several years, but I moved off Windows and into Solaris and Digital Unix, and then back to Windows doing other languages like Delphi.
00:04:36
Speaker
I spent a bit of time on VBA. I spent some time with MFC, and a lot of C++, and C++ on QNX.
00:04:53
Speaker
Ooh, wow. The RTOS thing. Yeah, yeah. That was for the rail system in Melbourne. All the passenger information displays. I was the team lead on a project for updating all the displays on the Melbourne platforms.

From C to Java and Semantic Web

00:05:12
Speaker
And that was all done in C++ on QNX. But then around then, I moved into Linux and Java.
00:05:22
Speaker
And I was there for about, I was solidly on Java for about eight years. But in the meantime, I learned, some friends pointed me to SICP. And through that, I became enamored with Scheme, all that approach to programming. And so every language which had Lambda in it suddenly became more attractive to me.
00:05:47
Speaker
And I'd look at Ruby and think, yeah, but no. And I looked at Scala and Scala got me a lot further and I love the immutable data structures in it. And I was really getting much more into Scala when the nonprofit I was working for said that, you know, we'd had the global financial collapse and they were going to focus on their core technology, which is not what I was working on. Could I find something else?
00:06:15
Speaker
And I went to a company that was doing Clojure development. Okay. So your path to Clojure was from SICP and Lispiness, and then that led you to Clojure? Or is it something... Well, sort of. I mean, I was very enthusiastic about getting into Clojure, but I was a semantic web person. Right, right.
00:06:39
Speaker
And that's what brought me to the US in the first place. And that's what I was doing. And then the company I moved to in 2010 was doing semantic web development using Clojure. Isn't it something that Alex was working on as well?
00:06:58
Speaker
It was the same company that Alex was working at. Ah, okay, okay. Because I remember going to his talk in Belgium. This was way before Clojure was a bit popular. He was then not working with Cognitect, but for that company, I think. Let's talk about all the... Revelytix. Yeah, exactly. Revelytix, yes. Okay, nice.
00:07:16
Speaker
So I interviewed there, and they said, do you know Clojure? And I knew it was a Lisp and on the JVM, and I said, no, but I think I could learn it. And they said, yeah, buy a book. So I did, and by the time I started, I'd read the book. Nice. So for people like me who are not really that well versed with all this fancy tech stuff, what is the semantic web?
00:07:41
Speaker
It's a set of standards from the W3C around data linking and communication. From a simple perspective, the World Wide Web is a web of documents which link to one another, whereas the Semantic Web is a web of information which links to other pieces of information around the Semantic Web.
00:08:07
Speaker
And it's the standards for doing that. So one of the bases for those standards is the URI. And there's a whole lot of semantics built up around what these URIs are and how you should build them and things like that. But then the data model for connecting all of these things is a graph. And the standard for representing these graphs is RDF, the Resource Description Framework.
00:08:38
Speaker
And that allows you to describe properties and attributes of different objects in the system and how things connect to one another. Objects are all represented using URIs. Well, mostly.
00:08:58
Speaker
You describe these things in a graph. Your graphs, because you're using these standards, because you're using URIs, should naturally link in to graphs from other places. Where two groups have used URIs for the same semantic concept but have used different URIs for it, you can then create linkages between them.
00:09:18
Speaker
so that you can link up different documents, I mean, different semantic web documents, which say, well, this is that, and this is that, and this is related to this. And this is where linked data comes from.
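To make that concrete, here is a tiny, hypothetical set of linked-data triples written as Clojure data. The example.org subjects are made up; the FOAF predicate URIs are real ones commonly used for describing people.

```clojure
;; Each triple is [subject predicate object]. Subjects and predicates are
;; URIs; objects are URIs or literal values.
(def triples
  [["http://example.org/people/paula" "http://xmlns.com/foaf/0.1/name"  "Paula Gearon"]
   ["http://example.org/people/paula" "http://xmlns.com/foaf/0.1/knows" "http://example.org/people/alex"]
   ["http://example.org/people/alex"  "http://xmlns.com/foaf/0.1/name"  "Alex"]])

;; Because the same predicate URIs are used everywhere, a graph published
;; elsewhere that mentions the same subject URI links up naturally.
```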

Semantic Web Challenges and Critique

00:09:33
Speaker
The fact that there are all these standards is really useful, because it means that if I want to take data from an Oracle database and move it over to a MySQL database, often the way that I've seen that done is it gets dumped out as a set of CREATE TABLE and INSERT INTO statements, which is very crude.
00:10:02
Speaker
Whereas with semantic web standards, using RDF, you dump it out in a standard format, and all semantic web systems should be able to move that data around and understand that data. What about the O-word, Paula? The scary O-word: ontology?
00:10:20
Speaker
That is a little like your schema over the top, but it's more powerful. A schema just describes the structure, with some concept of semantics on, well, this means that, or it might be the addition of those things.
00:10:41
Speaker
Whereas the ontology is much more descriptive about, well, if you're one of these and you're one of these, you're naturally going to also be one of those, unless you're one of these. And you can describe that to a nearly infinite extent. I mean, how far do you want to go? It's like trying to write spec for Clojure.
00:11:01
Speaker
You can describe your function in really minimal terms. You say, well, I know these properties about the function. Or I'll just describe more. I've written specs which are bigger than the functions I was trying to describe.
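As a toy illustration of that trade-off, here is a hypothetical one-line helper whose reasonably complete spec already outgrows the function itself:

```clojure
(require '[clojure.spec.alpha :as s])

;; The function is one line...
(defn rename-key [m k new-k]
  (dissoc (assoc m new-k (get m k)) k))

;; ...while a spec that pins down its behaviour is already bigger:
(s/fdef rename-key
  :args (s/and (s/cat :m map? :k some? :new-k some?)
               #(contains? (:m %) (:k %)))
  :ret map?
  :fn #(= (get-in % [:ret (-> % :args :new-k)])
          (get-in % [:args :m (-> % :args :k)])))
```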
00:11:17
Speaker
And you can do that with your ontologies as well. There's always a trade-off. But is it the Web 3.0 thingy, right? So we have this Web 2.0 with all the mispronounced or dropping-the-vowel sort of companies. And then we have the Web 3.0, which is supposed to be everything changing into a more
00:11:40
Speaker
semantic thingy, because I remember... Well, data is the main thing, isn't it? It's meant to be data that's driving Web 3.0. That's the biggest thing. Because the problem is it's all documents at the moment. The Web was all documents, but now it's meant to be like linked data, like Paula was saying, you know? Yeah. So more services can, like, understand what a particular page means, not just what documents it kind of has, you know, what just the HTML layout is.
00:12:08
Speaker
Exactly. Because I remember when I was doing, I think I was still blogging at some point and then there was like, you could put more details into your blog headers or meta descriptions saying, I am this and then my profession is this or I'm linked to this person or something like that. I mean, to me, the thing that's horrible about the current web is that we have all these people. I mean, there's a thing called SEO.
00:12:30
Speaker
which I think should be just taken out and shot in the back of the head. And that's really what linked data and ontologies and sort of the semantic web is meant to replace, you know. Sorry, this is meant to replace SEO, because you don't need SEO anymore. If you're actually clear and explicit about what this, what this website is meant to represent, then you can just be straightforward about it. And you don't have to do all this bullshit. That's my understanding, Paula. You're more of an expert than I am.
00:12:59
Speaker
I'm just, I'm just a soapbox person. I've been peripheral to a lot of things. So I'm not going to call myself an expert, but I'm aware of a lot of stuff, as opposed to being an expert on it. The problem with the semantic web has been that, I mean,
00:13:19
Speaker
Some of the things it's tried to do where it's created standards so that you've got this interoperability, you've got the portability of data. And they do try to be very complete about what the semantics of each element is. And that's led to a lot of debate as people have got into the corner cases and things like that. This has led to a complexity that people have avoided. They say, well, this is too hard. I don't want to do it. Or this seems like overkill. And so while it's a great idea,
00:13:50
Speaker
it hasn't really panned out. We do see it being used, but not as much as the vision had been. Instead, we see it being re-implemented without all of the complexity sometimes in small local areas, but it's missing things which make it portable with other applications. The semantic web, it's a nice ideal, but it just hasn't been
00:14:18
Speaker
working out in the way that a lot of people had hoped it would.

Semantic Web vs Reality

00:14:21
Speaker
I went to a conference about the Semantic Web a few years ago, and there were some people I was working with who were experts. They'd worked at Google and worked in other places that were still doing Web 3.0 stuff. There was a conference all about it.
00:14:39
Speaker
There was a question at the end, which was really like... because one of the problems with the semantic web, to some extent, is that it's a bit of a top-down activity, in the sense that there are standards and there are ontologies and all these kinds of things. And what I mean by that is that you have to have people organizing above corporations and above websites so you can have these standards. That's what I mean by top-down in that sense: any kind of
00:15:08
Speaker
standard or whatever is a sort of top-down thing. And what people were saying was that all the benefits that you could get from the semantic web are already available via Google, or via Bing or whatever, probably not Bing, but you know what I mean, because you can still search for things and kind of, like, you know, get your intuition about what a website is meant to represent
00:15:29
Speaker
through SEO, et cetera, and through Google's AI index, et cetera. Well, they're kind of making it work. But the problem, like you said, is it's just local to Google then. No one else can fucking do it. It's all siloed. And if you create data, you can't link it into what Google's doing. And you can only get so much out of what Google has. They'll let you see some of it with this API or whatever. And even if they do, then 12 months later, they're going to say, we don't offer that anymore.
00:15:57
Speaker
Yeah, it goes into the Google graveyard. Yeah, so the technologies underlying all of this are being employed in so many different areas, but the standards are not.
00:16:12
Speaker
You know, so it's been a wonderful thing to be involved with because it gave me exposure to all of the technology and how these things work together. But the overall overarching dream of it all hasn't really been realized in the way that I had thought it might be.
00:16:31
Speaker
I mean, I hadn't, it wasn't my dream early on. I was just working in the area. Other people's idea and I was a developer doing it. But I, you know, I learned a lot in the process and this is stuff that I get to.
00:16:47
Speaker
you know, implement these days. I mean, when I use Datomic or Asami or anything like that, a lot of those techniques and technologies have all, you know, I can bring them forward in interesting ways to what I'm doing now.

Introduction to Asami Graph Database

00:17:03
Speaker
But re-implementing a lot of the, implementing a query engine for Asami
00:17:09
Speaker
I'm using a lot of the syntax from Datomic, but I haven't actually gone with the semantics of Datomic sometimes, because I don't like them as much. And I've gone for SPARQL semantics instead, which is actually very similar in a lot of ways. So there's a few things, like not, which are different. I think we need to, we need to introduce what, what Asami is, because, yeah.
00:17:35
Speaker
So Asami is a graph database, a little bit like Datomic, but like an RDF database, it is schema-less. Which is funny, because I heard you talking about schema on another podcast at one point, and like, everything has schema, or what does it mean to be schema-less? So before we get into Asami,
00:18:06
Speaker
so what do you think about the current state of graph databases? Because we have this Neo4j... because I've been playing a little bit with some obscure graph database called SAP HANA. That's something that I don't want to publicly admit.
00:18:25
Speaker
Yeah, as I said, nobody listens to this podcast, so I'm safe. Because it's all basically implemented, at the end of the day, as just two big SQL tables, and then edges and vertices, and that's pretty much it.
00:18:41
Speaker
And we had these multiple graph databases come and go, like we had this TitanDB thing, and then now JanusGraph, and Neo4j, and Dgraph most recently, which is in Go. So how do you see the graph database ecosystem, and what is the main use of it? Because they never seem to be, like, you know, the mainstream databases.
00:19:06
Speaker
No, they don't. Personally, I like using them. It depends on what you build over the top of it. They're very... the idea of graph databases is that they represent data at a very simplistic level. Datomic, you know, popularized this idea of the datom, which is: an entity has an attribute with this value.
00:19:38
Speaker
That's a very small unit of information. Graph databases are built on units of information like that. Now, some of them are a little bigger like Neo4j, I believe, which I haven't used very much. Neo4j, I think, has entities which have attributes on them and then connects entities to each other through these edges.
00:20:01
Speaker
other graph databases say, well, the attributes are basically edges to literal values. And so they don't necessarily distinguish between entity and connections from one entity to another. And that often comes down to an implementation detail. And it may be exposed in the query language or may not.
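A rough sketch of the two models being contrasted here, as plain Clojure data (all names are illustrative):

```clojure
;; Property-graph style (Neo4j-like): attributes live on the entity, and
;; edges are a separate, distinguished kind of thing.
(def node {:id 1 :labels #{:Person} :props {:name "Ada"}})
(def edge {:from 1 :to 2 :type :knows})

;; Triple style: attributes are just edges to literal values, so one
;; representation covers both properties and relationships.
(def triples
  [[:node/1 :name "Ada"]      ; an "attribute" is an edge to a literal
   [:node/1 :knows :node/2]]) ; a relationship is an edge to an entity
```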
00:20:25
Speaker
So, various graph databases are taking different approaches to the same sort of thing, but you really do have a very elemental view of how data is structured, and that will be less efficient for some applications than, say, relational databases.
00:20:43
Speaker
or a document database. But it also tends to be extremely efficient for certain sorts of things. Graph traversals or just the fact that you often have a lot more data indexed
00:20:59
Speaker
that you can get to rapidly that you wouldn't necessarily be able to with relational databases. Like, oh, I'm missing this index. The query takes two minutes. But if I had the index, the query would take half a second. Whereas graph databases tend to be a lot faster on those sorts of things. But then when you say, OK, I can find the data quickly, but let's build up the documents or the structures out of that, that could take longer.
00:21:27
Speaker
It's a different way of viewing your data. And each one of them takes a different approach. And then you've got the whole scalability issues of like, if it's in the cloud or if it's local, if it's in the cloud and you're replicating, if I find this information in this part of the cloud, but the remainder of the information is over there, what's the best way to join that in a scalable way? And that's not easy.
00:21:53
Speaker
And I think there's been a lot of issues around doing that, and not enough research into some of that. I would expect Amazon to be doing some of it. I applied for a job with Amazon, actually, just when I started at Revelytix. And I had been doing graph databases since 2000.
00:22:17
Speaker
And so, it was ten years later, and I'd been working on this graph database for like eight or nine of those years.
00:22:28
Speaker
I was being interviewed for the cloud team, and I got an offer, and then I got the Revelytix offer, and so I was balancing them. And they did say, look, we really want you to come in, because we'd love for you to implement a graph database in AWS. And that was really tempting.
00:22:48
Speaker
And that's what ended up coming out some years later. So I was kind of curious as to how that worked out and who they got to do it. And what the infrastructure for it is. I don't know a lot of the details of Neptune, so I would expect it to be dealing with a lot of the questions that I have around.
00:23:08
Speaker
how to scale things. When you've got replication and you find the data, how do you then connect it to other data? Because the whole point of a graph database is the connectivity in the data. And if you split it up too much, then there are performance implications for that. Because one of the most obvious use cases to some extent these days is the social networks or
00:23:32
Speaker
networks in general of different things, networking, IoT, these kind of things. Given the fact that Facebook is a social network, they've got some graph. Used to be. Well, yeah.
00:23:49
Speaker
The anti-social network now. Yeah, exactly. Yeah. Because that's often the use case, isn't it? A friend of a friend of a friend, and how do you traverse this graph? And traversing one way is like one thing, but actually going back up is also tricky when you can go back up left and right. Whereas with relational databases, the navigation of those nodes is much more difficult in both directions or many directions.
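As a sketch, here is what friend-of-a-friend traversal looks like in the Datomic/Asami query style, assuming a hypothetical :friend attribute. Each where clause is one hop, and reversing direction is just a matter of swapping variable positions:

```clojure
;; Friends of friends of ?me:
(def friends-of-friends
  '[:find ?foaf
    :in $ ?me
    :where [?me :friend ?f]
           [?f  :friend ?foaf]])

;; Going "back up" the graph: who counts ?me among their friends?
(def followers
  '[:find ?who
    :in $ ?me
    :where [?who :friend ?me]])
```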
00:24:19
Speaker
Graph databases are good for getting at data that way. But again, this is where I'm saying, if you've got linkages, part of the graph is stored here and part of the graph is stored there because of scalability, then how do you find those linkages
00:24:35
Speaker
in an efficient way. Because you're going to have to ship one part of the data to a place where another part of the data is. Whether they both go to a mutual place in the middle or whether one travels to the other to be linked, you have to do it in some way like that.

Scaling and Development of Asami

00:24:54
Speaker
I've seen too many technologies which take the simple approach of either bringing it all into one place. Well, actually, usually they just bring it all into one place because you can get it up and running.
00:25:12
Speaker
That leads me to the thing of, so, Asami, right? So how do you compare Asami with these other database... sorry, graph databases? So Asami is an easy graph database, right? I compare it very poorly.
00:25:32
Speaker
Well, it's fast. A lot of what we do is in memory, and so we haven't had the scalability issues there. It's very fast and effective, and it's been a really good way for doing things locally. Getting things up and running and working is easier and simpler in Asami than it is with other systems I've used. Also, because Asami is ClojureScript,
00:26:00
Speaker
there's only a few local graph databases which do that. I mean, there's DataScript, for instance, and Datalevin, I think, is a more recent one. So these will run in your browser for you, and we get a lot out of that.
00:26:16
Speaker
But Asami also works on a system where it's storing things in files. And I do hope to push this out into other things, like Redis backends and things like that. But to date, it's been almost entirely me. And I can only go so far as one woman. But it's... She needs help, people. You heard it here first.
00:26:44
Speaker
Well, apparently I'm getting some help soon. And I mean, that said, I've had a couple of, I've had some public PRs coming. I've had
00:26:59
Speaker
There's one person on my team at work who's helped with some query manipulations, which has been really useful and I'm hoping to get more of that. But apparently, there will be more people coming in to help with the project in the not too distant future, I hope. At the moment, I believe I'm the only one who's really across all of it.
00:27:26
Speaker
It's written in ClojureScript? Or is it something that you can use on JVM Clojure? It's... almost all of it is CLJC code. Okay. So it can be used in both a Clojure context as well as in the browser. So the only parts which are not in CLJC are in CLJ or CLJS.
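For anyone who hasn't used CLJC: a minimal sketch of a reader conditional, with illustrative names, showing how one namespace can serve both platforms:

```clojure
(ns example.time)

;; Shared code path, splitting only where the platforms differ:
(defn now-millis []
  #?(:clj  (System/currentTimeMillis)
     :cljs (.getTime (js/Date.))))
```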
00:27:56
Speaker
The local storage stuff is currently in CLJ. That's done with memory mapped files. Unfortunately, I don't have an async interface for a lot of this stuff and I'm going to have to do that to use local DB, the local storage. There are reasons why I haven't gone that way. There are performance implications on the JVM.
00:28:24
Speaker
But we do want that flexibility in the browser as well. But as things stand, we can have smallish graphs and just store them into local storage in the browser. But yeah, it takes a bit of time to load data up into Asami. Other systems are faster. In the JVM, you can just do that in the background because the reason for that is that
00:28:51
Speaker
it stores data in indexes. So if data has been stored, then it has been indexed. And that takes time. But once it's in there, you can access it really, really quickly. There are other systems which are much better at loading data up quickly, but they don't enter queries as quickly. So there's trade-offs there.
00:29:15
Speaker
Yeah, this is something that... because I've been investigating Dgraph for one of the projects that I'm working on as well. And there are a couple of surprising things that you see when you work with graph databases, when you're coming from, you know, relational databases. One of the things is that, you know, we wanted to have these real-time updates of the graph. So that's
00:29:35
Speaker
kind of tricky business in graph databases. And also bulk loading of the data, especially if you want to have, like, a self-referencing, highly connected graph, with all the UIDs. Then I can't load it in batches, because I load one batch and I get the IDs back, then I have to query again to map them, and then update my input again, which is crazy for us, because we have almost three million nodes and all that stuff with the project that I was working on. So it felt like
00:30:04
Speaker
a paradigm shift, or different trade-offs. It's very difficult for me, for my brain, to comprehend. I'm just, okay, INSERT INTO... WHERE, whatever. That's not going to work here, right? No, it is quite different, because when you're bringing in more data and you're referencing original data, then you want to know what it is you're referencing.
00:30:30
Speaker
That's the way that RDF, SPARQL systems work. So SPARQL is the query and update system for RDF data. So it's SPARQL, S-P-A-R-Q-L, something like that? What is it? It stands for the SPARQL Protocol and RDF Query Language. Is it a recursive acronym? Yeah. And interestingly, it includes protocol in that
00:31:00
Speaker
specification. So if you have an SQL database, you've got your own binary interface for talking to it. So talking to Oracle is different to talking to MySQL. And that's why we have ODBC and JDBC, to try to standardize that. Well, SPARQL defines how you talk to an RDF database.
00:31:24
Speaker
Okay. And this is part of the whole thing that I was talking about with having standards. But yeah, if you've, you know, when you're using RDF systems, everything that is to be referenced from elsewhere is going to be identified by URI. So you do have that ability to just say, well, I'm putting this in, and then later on, when you want to come back to talk to it, you can just use that URI. Yeah.
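A minimal sketch of what the protocol part of SPARQL means in practice: the query is plain text sent over HTTP to any conforming endpoint. This assumes the clj-http client, and the endpoint URL is hypothetical:

```clojure
(require '[clj-http.client :as http])

;; A SPARQL query is just text...
(def query
  "PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?name WHERE { ?person foaf:name ?name } LIMIT 10")

;; ...and the protocol standardizes how it travels: an HTTP request with
;; a `query` parameter, the same for every conforming store.
(http/get "https://example.org/sparql"
          {:query-params {"query" query}
           :accept "application/sparql-results+json"})
```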
00:31:52
Speaker
Whereas if it's in Datomic and you're saying, well, an entity, and I've given it an ID of negative one,
00:31:58
Speaker
then you come back and say at the end of the transaction, well, what was that object, please? Your :db/ident gives you a way around some of that. There are speed issues with doing some of this stuff. Asami, I think, is reasonably good at doing small updates. But if you want to do a bulk load, it takes time, because it's indexing. If you broke that up, it would actually take a little more time, and you end up
00:32:27
Speaker
taking more disk space because it's using persistent data structures internally. So if you do a bulk load, that's fine. But if you then break that bulk load up into two or three,
00:32:45
Speaker
Then the first one, you know, expands the tree structures on this, but then the next one will end up copying a lot of that tree structure because it was persistent. And, you know, the third block will copy a lot of it again, and you really didn't want to be doing that. But if you do it in one go, then it'll take you a minute to upload, you know, a gigabyte or more. I think at the moment, there's a lot of optimizations to come in. But when I released Asami
00:33:14
Speaker
2.0, which has storage, I think I'm only getting, um, I think it's 1,200 statements per second being indexed, which sounds like a lot, but you know, if you insert a gigabyte document, that can be millions of statements. So 1,200 per second isn't a lot. Yeah. But if you, if you look on the server side, I assume...
00:33:46
Speaker
That's on my notebook. Okay. In Firefox. So if you look under the hood, because in Asami at least, is it everything graph all the way down to the disk?
00:34:05
Speaker
I know that Dgraph and other systems, the underlying data structure, like, behind the scenes, it's either a key-value store that they wrote to make it super fast, and so there's like, a graph is basically an abstraction on top of SQL or on top of a key-value thing. Awesome. How does, how does, because that, that seems to be easier to build, I guess, I don't know what the reason for it, but how is Asami, even when you compare it to this kind of...
00:34:31
Speaker
Is it turtles all the way down? Graphs all the way down? No. Graphs are usually expressed using, well, I'll say tuples, but generally we look at them as triples. In Datomic, it was always entity-attribute-value. In RDF, it's always subject-predicate-object. The same things, just different labels for the same thing. These tuples are what are being stored.
00:35:01
Speaker
And we index them by just storing them in order. So you store them in, if I'm saying subject predicate object, you store it in order of subject. And then for the same subject, you'll store it in order of predicate. And then if you've got the same subject predicate, you store it in order of object.
00:35:19
Speaker
So that's the first index. And the second index will be stored ordered first by predicate, and then by object, and then by subject. And then the third one will be stored in order of object, and then subject, and then predicate. There are only six possible orderings, and you only need three of them. That's if you're storing triples. Now, if you're storing quads or more, then
00:35:46
Speaker
If it's quads, there's 24 orderings. But to have complete coverage, you need six of those 24. And which six do you choose? And do you actually need that complete coverage? And usually not. And usually you only need four.
00:36:11
Speaker
out of the six. And what if you're putting in five? Do you need to index according to everything there? And sometimes there's trade-offs where you can say, well, I can find the statement. If I just index on the fifth element, I can find statements by that fifth element. And I've got this. And now I'm here. I can use the other indexes to then get to the next part. And that will be a little slower than a direct lookup. But it's actually all I need for this infrequent sort of query.
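A toy sketch of that three-index scheme, using sorted sets to stand in for the on-disk trees (Asami's real storage is considerably more involved):

```clojure
(def triples [[:s1 :p1 :o1] [:s1 :p2 :o2] [:s2 :p1 :o1]])

;; The same statements kept in three sort orders, so any pattern of
;; bound/unbound positions has an index with a usable prefix.
(def spo (into (sorted-set) triples))                              ; subject first
(def pos (into (sorted-set) (map (fn [[s p o]] [p o s])) triples)) ; predicate first
(def osp (into (sorted-set) (map (fn [[s p o]] [o s p])) triples)) ; object first

;; Pattern [:s2 ?p ?o]: seek spo to the prefix [:s2 nil nil]
;; (nil sorts before everything) and scan while the subject matches.
(take-while #(= :s2 (first %)) (subseq spo >= [:s2 nil nil]))
;; => ([:s2 :p1 :o1])
```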
00:36:39
Speaker
So there are trade-offs of which indexes you want to use and things like that. And then, in Asami, this index is right now, like, a custom format? Yeah. Because you were talking about Redis being one of the back-ends. So, I've layered it.
00:36:59
Speaker
The first layer I built was just blocks, where I say, give me a block, and it has an ID associated with it. And it says, OK, here's your block, and its ID is 1. And here's your block, its ID is 2. Or the ID could be a URI or something like that. And I'm allowed to read and write that block. And then when I'm done, I say, OK, this block is committed. And it'll never be written to again.
00:37:28
Speaker
But I can always say at any point, give me the block with that ID. I can do that with S3 storage, where the block is just a buffer, and when it's written, I can send it off and its ID is the URL to get to it. Or I can do it as a file offset if it's a memory-mapped file, or I can do it as just a key for Redis or anything like that. Over the blocks, I've then built a few different indexes. One is a tree index, and
00:37:59
Speaker
others are flat storage and things like this. And then on top of those indexes, I've then got statement storage and I've got a mapping of IDs to
00:38:15
Speaker
things which can be serialized and deserialized. I can store strings and URIs and keywords and blobs and things like that. I reference them by the number that they come back as. Then once I've got numbers like that, I can store statements as just tuples of numbers. It's all layered like this, all the way up to the top, and then put a query engine on it and you're done.
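A minimal sketch of that bottom layer as a Clojure protocol; the names are illustrative, not Asami's actual internals:

```clojure
(defprotocol BlockStore
  (get-block  [store id]    "Return the block with the given ID.")
  (put-block! [store block] "Write a block, returning its new ID.")
  (commit!    [store id]    "Mark a block as immutable from here on."))

;; An in-memory implementation. A file-backed store would use offsets as
;; IDs, an S3 store URLs, a Redis store keys; all satisfy the same protocol.
(defrecord MemStore [blocks next-id]
  BlockStore
  (get-block [_ id] (get @blocks id))
  (put-block! [_ block]
    (let [id (swap! next-id inc)]
      (swap! blocks assoc id block)
      id))
  (commit! [_ id] id)) ; nothing to do for memory

(def store (->MemStore (atom {}) (atom 0)))
```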
00:38:45
Speaker
I think I understood everything, so now I can get started with my own graph database in Emacs. That's funny. I hadn't thought about it, really. Just two days ago, I was speaking with a young woman locally. I mean, Women Who Code. Yeah.
00:39:05
Speaker
We haven't been doing a lot in the last year with the pandemic. And so there've been various things where we connect up just for talking online. And this one young woman was asking me, how do you even start writing a database? What was facing you? Yeah, exactly. I think that is something that always fascinated me. So what is the answer?
00:39:30
Speaker
Well, I started with that first thing, where I wanted to have blocks which I could reference by an ID. And then from there, I built up. And I had that architecture in mind, because the previous graph database that I'd built, that I was a co-designer and a co-implementer on,
00:40:00
Speaker
was Mulgara, and that's entirely in Java, but that was built the same way. Now, there are different architectures as you go up that stack, because over the years, I've learned, I can do this, or I could do that, or I shouldn't do this. And Rich, he had this wonderful insight at one point, and I was like, oh my goodness, I should have been doing this all along.
00:40:30
Speaker
We were using a persistent data structure all along in Mulgara.
00:40:41
Speaker
And we had all of this bookkeeping to figure out when each node in the trees was being used, so that when we'd done commits and we were no longer reading from old transaction points, then they could be cleaned up, and any nodes being used in the data structures could be recycled and brought in for newer data that's being inserted.
00:41:12
Speaker
We needed to do that because when we first started building it, which was 2000, we didn't have very large hard disks.
00:41:18
Speaker
And as time went by, disks got cheaper and cheaper. And so I started saying, well, all of this bookkeeping we're doing is slowing the system down. Why don't we just stop doing that and abandon that information on the disk? And if you need to clean it up, you can just port it forward into a new database and delete the old files. But we don't need that stuff anymore. We don't need to clean it up, because disk is cheap. That's before we went to solid state drives.
00:41:47
Speaker
Disk is cheap, so let's not worry about that cleanup. We get a lot more performance out of this. And so I started moving in that direction. And then Rich released Datomic, where he said, and we can go back to previous points in time. And I was like, I've got all those previous points in time. I just never gave anyone the access to get to them.
00:42:09
Speaker
And I thought, I really need to do that for Mulgara, but Mulgara was written in Java, and by that point I was writing Clojure and I didn't want to write Java anymore. And I, um, and I thought, you know, maybe I could do this in Clojure one day.

Asami's Origins and Evolution

00:42:25
Speaker
And then it sort of somehow in the end, I ended up with Asami and decided last year, let's just do it. So I did. And now Asami does it. So.
00:42:37
Speaker
So, to write a graph database, all you need is just, like, 20 years of experience, understanding all the standards, and writing it in Java once, and then writing it in Clojure. It seems pretty simple, I think. Well, I started when I had 15 years of experience. I wasn't 20. The requirements are much lower. Fair point. So let's talk about the query side of it, right? Because now, querying the graph database these days, you have
00:43:07
Speaker
OpenCypher, you have GraphScript, you have GraphQL, you have SPARQL, you have Datomic's way of querying. Yeah, TinkerPop, yeah, Gremlin stuff and everything. So why did you pick SPARQL, and how do you see the other query languages?
00:43:25
Speaker
Well, Asami doesn't do SPARQL right now, though it should. Back to 2000: so originally it wasn't called Mulgara, it was called the Tucana Knowledge Store. We sold it commercially.
00:43:42
Speaker
And then in 2004, I think it was, we open sourced it and gave it the name of an Australian marsupial because we were all Australian. It was called Kowari. And then about a year later, the VC investor who came along for some strange reason decided to close us all down. And there was a lot of money left for people who wanted to buy this system.
00:44:11
Speaker
Yeah. And so a number of them approached me and said, hey, would you continue working on the open source side of it? We'll pay you that money. So I did that for a bit. Then the company who bought Tucana
00:44:27
Speaker
had issues with the way we were doing the open source stuff in Kowari. So we said, look, you take hold of Kowari, I don't care anymore, I'm going to fork it and we'll do Mulgara. Yeah. And so I spent the next five years on Mulgara, and even a couple of years ago I was getting contracts to keep it updated, because there are still systems using it. Um, so it has that long history. It was one of the very first
00:44:55
Speaker
graph databases in the semantic web space anyway. And we did it because there were none. We wanted one and we couldn't find anything useful. So, we built that. And because there weren't any, there weren't any standards for talking to it. And we invented our own query language for talking to that. And I was
00:45:22
Speaker
stuck in a room with somebody working out how to do this, and then they recorded it all, and I've got my name on a patent somewhere. But this was all... that was called the Tucana query language, which is TQL.
00:45:38
Speaker
And around the same time, Hewlett-Packard Labs was doing something similar. They came up with Jena, which has now moved into Apache. They came up with their query language, which was the RDF query language, RDQL. And then there were other systems which were trying to do their own thing. So that's when the semantic web people, the W3C, came together and said, we want to form a committee for a standard on this.
00:46:06
Speaker
One member of my team was on that committee, and they were staying up till 2 a.m. because we're in Australia. 2 a.m. on the committee calls, and they'd show up to work the next day with bleary eyes. So we'd talk about this and this, and I'd say, no, you shouldn't be doing that, you've got to do this. And I was blogging about it, and I was getting all these responses back on my blog, which is, you know, an interesting interaction with the whole process.
00:46:32
Speaker
So I ended up on the second committee for SPARQL. So, SPARQL 1.1.
00:46:40
Speaker
And I was involved in a lot of that. And so it was influenced both by RDQL and TQL, and a lot of experience from different people who had implemented these things over the years. And so I had done a lot of that work and had a good idea of what the semantics meant, and why they were being done one way or another.
00:47:03
Speaker
But having to parse text in a query language is annoying. I mean, I know we've got Instaparse, but it's still a frustrating process to go from semantic represent... sorry, from syntactic representation through to this as well. And Datomic, just by doing everything in basically EDN,
00:47:33
Speaker
It's very similar syntax. It's the same shape. It does the same things. It just made sense to me to use that because I didn't need to write a parser at all.
00:47:46
Speaker
I take the where clause and I actually execute the where clause as is. I don't need to do any transformation on it. I just execute it. There are a few queries, which, if you're doing aggregates, I'll do transformations on some of those queries. I really like that EDN approach. So if I do a SPARQL engine, I'll end up parsing it and turning it into EDN and executing that. So mostly it's the Datalog
00:48:16
Speaker
kind of query language here for Asami as well. Yes, yeah. But I mean, I always wince whenever I hear people call it Datalog, even though everybody in the community calls it Datalog. Datalog is...
00:48:36
Speaker
Datalog has been around since the 70s, I think. It's a subset of Prolog. It's a decidable subset, which means it doesn't have any nesting, it doesn't have any negation, and it maps directly into database querying. You don't end up with the searching which Prolog can get caught up in.
00:49:04
Speaker
Everything Datalog can do can be turned into queries on a database. And the syntax of it looks like Prolog. It's a subset of Prolog.
00:49:16
Speaker
Now, what Datomic is doing is a pattern-based thing, which looks like SPARQL, frankly. And it has the same semantics as Datalog. So, semantically, it's a Datalog. But syntactically, it's not. And whenever people are talking about Datomic's syntax, they say it's a Datalog syntax. It's like...
00:49:37
Speaker
So Datomic is Datalog. Semantically, it really is. Syntactically, it's not Datalog, but that's what everybody calls it now. And so I kind of have to accept that that's the way the world changed. But anyone outside of the Clojure ecosystem would not understand, when you said Datalog, that that's what you're talking about, this Datomic-style query, because it's...
00:50:04
Speaker
Yeah. But this is the nice thing about this podcast. It's a mass editing opportunity. Everyone can just like correct themselves. Everyone's going to listen to me and change this. That's how it's going to work.
00:50:17
Speaker
So if people listen to me carefully, they'll notice that I don't ever call it Datalog syntax. If I'm referring specifically to the syntax, I'll say Datomic- or DataScript-like syntax. I never call it Datalog syntax, because, to me, Datalog syntax looks like prefix predicates. Yeah. OK.
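To make that distinction concrete: classic Datalog in prefix-predicate form, next to the same idea in the Datomic/DataScript pattern style (here using Datomic-style rules):

```clojure
;; Classic Datalog: prefix predicates, Prolog-style (not Clojure):
;;
;;   ancestor(X, Y) :- parent(X, Y).
;;   ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
;;
;; The same semantics in the pattern-based EDN style:
(def query
  '[:find ?descendant
    :in $ % ?ancestor
    :where (ancestor ?ancestor ?descendant)])

(def rules
  '[[(ancestor ?x ?y) [?x :parent ?y]]
    [(ancestor ?x ?y) [?x :parent ?z] (ancestor ?z ?y)]])
```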
00:50:45
Speaker
Why? OK, maybe why Clojure is probably, I think, kind of a given, because from 2010 you've been working in Clojure. But do you think Clojure gave you any advantage in implementing this, because you have years and years, or decades, of experience in C and C++? Absolutely. I mean, yes. You don't need to say that because it's a Clojure podcast, you know. OK. So over four years, we, a team of...
00:51:14
Speaker
I don't know, there were like a dozen people who built Mulgara. In five years, thereabouts, four to five years. And I know that the founder of the company, the two founders of the company, they invested their
00:51:42
Speaker
mortgage, everything they had into it, and they then took on money from a VC to make all of this happen. Asami really came about as a part-time project for me, and it's just been me, mostly. There have been a couple of features: Joel Holdbrooks, for example, has implemented a few things in querying. I've had one or two features come in through PRs, and
00:52:10
Speaker
but all of the storage, all of the query engine, this has been entirely me. I have other jobs, other things that I've had to get done. I managed all of this by myself because I was using Clojure. I was a little horrified at how large the storage system had become,
00:52:35
Speaker
because I mentioned that whole layering approach. Well, there's a lot of code in there, Clojure code, and the namespaces look really large to me. I said, for the whole storage system, how big is this thing? And did a line count on it. I skipped the blank lines, but I left the comments in, and I tend to be very verbose with comments. Maybe a quarter to a third of my source code will be comments.
00:53:04
Speaker
But I think I totaled around 7,000 lines. Oh wow.
00:53:09
Speaker
Whereas, you know, Mulgara, I think, is more like a hundred thousand lines. Yeah. So an order of magnitude smaller. Yeah. Yes. Yeah. Yeah. And it was... I can look at a function and see what it does. You know, sometimes it takes a bit of effort to understand what this bit of code does, but this is why I write a lot of comments, because I know I'm the person looking at it in six months and wondering, what the fuck is this?
00:53:38
Speaker
Exactly. And I often have. And so the more experience I get, the more I write comments. You know, I have some code where I've got more comments than code.
00:53:51
Speaker
I have no qualms about writing very tricky code, so long as I have extensive comments around it to say what I'm doing and why. But yeah, I can look at a function. Typically, I can fit the entire function into a page and I can see it all at once and my eyes can go up and down and around it and I'll understand what I'm looking at.
00:54:13
Speaker
And if there is some sort of complex chunk of something, well, that becomes its own function. And I'll describe it. The function name is what I'm doing now.
00:54:25
Speaker
I'll be going through a thing and I'll say, well, and now I do the thing. Well, I've got the do-the-thing function happening in the middle, and I know that I do that thing. I'm doing that thing in a loop, or I'm doing that thing mapped across something. Then when I want to say, well, what are the details of doing that thing? Well, I can go into it. And that's the whole drill-down approach that I learned from SICP.
00:54:50
Speaker
So maybe it's the obvious comparison then, or the interesting question, maybe I'm flattering myself here. In terms of Java, you're doing everything based around class designs and objects and stuff like this, and messages, in theory at least.
00:55:07
Speaker
And obviously, with Clojure, you're doing more functional programming, again, in theory, at least. So how do those things play out in terms of your implementation? In terms of the comparison, was a lot of that the cognitive overhead of objects helping? Or was it when you stripped it off, was it actually a lot of complexity fell away? A lot of complexity fell away. Like I said, I had the same general
00:55:37
Speaker
data architecture that Mulgara has, but I've implemented it differently. Where a lot of state was happening in Java objects,
00:55:49
Speaker
now they're happening in closures. And, you know, everything's functional. So I'm trying not to update things. There are a few occasions where performance has meant that I've really had to go back in and change something to stateful. I had, you know,
00:56:08
Speaker
I have a module which takes entities which have maps of nested complexity. Actually, these things come straight out of JSON. So they can be maps of values, or you can have sub-maps, and you can have arrays, all the way down. And I've got to turn that into triples.
00:56:31
Speaker
I have to process that whole thing, turn it into a series of triples which then get inserted. I did all of that with pure functions. Then I was being asked, can you please make this faster?
00:56:50
Speaker
And so I went in and I created a lexically-scoped... sorry, not lexically, I apologize. I went in and I created some dynamically-scoped vars. And it would all be occurring within a thread, so I'd use volatiles.
00:57:12
Speaker
The same functional pattern was being executed, but rather than returning structures which then get concatted together, instead, as you go through the functions, it was conj-ing into the vector that was being held in the volatile,
00:57:34
Speaker
and then I'd return that big vector at the end of it and I got a 20-fold speed improvement. So that was really disappointing to go from really pure code because when things go wrong or when I want to change something, everything's isolated beautifully. But when I changed it to this, it, you know, necessarily I'm going to have a
00:58:04
Speaker
it's going to tie it into the implementation a little bit more. The data that's going in now matters, its ordering, et cetera. And that was mostly in testing, but I was concerned about the way that this would change
00:58:21
Speaker
the isolation levels. So far, it's been good. And in fact, we had a problem just the other day, where there was a remnant of something where I processed an array recursively.
00:58:40
Speaker
Then it turns out we were getting JSON where we had several thousand items in an array, like 10 levels down in the map structures. I didn't know that because it was a very large JSON file. You spit it out to a console and it takes five minutes to scroll past, so you can't.
00:59:10
Speaker
and putting it into Asami threw an out-of-stack error. And it turned out it was because I was processing arrays as a recursive operation. And so I changed that. And the recent work that I'd done on this with one of my colleagues had
00:59:33
Speaker
shifted that into updating a volatile. And so I got rid of that recursion part, because where I was doing the recursion, I'd get back a big list and then concat it into the array. Well, now the code directly puts things in, and I don't recurse down anymore, I loop over it. And some of that work has paid off. And it's pulled me further away from functional programming, which I'm not
01:00:01
Speaker
happy about. But at the same time, there are practical concerns with performance. And that's going to continue because this sort of thing should probably be done in Java or Rust. Something where you're a bit lower down. But if I had set out building it like that, then I could not have done it as an individual.
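A toy version of that trade-off, flattening nested JSON-ish entities into triples; this is a sketch, not Asami's actual entity code:

```clojure
;; Pure version: every call returns a vector, and the results get
;; poured together. Beautifully isolated, but allocates heavily.
(defn entity->triples [id m]
  (reduce-kv (fn [acc k v]
               (if (map? v)
                 (let [sub-id (gensym "node")]
                   (into (conj acc [id k sub-id])
                         (entity->triples sub-id v)))
                 (conj acc [id k v])))
             []
             m))

;; Volatile version: the same traversal, but conj-ing into one vector
;; held in a volatile (safe here because it stays on a single thread).
(defn entity->triples! [id m]
  (let [acc (volatile! [])]
    (letfn [(walk [id m]
              (doseq [[k v] m]
                (if (map? v)
                  (let [sub-id (gensym "node")]
                    (vswap! acc conj [id k sub-id])
                    (walk sub-id v))
                  (vswap! acc conj [id k v]))))]
      (walk id m)
      @acc)))
```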
01:00:30
Speaker
So you make it right, then you make it fast, you know? Yeah, and gradually moving that way. But I mean, doing it in Clojure,
01:00:39
Speaker
I'd spent a year and I had this thing up and running. Asami was originally a side project of Naga. I built the Naga rules engine. The idea was it was abstracted away from any graph database. My plan was to talk to SPARQL, Datomic, OrientDB, a whole lot of them. My manager at the time was thrilled with it and said, do this on company time.
01:01:05
Speaker
And he said, but I don't want you talking to a commercial system like Datomic. Can you build your own?
01:01:14
Speaker
I'm like, well, I don't actually need all the features of a graph database. I only need, like, join operations and basic querying. So I could do that, sure. And I did. And it fits into two namespaces. It's really quite small. It does have a query planner, but the queries at that point were just join operations. So the planner was just which order to do the joins in.
01:01:38
Speaker
And so that code is small and tractable, and you should be able to read through that without too much trouble. And I did it in...
01:01:49
Speaker
couple of days, I think. The problem was that he then got, he was very happy with it and asked for more features and more features. And it kept growing. And the next thing I knew I was rebuilding a graph database. And then he turned around and said, I want to shift all of this off the computers, the backend stuff on AWS and put it into the browser. Can you make it portable?
01:02:17
Speaker
Wow. I renamed everything to CLJC and it all worked.
01:02:22
Speaker
Not really. I renamed it all to CLJC, and then I learned what the differences were, and I discovered a couple of ClojureScript bugs, which I reported and were fixed, and that was great. But it mostly just worked. But it has directed me. Sometimes I wanted to do something and went, no, I've got to make sure that I get it working on both systems together.
01:02:52
Speaker
But that's how it evolved. And eventually it got so big, I thought, this has nothing to do with the rule engine. It should be its own project. And so I pulled it out, and we had this Avatar naming scheme. And so I grabbed, I used the character Asami as the name for the project. And that's where the database came from,
01:03:18
Speaker
and the whole storage on disk. That was my idea. I was bored last year during the pandemic, so I started it. I mentioned to my new manager this time that I was doing it, and he said, that sounds great. You should do that for work. I got to spend, I think, overall, I spent maybe three months, maybe less, and I built the whole thing from scratch.
01:03:45
Speaker
Um, but it was a, it was a pet project, and it could be done better. And I want to refactor some of it. Um, uh, one part of it is in AVL trees, which I'd like to keep using, but, uh, another part is also using AVL trees, and I want to shift that to B-trees. But yeah. And, um, so.
01:04:05
Speaker
One of the things that you said, that is, the performance thing: that is still in Clojure, but you have to make some trade-offs in terms of the design of the application, whether you're using the functional, pure, idealistic, platonic functional way or some other thing. Are there any other frustrations or issues that you faced with Clojure? Like, okay, this would have been easier in Delphi, for instance, something like that. Oh, goodness, I don't want that Delphi.
01:04:40
Speaker
doing some of the low-level memory mapped operations, that stuff is frustrating in the JVM in general. Especially when you say, right, I've got this mapping, but now I need to extend it because the file has to be bigger. That old one has to go away now so that I can free up my address space for the process. And you have to trust that the VM is going to do that for you, and it doesn't always.
01:04:49
Speaker
What are you talking about?
01:05:09
Speaker
And there was some magic voodoo that we had in Mulgara that prompted the JVM to do that. And I've got a little bit of that still, but that shouldn't be happening. So do you use... I mean, I've done something similar in the past with Netty. So you can use these Netty abstractions over these NIO things. Or is that not something that you need to worry about?
01:05:39
Speaker
No, I haven't got any kind of abstractions like that. I'm working directly with it. But if you look in the code, it's very abstracted, one layer over the other, and it's heavily tested at each layer before moving on to the layer above it.
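For reference, a minimal sketch of the machinery being described: memory-mapping a file from Clojure through java.nio (the path is illustrative). The pain point is unmapping, since the mapping only disappears when the buffer is garbage collected, and the JVM makes no promise about when that happens:

```clojure
(import '(java.io RandomAccessFile)
        '(java.nio.channels FileChannel$MapMode))

(defn map-file
  "Memory-map the first `size` bytes of the file at `path`."
  [path size]
  (-> (RandomAccessFile. path "rw")
      .getChannel
      (.map FileChannel$MapMode/READ_WRITE 0 size)))

(def buf (map-file "/tmp/mmap-demo.bin" 4096))
(.putLong buf 0 42) ; write through the mapping
(.getLong buf 0)    ; => 42
;; Growing the file means making a new, larger mapping and hoping the old
;; buffer gets collected so its address space is freed.
```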
01:05:56
Speaker
External dependencies of Asami are minimal. It depends on Clojure and ClojureScript. It depends on core.cache. And not all of the caching is available in ClojureScript.
01:06:21
Speaker
Yeah. But, uh, so I've actually ported some of it into ClojureScript. Um, and that then used a map that... I can't think of the name of the person who has done it. I can look up the code. But, um, so there's a couple of things which I've pulled in like that, but there's, there's very little
01:06:49
Speaker
external dependency at all. In fact, now that I've said this, I'm curious to see what some of them are.
01:06:57
Speaker
I'm looking at it. So you have core.cache, you have Zuko. So, Zuko I wrote. It's actually because Naga was doing some things, and I wanted Naga separate from Asami, and yet I wanted also to be able to talk to other databases. I wanted to be able to run Naga without any dependency on Asami.
01:07:20
Speaker
And so the things which are going to be used by both projects were put into a utility library, and that's Zuko, which is another Avatar character. And Naga, is it comparable to something like Clara, or is it basically a rule engine? It's a rule engine, yeah. If you go to the 2016 Clojure/conj,
01:07:47
Speaker
I did a talk at it where I described Naga and the whole architecture and what it does and everything. Oh, I'm dependent on Prismatic Schema. Yeah. So you're still using the old world. I like
01:08:08
Speaker
the ability to describe my schema elsewhere. I really struggled pulling my schema into a separate namespace when I was doing it in spec. I also found that the spec schema was dragging me into doing a lot more development on that side.
01:08:26
Speaker
Whereas the schema that I needed, the level of description I needed, was being handled quite well by Prismatic, or Plumatic, Schema. And so I've stayed with that. Yeah, so I'm dependent on Prismatic/Plumatic Schema, core.cache. And you'll see, yeah, data.priority-map.
01:08:53
Speaker
And then the ClojureScript priority map. That is for the cache that I was referring to. So the core.cache stuff that doesn't work in ClojureScript, and I put it into ClojureScript, that's built on a priority map. So I just put that in. And that's it. That's all of the external dependencies.
01:09:23
Speaker
So Zuko and Q-test are both things that I wrote as well. So yeah, that's everything. But it's fascinating how much you can do with such a minimal amount of code. So in terms of the feature part, would you say that it has a similar level of features compared to, what is it, Mulgara? There are a few things missing. Mulgara had a lot of
01:09:52
Speaker
plugins for talking to external data sources. So you could talk to JDBC, or you could talk to different file formats and things like this, and represent them all as triples in RDF. And it would translate to talk to those things. I don't have that. Of course, the API for talking to Mulgara was SPARQL, and I don't do SPARQL.
01:10:22
Speaker
I think that would be a relatively thin implementation. So I created a ticket for it anyway. See if I ever get to it. But in terms of the engine and what, you know, in storage and query capability, then yeah, I think it's pretty close. Wow. I think that's amazing. Yeah.
01:10:44
Speaker
One of the adapters that was really nice in Mulgara was that I could talk to remote data sources, and I could do that in the middle of a query. So I could have a subquery where I said FROM and gave the URL of a remote store. Yeah, and it would issue the query against that and return the data and join it locally. And that was
01:11:12
Speaker
not the most efficient way to join data across a network, but it worked very effectively. I don't have that per se, but the results of the query can be used as one of the sources. Because if you're using Datomic, when you do a query, you
01:11:38
Speaker
the first argument is the query itself. And then the remaining arguments are the data sources that you're querying against. Well, you can put the results of another query as a data source. And so we can do that here in Asami. And that
01:11:59
Speaker
that actually gets executed extremely efficiently. So it's a nice way of doing sub-querying. And I want to bring that in syntactically, so you can just issue one query and it'll do it internally in one step. But yeah, I can do a remote query and have those results come in and use that as a basis of further querying against local data.
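As a rough sketch of that pattern, in the Datomic-flavoured syntax being described (the connection URL and data are invented, and Asami's support for an :in relation binding is assumed here rather than confirmed):

```clojure
;; Sketch: feed the results of one query into another as a data source.
(require '[asami.core :as d])

(def conn (d/connect "asami:mem://subq"))
@(d/transact conn {:tx-data [{:db/ident "ann" :name "Ann" :friend "bob"}
                             {:db/ident "bob" :name "Bob"}]})

;; First query produces a relation of [name friend-ident] pairs.
(def inner
  (d/q '[:find ?name ?friend
         :where [?e :name ?name]
                [?e :friend ?friend]]
       (d/db conn)))

;; The relation is passed as an extra source and joined locally.
(d/q '[:find ?name ?friend-name
       :in $ [[?name ?friend]]
       :where [?f :db/ident ?friend]
              [?f :name ?friend-name]]
     (d/db conn) inner)
;; => (["Ann" "Bob"])
```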
01:12:28
Speaker
But it's not as nicely set up as Mulgara could do it. One question, maybe more concrete than abstract, actually: we've been talking a lot about the implementation and the history, but for people who might be motivated to use Asami after this show and get very excited, what kind of use cases are the sweet spot for Asami?
01:12:59
Speaker
That's a really good question. I'm caught up, of course, looking at the trees, and it's always awkward to take that step back and look at the forest. Our particular use case is that we have a lot of data that comes to us as JSON from different sources, and we want to bring that in and find linkages between it, query to find where things are linked
01:13:28
Speaker
and then infer new data from that that can also go into the data structure. And it's very effective at that. I'm personally finding that it's really useful for exploring JSON. So one of the things that RDF does is it has very strict ideas about what can go in the subject, predicate, and object positions.
01:13:54
Speaker
The subject position can be blank nodes or URIs, and that's it. The predicate position can be URIs only, and the object position can be URIs, blank nodes, or values, and values can be anything. Datomic is very similar. The subject position is going to be an entity ID.
01:14:24
Speaker
You know, the attribute position is, well, internally it stores it as a reference to an attribute, but you can go directly and represent that as a keyword.
01:14:42
Speaker
Those keywords all had to be defined up front as your schema, and then the value can be anything. For that value, in your schema you say, well, this attribute represents dates, this attribute represents strings, or this one is a reference to another entity. Asami has none of those rules at all.
01:15:05
Speaker
Initially, it had some, and then someone complained about something, so I pulled one of those rules out, and then pulled another rule out. I threw all the rules away in the end. You can have attributes which are strings. You can have entities which are strings or numbers or whatever. There's no type safety there whatsoever. But what this has meant is that you can pull in JSON and
01:15:32
Speaker
So someone was trying to load up JSON, convert it to EDN, and load it into Asami. But it turned out that they had attributes which were auto-generated. It was a PCAP file being represented in JSON. These attributes were really long string structures with spaces in them.
01:15:54
Speaker
And they were trying to convert this into EDN to load it, and of course it wouldn't work. So this was what prompted me in the end to throw away all the type safety. So now I can load up JSON, and I can do a query and say which attributes in my system have spaces in the names.
01:16:17
Speaker
And I can say, well, if this talks to this, talks to this, talks to this, show me that. And I can just load up a raw JSON file and do queries against it that do that sort of thing.
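A sketch of that workflow, with an invented PCAP-ish record; the string-keyed attributes rely on exactly the rule removal described above, and the filter clauses assume Asami resolves predicates the way Datomic does:

```clojure
;; Sketch: JSON keys kept as plain strings, then queried for spaces.
;; Data and connection URL are invented; filter-clause support assumed.
(require '[asami.core :as d])

(def conn (d/connect "asami:mem://pcap"))
@(d/transact conn {:tx-data [{"source address" "10.0.0.1"
                              "frame length"   1514
                              "protocol"       "tcp"}]})

;; Which attributes in my system have spaces in the names?
(d/q '[:find ?attr
       :where [_ ?attr _]
              [(string? ?attr)]
              [(clojure.string/includes? ?attr " ")]]
     (d/db conn))
;; => (["source address"] ["frame length"])  ; result shape approximate
```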
01:16:30
Speaker
And I'm finding that it's really useful approach to just exploring data that I don't know about yet. So you could basically suck in a MongoDB and make sense out of it. Yes. It's probably the biggest use case I can imagine, actually, because there's so much junk in these MongoDBs. Sorry.
01:16:56
Speaker
That's fine. I've been doing a lot of this sort of thing lately where I'm given this big data structure. I just throw it into the database and start querying against it and find out what's in there and how many things I have and where do I have something which is really enormous. When I've got a multi-gigabyte JSON file,
01:17:21
Speaker
and it's causing me problems somewhere, I can use queries to find where those problems are. But we can also treat it as a graph and get a lot of those sorts of benefits out of it. And we can bring data in from lots of different places. So it's just one feature away from
01:17:45
Speaker
You know, dethroning Excel. All you need is just a UI and then you're ready. Well, for me, it would be really nice if it were done, but it's not. The longer I'm on it, the more I realize it will never be done. And the idea that someone might be coming to help me soon is really exciting.
01:18:10
Speaker
But I mean, I love the fact that it's now big enough and useful enough that I can do things with it. I never set out to build a database. I set out to build a rule engine and I was told, well, can you build your own database for it?
01:18:29
Speaker
And that was all I had expected to do. And it just kept growing. And now a lot of the new features are about figuring out how to expose that in the rule engine. So the rule engine is essentially, well, not essentially, it is a Datalog rule engine.
01:18:50
Speaker
But Datalog in the original Datalog sense, not the Datomic datalog sense. Yes. It's literally in the Datalog sense. And so I can write Datalog programs. So I can take a whole lot of Prolog and put it in directly and it'll execute it. Just fine. Have a look at that 2016 talk I did.
01:19:16
Speaker
Yes. At the end of the talk, I do exactly that. I take a small program in Prolog and I run it through Prolog, and then I run it through Naga. Wow. And it was being backed by this storage. Yeah, so that was what I set out to do, but now I've got a database.
01:19:38
Speaker
Because I knew from Mulgara that a database was too big to implement as one person. I couldn't do it. I would never set out to build something like that because it's just too big. Somehow I built it. Didn't mean to. But at this rate, I think you're going to go and build the operating system, the Lisp OS that you need to run this database on.
01:20:03
Speaker
Well, at the moment, I want to do new things, new features, but I keep having to dive back in to fix one thing or to tweak something else. Or, you know, at the moment I'm halfway through implementing document storage in there. So at the moment it's all about the statements, and when you want to get a document out of it, it rebuilds the document out of statements. But I've half-built
01:20:28
Speaker
the document storage as well. So it'll actually store the documents as well as the statements that represent the document. So when you find the document you want, you can pull the whole thing out. It just has to deserialize it, as opposed to rebuilding it.
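To picture the current behaviour: an entity goes in as a map, is stored as statements, and is rebuilt into a map on the way out. A sketch, with the d/entity call and the lookup by ident assumed from Asami's Datomic-style API:

```clojure
;; Sketch only: d/entity's exact signature and ident lookup are assumed.
(require '[asami.core :as d])

(def conn (d/connect "asami:mem://docs"))
@(d/transact conn {:tx-data [{:db/ident "doc-1"
                              :name     "Ann"
                              :tags     ["a" "b"]}]})

;; Today this map is rebuilt from statements; the planned document
;; storage would deserialize a stored copy instead.
(d/entity (d/db conn) "doc-1")
;; => {:name "Ann", :tags ["a" "b"]}
```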
01:20:55
Speaker
So for the people who want to get into the code base and start hacking on it, or to understand it better, to learn more about graph databases as well as contributing to Asami: what kind of prerequisites do you think they should have? Apart from Clojure knowledge, obviously. Don't say 15 years of... Don't say 15 years of graph database knowledge. I don't know. Having some knowledge of purely functional data structures. Yeah, like Chris Okasaki. Yeah, there's like...
01:21:24
Speaker
Yeah, extremely important book, I think. Yeah. Yeah. So the podcast listeners can't see, but I'm holding up the book right now. Having some knowledge of that would be a really great start. Although, you know, a lot of what Asami does in memory is just using Clojure data structures, which, you know, are built on purely functional data structures. But what's on disk is doing something similar as well. I don't know.
01:21:55
Speaker
A lot of it I try to explain in the code or I document in GitHub. If you look on GitHub, you'll see I've written lots of pages and pages of documentation.
01:22:15
Speaker
I'm trying to describe the process. I want to bring people in to understand what I've been doing so that if they're interested, they can read this stuff and have an understanding of it.
01:22:27
Speaker
And one of the frustrating things is when I have colleagues who are saying, well, I'm looking at this and I just don't get it. You know, how would this work? How could it possibly? And I'd say, have you looked at the documentation for this? And the answer is always, oh, well, no. And it's got to the point where I have other colleagues who will say, have you looked at the documentation for that?
01:22:50
Speaker
That's all it could be. But I think we are conditioned to assume that there won't be documentation, given the amount of things that are documented versus not. I mean, it's the same stuff. Nobody reads man pages. Unix is documented to hell and back, and nobody reads them. And every time you see, oh, you can use this switch: how do you know? Because I read the man page. That's how I know it. How else would I know? It's impossible.
01:23:16
Speaker
So I think our default approach is that, well, there's not going to be any documentation, so don't bother even looking for it. Not everything is documented, but I've taken the time to document a lot of things. And I try to do that.
01:23:32
Speaker
Where things aren't in the wiki, I've put a lot of comments into the source code. A lot of comments in the source code. I've tried to make it accessible. It's big, but what is it?
01:23:55
Speaker
I know it's also you've got like good first issues and stuff like that. So you're trying to make it accessible and amenable to people coming to join the project. I have tried, yeah. So I just did a find over the project and just put it all through a word count. So I'm not pulling out lines of comments, not pulling out blank lines or anything like that. The whole project is 7,800 lines.
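For the curious, the same count can be done from a REPL; a sketch, with the checkout path hypothetical:

```clojure
;; Total lines across Clojure source files under a local checkout.
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(->> (file-seq (io/file "asami"))
     (filter #(.isFile %))
     (filter #(re-find #"\.clj[sc]?$" (.getName %)))
     (map #(count (str/split-lines (slurp %))))
     (reduce +))
```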
01:24:25
Speaker
So that should be easy enough to get into, at least to walk through, read through. The problem with closure though is 7,800 lines, those 7,800 lines matter because it can be pretty dense. And I imagine, like you said, you're not scared of doing some tricky closure and a few volatiles, et cetera.
01:24:46
Speaker
Yeah, but I mean, it's still small enough to be accessible for sure. You're not looking at like a million lines of code to start. No. Well, there's 39 namespaces in there. The biggest one is the query namespace, and that's nearly 900 lines. But I also think a lot of that is whitespace and comments.
01:25:16
Speaker
Nice. So I think we had some green room talk, so let's get to the meat of the podcast. So the main topic of the podcast. So your editor journey, so Emacs or some other shit that you use. That is the most important thing.
01:25:40
Speaker
Welcome to the Emacs section of this podcast. Well, I mean, at university, they taught me BI. Okay.
01:25:48
Speaker
And then I, you know, when I went out to work in my first couple of jobs on Solaris, I was using vi. On Windows, I was using the Borland C API. I mean, the UI. Yeah. Yeah. But then, you know, I kept coming back to vi. And then I think in the late 90s, it was nvi.
01:26:13
Speaker
And then around 2000, I was back on Linux full-time then, and I was using Vim. And I used Vim pretty much constantly up until
01:26:28
Speaker
maybe five years ago. And I was just finding that Vim and Clojure are not as smoothly integrated as I would like. Paredit in particular really annoys me in Vim. So I finally thought, I'll try Emacs. I've given it a go a few times and just never developed that muscle memory.
01:26:53
Speaker
And I was always so much slower navigating. So I went to Emacs, I tried Spacemacs, but there were
01:27:03
Speaker
there are some things which differed between Spacemacs bindings and Vim bindings. For instance, I think capital Y, which is supposed to yank an entire line into the buffer, instead yanks from the cursor position to the end of the line into the buffer. Little things like that being different kept catching me out all the time.
01:27:30
Speaker
So I moved to evil mode and that worked really nicely. So I've been using evil mode Emacs now ever since.
01:27:42
Speaker
And every so often, I learn a new thing in Emacs. But generally, if I learn a new thing, I will create a key binding for it so that I don't have to remember it anymore. So my whole memory of Emacs is actually in my init file with the key bindings. How do I do this? I go through that file. Oh, that's right.
01:28:06
Speaker
So, Emacs is just a better version of Vim as far as you're concerned, a Clojure version of Vim. Well, Emacs is a better version of everything. What do you mean, better version of Vim? Yeah. So, I mean, just recently on the Clojurians Slack, somebody said, hey, we're doing a show-and-tell channel.
01:28:30
Speaker
Why don't you take a screenshot of what you're working on and do this? And I thought, oh. So I screenshotted where I was that day and put it in, and I have all of these windows, these Emacs windows, open. Because I use multiple windows, and they're all in the same process. So on the Mac, you can just use command with the backquote, or left or right, to switch between them.
01:28:57
Speaker
They were saying, oh, why aren't you using multiple buffers in the one window? Well, I don't like that. And I can switch around them very easily and I can resize easily with the mouse and things like that. Because resizing isn't something I feel I need to do with a keyboard.
01:29:13
Speaker
I think it's sacrilegious to open a new frame in Emacs, because you're supposed to have everything inside Emacs, full-screen Emacs, nothing else. Yeah, I'm not like that. Well, it's quite funny to watch Vijay squirm as you talk about that. He's obviously unhappy. I'm enjoying this section of the podcast now.
01:29:43
Speaker
I've been going back to Vim a little more frequently. Actually not Vim. I've been going back to Neovim. Yeah. Yeah. And its scripting language is slightly better than the ancient arcane magic shit that you need, the Vimscript thing. Neovim is slightly better. Yeah. Well, it is. Particularly, I've been enjoying
01:30:11
Speaker
What's it called? Fennel, by Oliver Caldwell. So that's a Clojure-like Lisp for Neovim. And so that's been pulling me back a little bit as well. Well, meanwhile, I think Elisp is now getting compiled to native code. So there is GCC Emacs. It's now way faster than everything.
01:30:42
Speaker
Oh, and the Church of Emacs, here he is, the preacher. Don't go back there. Don't go back there, my daughter.
01:30:57
Speaker
I think this whole, there is a lot of confluence of ideas, right? Things are coming in across different things. And one of the things with Vim was the Vim scripting that I... All right, enough of this now, come on. Okay, very realistically, let's stop this. I think with Neovim, with Lua, it's likely better, I think. I don't know.
01:31:17
Speaker
I would not know because I don't use them. Anyway. Well, I just want a tool which makes my coding easier and faster. Yeah. I navigate faster with Vim than I do in anything else. So evil-mode Emacs gives me that. Yeah. I get a lot of the benefits of CIDER. Yeah.
01:31:43
Speaker
So, you know, I do enjoy working with CIDER, and that'll keep me in Emacs, but I like evil mode. And I just need to switch out of it occasionally, like if I want to try debugging. But honestly, I rarely use the debugger at all. And yeah. But I keep seeing this kind of thing; I also read the internet gossip that, you know, people like Linus, they never use debuggers, apparently.
01:32:11
Speaker
It seems to be only for mere mortals like me, putting breakpoints somewhere and then going through them. But debugging seems to be like, I never use that. All these awesome programmers that are the legends of the internet are like, we don't use debuggers. No, they use printk. Yeah, everywhere. That's basically one thing.
01:32:36
Speaker
Look, I do, I've been known to abuse prn. Yeah. And yeah, I mean, in Clojure I find it really useful. I guess I could use a debugger to breakpoint something and step through. But normally what I like to do is just throw a
01:33:01
Speaker
println or prn into a function and then run the tests which failed: what was the data at this point? I was expecting this, I got that. Why?
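That throwaway probe looks something like this; the spy helper is a common hand-rolled idiom, not something from Asami:

```clojure
;; A temporary prn dropped into a function, plus a pass-through "spy"
;; that labels a value so it can sit in the middle of a thread.
(defn spy [label x]
  (prn label x)
  x)

(defn resolve-bindings [bindings]
  (prn :resolve-bindings/input bindings)    ; what arrived here?
  (->> bindings
       (map (fn [[k v]] [k (inc v)]))
       (spy :after-inc)                     ; what did map produce?
       (into {})))

(resolve-bindings {:a 1 :b 2})
;; prints the intermediate data and returns {:a 2, :b 3}
```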
01:33:29
Speaker
When I was early in my career, I had a mentor who taught me not to do so much of that, but instead to think. Not to debug things, but to ask: why did this fail? What's the expression? Something like: hours and hours of printing out different options can be
01:33:48
Speaker
used to save yourself from just one minute of thinking your problem through. I do try to do that. Why is this going wrong? What is happening? It makes me think about my code a lot more. I'm not as deep a thinker as Rich Hickey is. He tends to have things pretty much right before he starts typing things out or before he releases something.
01:34:17
Speaker
I am a little more iterative. I like to think it through; I like to have the general shape of it in my head.

Coding Philosophy and Conclusion

01:34:27
Speaker
But then as I'm building things, I'll sometimes realize, oh, I was wrong there. I've got to do this. And I'll iterate my way towards the solution more.
01:34:35
Speaker
And it would be more disciplined to think it through better. But I find the habits that I've developed are more effective this way. But I do think about it a lot before
01:34:51
Speaker
writing the first bit of code. And then when things aren't working, I'll think it through a lot before I try to figure out how to fix it. Because oftentimes that quick fix is not the correct way to do it. You can introduce new problems. So I like to think and understand my issues.
01:35:12
Speaker
Well, it's a good bit of advice to wind up on, I think. I just looked at the time, it's like we're... Yeah. It's getting very dark around here as well. I also noticed that my system just switched to night mode. I was like, did I just flip it? Because I set it on auto night mode. What is happening? I haven't been paying attention to the time because I wasn't sure when we were starting after that pizza debacle. Sorry about that.
01:35:42
Speaker
Yeah, totally. I think it was delayed because of Ray's Pizza Adventure. Well, I think it is looking pretty good, actually. So I hope it was nice. It was delicious. Yes, absolutely. Awesome.
01:35:54
Speaker
OK, so I think it's a good time to wrap up the discussion. And yeah, I could talk to you for hours and hours and then pick your brain more on this kind of stuff. But I think we've got to stop at some point. So thanks a lot. That's how you finish all the podcasts, isn't it?
01:36:16
Speaker
No, not really. I mean, we say that to other people, you know, to placate them. But this time it's real. He really means it. Exactly. No, honestly, because as I told you, I've been looking into graph databases a lot, as a user, I mean. I don't have the level of knowledge to actually build this shit.
01:36:36
Speaker
So neither do I. Come on, I mean, you already built like four of them. So it's been, at least for me, very inspirational to understand where you're coming from, and what kind of things you're thinking about, and how you're building all this stuff. So very grateful, and thank you for joining us. It's been a pleasure, I think. You're welcome. It's been a fun afternoon. Thank you. Yeah, it's been fantastic.
01:37:04
Speaker
So I think that's it from us for episode 73. And, depending on when you listen to this podcast, we'll be there at ClojureD. Virtually, we have been there. So if we publish this before ClojureD, please join us. If we publish this after ClojureD, thanks for talking to us at ClojureD, saying hi to us. So I think that's it from us. And thanks again, Paula, for joining us.
01:37:33
Speaker
And walking us through the graph traversing. We always like to have a podcast with a little bit of edge. Yes. Boom. Yes. Finally, all the graph jokes are done now.
01:37:51
Speaker
Thank you for listening to this episode of defn, and the awesome vegetarian music on the track is Melon Hamburger by Pizzeri, and the show's audio is mixed by Wouter Dullert. I'm pretty sure I butchered his name. Maybe you should insert your own name here, Dullert.
01:38:09
Speaker
If you'd like to support us, please do check out our Patreon page, and you can show your appreciation for all the hard work, or the lack of hard work, that we're doing. And you can also catch up with either Ray or with me. If for some unexplainable reason you want to interact with us, then do check us out on Slack, the Clojurians Slack, or Clojureverse, or on Zulip, or just at us at DefnPodcast on Twitter.
01:38:37
Speaker
Enjoy your day and see you in the next episode.