Become a Creator today!Start creating today - Share your story with the world!
Start for free
00:00:00
00:00:01
jOOQ - Crossing the Object-Relational Bridge (with Lukas Eder) image

jOOQ - Crossing the Object-Relational Bridge (with Lukas Eder)

Developer Voices
Avatar
813 Plays1 year ago

Sooner or later, every programmer will have to cross the gap between their programming language and their database. It feels like it should be easy, but in practice it’s always a much wider chasm than it seems, and every tool that bridges that gaps comes with its own strengths, weaknesses and opinions.

This week we take a look at a relatively new library for database access—jOOQ—by chatting with its author, Lukas Eder. This episode takes in the simple questions like syntax, the thornier ones like supporting multiple databases, and the deeply philosophical ones like how we even think about data and data-processing.

If you’re a Java (or JVM) programmer, there’s a new tool to learn here, and even if you're not there’s food for thought and ideas to borrow for the next time you need to SELECT…

jOOQ: https://www.jooq.org/
YesQL: https://github.com/krisajenkins/yesql
Datomic: https://www.datomic.com/
XTDB: https://www.xtdb.com/
The Elm Architecture: https://guide.elm-lang.org/architecture/

Kris on Twitter: https://twitter.com/krisajenkins
Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

Recommended
Transcript

Challenges of SQL in Programming Languages

00:00:00
Speaker
On Developer Voices this week, we're exploring the ever thorny problem of SQL or SQL. How you pronounce it is the minor problem. The big problem is how do you use it from different programming languages? How do you bridge that gap between your favorite language and SQL, which is the de facto language of databases?
00:00:23
Speaker
We've seen a lot of different approaches to this problem over the years, from the basic of munging SQL strings together by hand, really error-prone, kind of a security risk, I wouldn't recommend it, to very abstract approaches like Hibernate or Active Record that, as far as possible, try and shield you from knowing that the database even exists.
00:00:47
Speaker
which is great for the simple stuff. In my experience, it breaks down pretty quickly as soon as you want to do anything really sophisticated.
00:00:56
Speaker
Honestly, I've never been happy, genuinely happy, crossing that object relational bridge. So when I heard about a new library that's making a few waves, I thought we should get the author in, and he can break down what his solution is. So my guest today is Lucas Eda. He's written a library called Duke. That's J-O-O-Q. And it's firmly of the opinion that SQL is the right tool for dealing with data and databases.
00:01:24
Speaker
And what you should do is adapt the programming language to fit it really, really well. Duke's a JVM library, so you could use it today in Java, Scala, Kotlin. But more generally, I think his approach is also ripe for taking inspiration from and porting to other statically-typed languages.
00:01:44
Speaker
So let's dig in and break it down. Maybe you'll learn a new and less painful way to talk to your database. I'm your host, Chris Jenkins. This is Developer Voices, and today's voice is Lucas Eater.

Database Models vs. Object-Oriented Models

00:02:09
Speaker
I'm joined today by database programming guru, Lucas Eda. Can I call you that? Yeah, sure. Lucas, how are you? Lucas? Yeah, I'm Lucas. I'm great. How are you, Chris? I'm very well, very well. Looking forward to chewing over one of... I don't want to say it's the biggest topic in our industry, but it's certainly a topic that just about bothers every programmer at some point.
00:02:33
Speaker
And that's the mismatch between objects in their programming language and the database they're trying to store things in, which is one you've wrestled with actively. So, so have I. So there's most programmers, but you think you've tried, you've solved it, right?
00:02:52
Speaker
Well, I would say I've solved it by circumventing it in the sense that I just don't believe in the object or into paradigm at the client side as a representation of the model in the database. The database model is superior, in my opinion. So that kind of makes things easier. So I know you're a Java programmer. You are rejecting the Java notion that everything is an object.
00:03:19
Speaker
Yeah, so actually even Java starts rejecting that notion since recently since they started adding more data oriented data structures like records and then now they're really abstract data types pretty soon. So I think this kind of paradigm that is very familiar to SQL developers will be more popular in the near future.
00:03:40
Speaker
So this object-oriented thing where you have objects with identities and you have graphs of objects that are related to each other, like in an ORM, is one side of the coin. You can obviously model your stuff like that and it makes sense for some use cases, but for other use cases which are more data-oriented.
00:03:58
Speaker
the SQL version of the representation of data, the relational one might make more sense. In my solution, which is Duke, I've never tried to bridge between the two worlds. I just ignored the object-oriented one and you stream the tuples directly from the database and you still treat them as tuples in the client.
00:04:21
Speaker
There's this dichotomy in programming. Some people treat Java as the center of their world and Postgres as fancy storage. And some people treat Postgres as the center of their world and Java as fancy processing. You could say it like that, yeah. And you're very much in the programming languages, fancy programming for tuples, side of the fence. It's just how I feel about it. I don't impose this way of thinking on anyone. I mean, if you don't think like that,
00:04:51
Speaker
the SQL centric way, then maybe Juke wouldn't be the right solution for you anyway. So I'm not trying to convince anyone to change their way of thought, but if someone thinks already the way I do, or I think you as well, you did something similar enclosure, which is also very data oriented, of course. Then people already think this way and then they will look for a solution like the one I'm providing.
00:05:13
Speaker
Okay, so you're not trying to change people's mental model, you're just providing the tool if they're already in that space. Yeah, it's a lot of work to change people and often it's not even the right way. I mean, it's my way of thinking and they have good reasons to think differently, so I'm not trying to convince anyone of that.

Differences Between SQL and ORM

00:05:31
Speaker
No. I'm kind of of the opinion that if all you're doing is storing objects by ID and pulling those same object back out by ID, probably this
00:05:42
Speaker
Let's characterize it as the Hibernate way of doing things, where we're very much just trying to pretend the database doesn't exist. No, not necessarily. I mean, IDs are very ubiquitous also in the relational model, right? So every table has a primary key if you're in the first normal form.
00:06:00
Speaker
And you probably are, right? So you're normalizing your database schema. So you have those IDs, and it makes sense to think of the records as well, not just the data sets. So the way I usually compare ORMs and SQL, that gap that you've mentioned is it's basically just an inversion of arrows. So in the relational world, the child points to the parent, whereas in the object-oriented world, the parent points to the child.
00:06:25
Speaker
much more often, right? So the parent contains the child, which might still point back to the parent, but the main thing is you have a book with authors. So you have a list of authors in the book, and that's how you think about it. Also in a graph database, you do it this way. And then the relational model, it's always the opposite. It's always the other way around. So the author, in this case, would point to a book via a relationship table.
00:06:52
Speaker
Do you think that's what makes all this hard, then, the way relationships are modeled? Not necessarily. I think if you think about it thoroughly, then it's just really this inversion of arrows, right? So most of the gap that has been experienced in the past is that SQL as a language makes it kind of hard to serialize this data in one or the other format. So if you serialize your data with SQL, in the classic sense, SQL 92 didn't have many constructs to nest data structures.
00:07:21
Speaker
So you would have probably used joins to denormalize your data and then you have duplicates which you have deduplicate again in the client.
00:07:29
Speaker
And you kind of need an ORM to do all this boring mapping work. So in the end, you have the same representation of the data in the database and in the client just in version of arrows, but the rest is the same, right? So if you're using, for instance, JPA, then you will also use a relational model of your data. You will use the same concepts, the same kind of annotations, like many-to-many, many-to-one, one-to-many, or whatever.
00:07:56
Speaker
So you have the same way of thinking as the relational one. You're just inverting those arrows. And then you let JPA do the serialization, which is the hard work, right? So the problem is that SQL itself, the old part, the old standard didn't embrace nesting of data structures. So you had to do this manually somehow with lots of tricks. But in more recent versions, you can actually do all of this in SQL as well. OK, so take me through what,

Juke's Integration with Java

00:08:24
Speaker
Duke. It's J-O-O-K.
00:08:26
Speaker
J-O-O-Q and you're pronouncing it Duke. Right. Okay. Take me through how that actually plays out as a solution. What does it look like as a programmer?
00:08:35
Speaker
So Juke is an internal domain-specific language which models the SQL language directly as a Java API. So you can write SQL in Java. It looks like Java code, but it also looks like SQL code at the same time. And the Java compiler will type check your SQL statements directly while you write it. So first off, this makes for syntax correctness in terms of the SQL language.
00:08:56
Speaker
But there's also a source code generator that reverse-engineers your database schema, which means that you have all your tables and your columns directed as Java objects with type information associated with them, so you can always have types of access to your database model as well. That's pretty much it. So it's a SQL builder with a lot of type safety embedded in the Java language, or also Kotlin and Scala, of course. It works nicely as well there. You have three versions of it for the three different
00:09:24
Speaker
No, it's one API with some extensions for the two other languages. So it's basically written in Java. And there's some extensions for Kotlin and some for Scala, where it makes sense. OK, so the code generator. So you're saying, I don't come to this from Java land. I probably got a schema already set up in the database. And it's just going to generate from that.
00:09:46
Speaker
So that's at the core of Juke is the idea that you have your database already. You are a database-centric developer probably. It still works if you're not, but if you are, then Juke is perfect for you because no matter what the schema is, how many tables you can have, hundreds and thousands of tables and views and store procedures or whatever, and Juke can do something with those and generate code for them so you can access those objects.
00:10:09
Speaker
And you're probably going to have a schema that survives your client application, or you may have multiple client applications, maybe a Python script that accesses the same SQL database from a different endpoint. And the SQL database is always more important than the application accessing it. So this is the mindset of Juke. Yeah, that makes a lot of sense. Nearly always the database will outlive the particular application that talks to it. Yeah.
00:10:39
Speaker
Let's dig into something like you say these nested relationships in SQL, right? If I have say one table that models nested data as an array inside a column and another table that models it as a relation to another table, how is that going to look by the time I get to Java Land?
00:11:06
Speaker
Well, on Juke's side, it's a one-to-one representation. So if you have a foreign key relationship, then that's a different table for Juke as well. If you have a nested data structure, like an array or a user-defined type, then Juke will notice those things and will nest those things as well in the table where you reference those types. So Juke will just have a one-to-one representation of your actual database model. You will never try doing fancy on your behalf. OK.
00:11:34
Speaker
Yeah. And then presumably it's because you already designed this model and you want to use it the way you designed it, right? Yeah. Yeah. That makes a lot of sense. So you're not taking any opinions on how the data should be changed to fit the world of Java. Yeah. Okay. So you don't even need primary keys. You don't even need any constraints. You don't need any normal forms. You can do whatever you want and still query your data because you designed this database the way you want it, right? The juke won't judge you.
00:12:04
Speaker
Yeah, yeah. I'd like to not be judged by my programming languages for a change. I feel judged a lot. I will judge you, but not you. Oh, okay. That raises questions about support, but we'll dodge that. Right. So if you're largely generating the code entirely generating the driving code from, um,
00:12:28
Speaker
from the schema of the database. Are you doing something like regenerating it on every release and storing that code in change control so you can see if the schema's changed in unexpected ways?
00:12:41
Speaker
That's how I would do it personally, but Juke isn't opinionated about this. So there's just a code generator. It's a standalone Java program, which you can use as a Maven or Gradle plugin as well if you want. You can make it a part of your CI-CD pipeline. I've also documented how to integrate that with test containers, for example. So you could spin up a test container's Postgres instance just for code generation, where you migrate your database changes directly into this in-memory Postgres instance.
00:13:11
Speaker
and then run the code generator against it. So you can do it in any way you want. So typically, the ideal usage would be indeed every time you change something to the database in whatever way you want, you regenerate the code immediately and you test everything. You have your integration test suite, which runs on the regenerated code. So presumably a system like that means you can start to get compile time errors for schema changes.

Handling Schema Changes and SQL Syntax Variations

00:13:39
Speaker
Yes, that's the point, right? So whenever you change a schema object, like for instance, you change the data type of some column or you rename it, or you move stuff around, you want your client code to break because there's a lot of stuff you have to fix. So this way with Juke, you will immediately notice where all the places you have to fix right now.
00:13:59
Speaker
So I think this is superior to approaches where you have embedded SQL in string form. So maybe IDEs these days can help you with the syntax correctness while you write the SQL, but to validate if it's correct at compile time, you probably still have to run the SQL first. You don't have to do that necessarily. I mean, semantically, you still have to check it at runtime, but at least syntactically, you can be quite sure that all of these changes will actually affect your code.
00:14:26
Speaker
Yeah, at least you're guaranteed that those column names still exist. You can also search, you can use your ID to search references to columns and stuff like that. So there's a lot of added value to having generated code from your database schema. So you can document your database schema with comments in SQL. There's a common statement in Postgres and other databases. So you will have a Java documentation of that comment directly on your column, also in Java code, which is really nice. So there's a lot of benefits to this approach.
00:14:56
Speaker
Yeah, if you can persuade the people doing the DDL to comment their code, which could be tricky. Yeah, but once you notice this, that it exists, you start using it, right? Yeah, yeah. Having the tools there to support it is the first step to actually making use of it. Yeah. And so if we get from the code generation side into the actual programming side, I've had a look at Duke's syntax and is trying to be very much
00:15:24
Speaker
Java flavored SQL. So there's a fluid API where you'd say dot select column names dot from. Is that hard to make it work across all the different features of all the different databases?
00:15:41
Speaker
Yeah, there were a couple of caveats that are not easy to solve, obviously, because, for instance, some SQL keywords are also Java keywords, and you can't use them as method names, like if or else or these kind of things, case. So the simple solution is just to append an underscore and be done with it. Yeah.
00:16:01
Speaker
Yeah. Other things are some methods are weird because they're already reserved on the object API like weight. There's actually a weight keyword in SQL and I can't use a weight method because object weight is a thing. So that would have a different semantics. Yeah. Or also equals is a method on object and it would be a nice method to use when you compare two things in SQL. So it's not equals in Juke.
00:16:29
Speaker
So some things are just little caveats. There are no showstoppers, obviously. But other than that, I think pretty much every SQL language feature can map to this sort of DSL. Pretty straightforward way. I've documented this a very long time ago in the Juke blog, how you can automate this even using an API generator from a BNF notation of your language. So when you have a BNF notation of the language, you have
00:16:57
Speaker
choices or repetitions and stuff like that. That always translates one-to-one to some API usage in Java, almost. Sometimes I make a couple of compromises. For instance, if you have a keyword and then you have a parameter like fetch first five rows only, which is a SQL syntax, it would be a bit weird to have to write rows only as well in Java.
00:17:24
Speaker
So it's just limit five. So this is the MySQL or Postgres old syntax, kind of easier this way than having to use superfluous keywords in a Java API. So sometimes there's a compromise to be made. But in most cases, if you think in terms of SQL, you have the SQL syntax in your head, you want to write it exactly the same way in juke. So this is the main philosophy behind the API design. So there's no new invention.
00:17:52
Speaker
of something that exists in SQL and in Juke, it's called entirely differently for no reason. Because that would be cognitive friction. You would have to think about this and learn the API in a much harder way than it already is. So it's not hard to learn because once you know the SQL syntax, you know, okay, it's got to be like this in Juke as well. Yeah, that makes sense.

Standard SQL vs. Vendor-Specific Features

00:18:13
Speaker
And again, you're going for this like path of least friction, like what people already know and believe is the right way to do it, just make that possible.
00:18:20
Speaker
Yeah, I think this is a wise choice to make. Otherwise, I've seen a lot of similar libraries in other languages, like Slick and Scala, or there's Doobie as well, I think. And in other languages, in Kotlin, there's Exposed, and they all invented new DSL constructs.
00:18:37
Speaker
Exposed, for instance, I think they use slice, and I don't even know what slice means. Slice is something coupling specific. When I hear slice, I would think it's a collection and you want to slice this into pieces, but I have no mental model of what this would mean in SQL. And the other way around, if I want to write a SQL statement and I have a where clause, is where slice or is where something else?
00:19:00
Speaker
I wouldn't know. So I would always have to think about it. So if you're an experienced exposed user, obviously you will learn this language. But if you're the new person on the team and you have to read this code, maybe it's much harder for that person to understand what's going on. And I really wanted to focus on not having this friction at all. So a SQL developer will immediately see, okay, where is where? And having is having. And group by is group by. There's really no surprise here.
00:19:30
Speaker
That raises a big, juicy question about where the developers should learn SQL. But I just have one more thing about your mapping first, which I want to check. So one problem I've had with, I've seen libraries that do something like this before.
00:19:45
Speaker
where they're trying to translate SQL into a native programming language API. Some of them say, OK, so I'll just support the standard. And then the particular feature that you like in your particular database isn't there. Some say, OK, Postgres is the best one, so it will support the standard plus everything in Postgres. What do you do for the things that are kind of specific to specific databases?
00:20:13
Speaker
So in general, the standard should be the thing that Juke decides on. So if a feature is in the standard, then Juke will try to use that syntax, assuming that some implementations also implemented it, of course. So if it's a very esoteric standard, then maybe it's not a priority. But a lot of times, the standard feature is the one that everyone else will converge to eventually, maybe in 20 years or so. It takes a long time. But you can assume that eventually,
00:20:41
Speaker
database implementations will implement the standard. As I said before, Postgres already had limit and then I think SQL 2008 introduced the standard fetch syntax, which was already available in DB2 and then Oracle implemented it. Usually, once DB2 and or Oracle implement something, they put it in the standard because they're very generous with us and invent all these things that they make available to all the implementations.
00:21:06
Speaker
So now Postgres adopted the standard as well. You can still write limit, but I think they even switched now the default in the documentation, what they recommend you use. So you start using fetch first instead of a limit. Also because fetch has more features, right? So you can have the with ties clause, which is not very well known, but it can be occasionally useful when you have 10 rows, you want to fetch 10 rows, but if the 10th and 11th rows are tied, you get both.
00:21:36
Speaker
So you can do that with the fetch clause, you can't do that with the limit clause. Tying in the sense that your order by could have- Yes, yes, your order by is not deterministic, so you have two ties in the ordering, so you want to have both.
00:21:49
Speaker
Yeah, I didn't know about that one. That's great. The standard also has percent. You can have fetched the first 10% of your rows, which is also occasionally useful. I think Postgres didn't implement it yet, but they will eventually end. It won't be available in the old syntax. It will only be available in the standard syntax.
00:22:09
Speaker
Yeah, so that would be the default. But on the other hand, there are a lot of vendor specific features which are really interesting as well. People love to use them. So the standard, for instance, would be to use merge. Postgres didn't have merge for a very long time. I think 15 introduced it finally.
00:22:26
Speaker
They invented their own syntax. One of the few times the Postgres invented something on their own rather than following with the standard, which is a bit weird in my opinion. So they inserted on conflict.
00:22:41
Speaker
which is probably more useful most of the time, but it's less powerful and lacks features that merge has. And now they implemented merge as well. So what to do in Juke? So Juke obviously had to support on conflict.
00:22:57
Speaker
as well. So Juke users could use this powerful feature for upsetting. Juke already had MySQLs insert on duplicate key update, which is almost the same thing, but simply different. So Juke now has all three, right? So on duplicate key update for MySQL, which is the simplest one, but the least powerful one. And then on conflict, which is more powerful, and then merge, which is the most powerful.
00:23:22
Speaker
Okay. Let's say I, I have picked a relational database that you've never heard of and it follows the standard. And, but they have one extra feature that you've never heard of. And I'd like to use that. Can I use Duke? Because it's mostly standard. Can I add this feature myself without begging you as a pull request?
00:23:44
Speaker
I got a lot of support requests from people who use Juke with some arbitrary database that I've never heard of, and they were successful because, I mean, some dialects are very standards compliant. I wouldn't say Postgres. Postgres has a lot of Postgres specifics. For instance, the cast syntax was different for a long time in Juke, or maybe some data types which are different.
00:24:07
Speaker
But for instance, I don't know if you know about hypersonic HSKLDB, predecessor of H2. It's very standards compliant, H2 as well. So those people usually have been quite successful using those two dialects and pointing it to some entirely different database, which is kind of standards compliant.
00:24:30
Speaker
Of course, they will always run into regressions eventually. So it's not necessarily a good idea to go all in on this integration because, for instance, if H2 or HSQLDB adds a new feature, which is a better feature than what they had before, Juke will support that and maybe change the default rendering of some implementation, right? So you relied on an emulation of some feature.
00:24:55
Speaker
which wasn't optimal and now you can make it more optimal for the target database product. But your database product, which is a different one, doesn't support this. So you will get a regression at runtime. So it's not supported, but you could get it to run. So is there anything I can do to extend it? Like can I, in my own code base, teach you new syntaxes?
00:25:17
Speaker
So if you're doing function calls or stuff like that, like if you have vendor specific functions that are not supported, that's very easy. You can build a small library. There's always a plain SQL templating feature that you can use. So you can use string templates.
00:25:32
Speaker
where you can compose strings based on strings and nest them arbitrarily, kind of like in mybatis, which is the XML based library, where you can also use XML files and nest them arbitrarily to create templates. And you can do that with juke as well, and extends the DSL in any way you want.
00:25:52
Speaker
This works for simple expressions, but once you want to have something like a select clause, it's more complicated because you can't easily put that inside of Juke. You wouldn't know where to place your clause. But you can patch the generated SQL with regular expressions if you want, so there are... There's a last resort.
00:26:13
Speaker
Yeah, it's the last resort, but it works well. I mean, if you know it's always something that happens right before the wear clause, you just look where's the wear clause and replace. Yeah. I mean, you're using a database product that's not supported. So I guess you're open to this kind of a solution if Juke still adds value. Okay. That raises the question, which databases do you actively support?
00:26:37
Speaker
Yeah, there are about 30 of them. So I'm not gonna list them all here. So all the big ones plus a lot of smaller ones. So our Juke has been immensely popular in recent years with, especially with those number crunching database products, where it makes a lot of sense.

Supporting Analytical Databases

00:26:54
Speaker
So
00:26:55
Speaker
I think if you have a number crunching database like BigQuery or Redshift or Snowflake, you're not going to use an ORM anyway. I mean, it doesn't make sense. You're not storing transactional data, you're querying it for analytics, so your queries are super complex. You have tons of dynamic SQL as well, which is
00:27:14
Speaker
which is where Juke really shines. So if your SQL is dynamic, you can compose arbitrary SQL fragments very easily. So with those database products, Juke makes a lot of sense. And inventors have been adding new Jallax all the time. So they all compete in terms of performance, obviously, but also in terms of syntax, unfortunately, for users, right? Yeah. So you're not just supporting relational databases, like anything with a standard SQL interface.
00:27:43
Speaker
Well, it really has to be SQL. I mean, some products claim they now also support SQL like MongoDB or, what's it called again, Elastic. I mean, they have select from and where maybe, and they call that SQL. It doesn't make sense to use Juke with that. But yes, I mean,
00:28:06
Speaker
What is a relational database product? In the end, if it lacks primary key, can it still be relational? In my opinion, yes, it could be. I mean, you can still store your data in terms of SQL tables and columns and these kind of standard data structures. So once they actually embrace the SQL language and the SQL standard, then yes, you can support that. OK, that's interesting.
00:28:31
Speaker
When we should go back to that honking great topic, which you hinted at, which is, do you think that programmers should know SQL? I mean, should you expect programmers these days to learn that kind of way of thinking and come into a new project knowing it?
00:28:50
Speaker
Absolutely. I have an entire talk about it on YouTube, which has about 50,000 views by now. It's been a really popular talk, which I've given at Vox Zurich, I think. The talk is about getting the most out of your relational database. You can do it two ways. Either you do it in a Java way,
00:29:12
Speaker
I kind of make fun of the job developer approach which is creating photos and creating getters and centers and writing equals and hash code and writing abstractions like factories and factory builders and stuff like that and they have so much infrastructure and not even a single line of business logic.
00:29:28
Speaker
And it's really hard to change the system very easily. You have to change so many components to have a new business requirement. Whereas in SQL, you just describe the requirement and you have your results and you're done. So that's kind of exaggerated, of course. But SQL is a wonderful language for a lot of things.
00:29:46
Speaker
So once you have a query, something that can be expressed as a query and a lot of requirements are queries, right? How much did we earn in this year? How much did we earn per store in this year? Whatever. These kinds of things, you shouldn't calculate them by hand because you'll be always doing low level stuff like putting stuff in a hash map, putting stuff in the list, sorting lists and extracting stuff. And eventually your stuff, your code is slow. You have N plus one problems irrespective if you're using ORMs or if you're writing handwriting SQL.
00:30:16
Speaker
If you're not thinking in terms of datasets, then you will run in tons of performance problems, but not just performance problems. You're also performing yourself very slowly because in SQL, you can just change a little bit of the syntax and you have an entirely different query very fast and it's going to be correct.
00:30:33
Speaker
because all the algorithms behind the scenes to fetch the data and materialize it and put it in the form that you want are certainly correct, right? The only thing that could be wrong is the requirement itself or your way to translate it into a formal language.
00:30:48
Speaker
But I'm always surprised about developers who think SQL is something old or wrong. I mean, the syntax reminds of COBOL, and that may be the only criticism that I think is valid. But if someone is data-oriented, they're going to use for comprehension in any language. And what is a for comprehension, right?
00:31:07
Speaker
If you use the mathematical notation with the curly braces you have the curly braces which is the data set and then you have for all x which is the from clause and then you have the pipe and then the where clause. It's the same thing it's always the same thing i mean this way of thinking of sequel is no different from any other mathematical way of thinking about data is just a kind of weird and quirky syntax but.
00:31:31
Speaker
I mean, it's a wonderful paradigm, and I think people should embrace it for the purposes where it makes sense. Personally, I also like XSLT, for instance, which is also apparently a controversial opinion.
00:31:46
Speaker
I mean the syntax is also the problem there you have a lot of opening and closing brackets and attacks but the programming model is wonderful you have a stream of elements that goes through your processor and you transform it in a functional way with this wonderful x query language or just x path you pattern match your XML document and your output is a new XML document.
00:32:08
Speaker
So you really have to hate XML itself for weird reasons to not like XSLT. But other than that, it's a wonderful paradigm, really. I don't understand why it's not more popular anymore. I mean, somehow people think JSON is better, but JSON has to reinvent everything XML had because people actually wanted that stuff, like JSON schema and JSON processing. Now we have JSON path. It's all the same thing. It's just XML had it from the start.
00:32:39
Speaker
Yeah, yeah. I wonder why it can't just be syntax. Actually, it could just be syntax knowing the programming world.
00:32:47
Speaker
No, it was over-engineered. All the namespacing stuff, I don't think I've ever needed that. You can have nested namespaces and declare everything in XSD files and you would have tons of errors and it would take forever to get it right. Soap was so over-engineered. To just have a single endpoint on the internet somewhere, you had to declare this huge, huge, huge WSDL file. It took forever to understand what you had to do.
00:33:14
Speaker
Other than that, the core of XML didn't really have a problem in my opinion. Except I could never figure out whether something should be an attribute or a nested element.
00:33:26
Speaker
The attributes, yeah, I guess that was a mistake. The attributes, they came from the document-oriented usage of XML, so you had those two basic usages, right? Document-oriented and data-oriented, and probably didn't need attributes in the data-oriented one. Yeah. So let's talk a bit about data-oriented programming. Yeah. Because that seems to be the unifying idea here that brings in SQL and
00:33:55
Speaker
data processing generally in functional programming.

Importance of SQL Knowledge for Developers

00:33:59
Speaker
Do you think that is back on the rise? Yeah, I couldn't tell you. Everything is going up and down all the time, like a server and client side computation. We used to have mainframes and then we did everything in serverlets and back to the server and back to the client. It's the same thing with imperative or object-oriented, which to me are the same thing.
00:34:24
Speaker
and functional programming so either you have state and you want to manipulate state and that has its purpose for instance if you have a rich client UI you want to have that you have a window which is kind of an object and you want to send messages to it and manipulate its internal state.
00:34:40
Speaker
Whereas when you have data, you kind of think of a stream of a flow of stuff that goes through a processor and the process doesn't change the input, it just generates output. And it's both are one wonderful paradigms. And for some reasons, developers always think it must be either or they create these falls dichotomies.
00:35:01
Speaker
which don't really have to exist. Both things have their purpose, right? So I wouldn't implement the UI entirely functionally. I can't imagine it. Probably people do it, but I kind of find it hard to reason about a rich client, like a Windows application with Windows in an entirely functional way. Probably could be elegant, but I think object-oriented programming is better suited there.
00:35:27
Speaker
Okay, I could grind my own axe on that one and try and convince you. I could grind an axe and yeah, I could try and convince you that actually the functional way is really elegant in the front end. Could be, yeah. Well, I did it as well with XSLT in the past, so it generated HTML in the end, so it was a complete stateless request response model. We should do a pair programming video where we thrash this one out. Yeah.
00:35:54
Speaker
But I do think they are slightly. I mean, we love a holy war in programming from the days of editors all the way up to how you actually implement problems. But I do think there is a difficult to reconcile difference between imperative or object-oriented programming and data-oriented functional programming. And the biggest one is
00:36:18
Speaker
Mutation and how you deal with statefulness. They just seem to have such different ideas of how that works that I think you have to choose one or the other.
00:36:31
Speaker
In theory, you shouldn't have to choose, but practically, obviously, you have to, right? So I always say to people who object that SQL is kind of stateful, I tell them, actually, it isn't. So if you have a perfect database product, which is entirely perfect and doesn't actually destroy any state with your update or delete statement,
00:36:52
Speaker
In theory, you have a database and you have an append-only database and your update is also an append operation. You create a new record that obsolets an existing old record and you don't mutate the actual record, you just append stuff. You could design a SQL database in an entirely side-effect-free way in theory.
00:37:12
Speaker
I don't think anyone actually implemented this because ultimately you have to remove data again to avoid a huge data pile that you can't manage anymore so if you never delete anything in a large system that might not work but the SQL standard has a version and
00:37:32
Speaker
and bi-temporal features where it can actually tell the SQL engine to not actually update the record, but to split it in two and mark the old one as deleted, but it's still there. That would be very, very nice. I think in the closure world,
00:37:52
Speaker
which you're more familiar with, I guess, the atomic works this way, more or less, you have an append-only database, and it feels like you update stuff, but you're actually not updating it, you're changing, you're adding to it, and the result is a new database, so the input is a database and the output is a database, and it doesn't modify anything.
00:38:14
Speaker
I think this elegant world can exist and you can interact with it in an imperative style, but it's not actually mutating stuff. So I think imperative languages in principle could be made entirely functional in that sense, I guess.
00:38:30
Speaker
Yeah, because we have that in the programming world. It looks like you're mutating an immutable data structure, but really you're creating a new one. And the past doesn't disappear. We actually had James Henderson on this podcast a few weeks ago about talking about XTDB, which is another attempt like Datomic to do exactly that. Are you ever tempted to go beyond the interface library and build your own database?
00:38:59
Speaker
No, I still have a life next to my job. I don't think that will be very easy. It'd be very challenging and fun to do, but I mean, Duke is already enough work and I have a lot of plans with Duke itself. So I think I'll let the database stuff to those experts. Yeah, it's a very, very deep rabbit hole. Yeah, much deeper, I

Juke's Roadmap and Future Enhancements

00:39:24
Speaker
guess. Yeah. So what are your plans for Duke?
00:39:29
Speaker
One of the features that I've been working on in the recent past has been multi-set support. Multi-set is a standard SQL operator that allows for nesting tuples in a relational way. So you can have arrays, multi-sets, or sets that are nested inside of a record. In the early 2000s, ORDBMS were a thing. So there were object databases at the time, which tried to say, OK, SQL is now over, and we have to do everything object-oriented.
00:39:57
Speaker
and the SQL databases kind of adopted these features into OR DBMS like, as I said, arrays and stuff like that.
00:40:04
Speaker
So Postgres is one of the implementations, Oracle is another one, and the third one, which is less popular in Formix, have these features. So Juke implements this and behind the scenes usually generates SQL JSON because other dialects don't actually support this natively. So you can actually nest stuff in a Juke query as if all those databases have supported the SQL standard. So this is really powerful. You can then map it to lists or sets in your Java APIs.
00:40:33
Speaker
and everything is type safe. So you have a type safe tree mapper that can map arbitrary levels of depth of arbitrary objects into Java classes, which is super powerful. So everyone who has tried this will never look back. It doesn't get enough buzz, I think. But once people try this, they really see the superpower of having type safe embedded SQL. Yeah. That sounds like we're getting to the point where
00:41:02
Speaker
Mainstream relational databases might consume the market share they lost to document databases.
00:41:09
Speaker
Yeah, Oracle also did its part for that. So they implemented something really fancy in Oracle 23c, which has been released as a developer preview and will be soon GA. They have this JSON relational duality feature where it can declare a JSON tree being backed as your relational tables. And you can update the tree and it actually translates to updates to the nested collections. So this is super powerful. They kind of did the same thing I did, but in the database, you can declare a view.
00:41:39
Speaker
and then do arbitrary things so you can even update it. So this is something you can't do with Juke directly. So I'm going to be looking into that very soon, supporting that feature in Juke. So they expose the nested JSON data structure as a JSON schema. So you can have type safe queries on your nested data structures of your relational model.
00:42:01
Speaker
And I guess Juke's code generator will then generate Java classes for those JSON objects. So JSON will just be the serialization format behind the scenes in that use case. So you can have, I guess, a REST API and have it completely implemented with Juke and between, and then Oracle, so you can manipulate stuff in between.
00:42:23
Speaker
Oracle has REST APIs directly into the database, so you wouldn't need middleware. But in case you actually need the middleware to do any additional manipulations, Juke will help you there. There's a lot of really cool new features coming in all the SQL databases, so I'm hoping Oracle will standardize this particular feature. Once it's standardized, I'm pretty sure Postgres will implement it as well.
00:42:47
Speaker
There are graph features which have now shipped in the latest SQL standard, which I find very exciting. Neo4j has implemented the Cypher query language and invented a lot of cool features there with their own syntax where you have arrows to declare your graph relationships. That has now been part of the SQL standard as well. I've downloaded and bought
00:43:10
Speaker
the status document from ISO and looks really well done. Oracle already has shipped an implementation as well.
00:43:17
Speaker
So they were big in the team of standardization and implementation. So I've tried it. So it kind of really makes sense. You kind of have your relational model. And as I said before, what are you doing is you invert the arrows. So now the parent points to the children and you can model your query in terms of parent-child relationships like this and pattern match your graph across your schema in this way. So I'm not sure about the implementation yet. So I'm not sure if it's fast.
00:43:47
Speaker
But it's an elegant way of thinking, and I'm pretty sure they will make it faster if it's not fast yet. Yeah, I've seen people do it just in like, probably about 10 years ago, I saw people trying to do it in, I think it was Postgres. Yeah, yeah, Postgres has an extension, I think. Yeah, it was possible. It was kind of slow, and all the queries are really hairy.
00:44:11
Speaker
The generated ones or the ones you usually inevitably hand rolled with. Yeah. Um, not to be recommended, but if that's being adopted into the standard, that might well get better eventually. Yeah. It is. Um, how do you figure out your development roadmap? Do you look at what other databases are doing? Do you look at the, do you read the ISO standard and see what features you're going to need to do next? Or you just listened to the people who are using it.
00:44:39
Speaker
Well, the ISO standard isn't really advancing very fast, so it's not the biggest source of inspiration. So this graph stuff is kind of the biggest change in a lot of years, I've seen in the stands. Of course, SQL JSON is a big new feature of the standard, but Juke already supported some of it because Postgres had a lot of vendor-specific API in the past already, so it made sense to support this as well.
00:45:05
Speaker
But yeah, I do check out release notes of database products, and if there's something really cool that you didn't do yet, for instance, once I supported Teradata, a very old and popular database from the 90s, which is still around in a lot of systems, and they have this qualify
00:45:22
Speaker
A clause to the SELECT statement qualifies like having, but for window functions, which is really powerful. For instance, if you want to have a query where you have the ties, like I said before, and your database doesn't support it, you can query for the rank of your row, ordered by your ordering criteria, and you don't have to nest a query to calculate the window function.
00:45:43
Speaker
So qualify is a vendor specific extension which has been adopted by I think at least snowflake and I think access all maybe has it and H2 has it. And Doc DB also has it. So I see this being adopted by a lot of vendors because it's so obvious to do. It's such a simple feature and such an obvious value addition. And I think there are people discussing standardizing it now. I've met someone
00:46:12
Speaker
who is looking into standardizing it from the Postgres community.
00:46:19
Speaker
Can't promise anything, can't say if it's happening, but it does make sense. This is a discovery I've immediately implemented in Juke, for instance, once I started supporting Teradata. Juke can emulate it in any database. You can use now Qualify and any dialect that supports window functions because for Juke, it's just either it's supported natively or Juke just generates a derived table, nested query, and filters on the window function.
00:46:47
Speaker
This is a detail I've missed. Sometimes Duke is generating not just the direct translation from what you've typed, but actually stepping in and saying, what you mean in SQL is this.
00:47:00
Speaker
Yes. If there's really no vendor-specific way to do it, then there's an emulation. Like Unconflict in the past. So if you write Unconflict from Postgres and you want to start supporting Oracle as well in your application, not just Postgres, then Juke will automatically generate the merge statement for you, which is equivalent to whatever you wrote with Unconflict. So if you need to tell Juke which database you're using? Yeah. There's a SQL dialect that you pass to Juke.

Exploring Duke and Future Topics

00:47:30
Speaker
If people want to give Duke a try, where's the best place to find it? Duke.org. Duke.org, easy. And it's just a Java library. Right, so you download it, you probably use it with Maven or example projects where you can start or demo project.
00:47:48
Speaker
There's not much you have to do to set up. All you need is a JDBC driver, a database connection, and you can get started. Boom. And Scarlett and Kotlin, too? Yes. Perfect. Absolutely. OK. Well, next time I'm connecting to a SQL database, which shouldn't be too far away, I'll give it a try. Excellent. Lucas, thanks very much for joining us. Thank you, Chris, for having me. Cheers.
00:48:13
Speaker
Thank you, Lucas. Do check out Duke. There are a lot of ideas there to use if you're on the JVM and still some ideas to steal if you're elsewhere. I'll put a link to it in the show notes, but if you're Googling for it, it's J-O-O-Q. Also, while we were talking, we hinted at a good way to implement functional user interfaces. If you're interested in that, I think the best place to start is by searching for the Elm architecture. But again, I'll put a link to that in the show notes.
00:48:43
Speaker
Maybe we should do an episode on functional user interfaces in the near future. That'd be fun. If you want to make sure you catch future episodes, now is an excellent time to hit like and subscribe and follow and all those buttons, and you will let me and the algorithms know that you want to hear more.
00:49:02
Speaker
As ever, if you want to get in touch, my contact details are in the show notes for Twitter and LinkedIn and all the usual. And with all that said, I think we're done. Until next time, I've been your host, Chris Jenkins. This has been Developer Voices with Lucas Ida. Thanks for listening.