
The $500 Billion Integration Problem, And One Possible Solution (with Marty Pitt)

Developer Voices

Ever wondered why data integration is still such a nightmare in 2025? Marty Pitt has built something that might finally solve it.

TaxiQL isn't just another query language - it's a semantic layer that lets you query across any system without caring about field names, API differences, or where the data actually lives. Instead of writing endless mapping code between your microservices, databases, and APIs, you describe what your data *means* and let TaxiQL figure out how to get it.

In this conversation, Marty walks through the “All Powerful Spreadsheet” moment that sparked TaxiQL, how semantic types work in practice, and why this approach might finally decouple producers from consumers in large organizations. We dive deep into query execution, data lineage, streaming integration, and the technical challenges of building a system that can connect anything to anything.

If you've ever spent months mapping fields between systems or maintaining brittle integration code, this one's for you.

Support Developer Voices on Patreon: https://patreon.com/DeveloperVoices

Support Developer Voices on YouTube: https://www.youtube.com/@developervoices/join

TaxiLang Homepage: https://taxilang.org/

TaxiLang Playground: https://playground.taxilang.org/examples/message-queue-and-database

TaxiLang GitHub repository: https://github.com/taxilang/taxilang

OpenAPI Specification (formerly Swagger): https://swagger.io/specification/

YOW! Conference - Australian software conference series: https://yowconference.com/

Spring Framework Kotlin support: https://spring.io/guides/tutorials/spring-boot-kotlin/

Ubiquitous Language (DDD Concept): https://martinfowler.com/bliki/UbiquitousLanguage.html

Kris on Bluesky: https://bsky.app/profile/krisajenkins.bsky.social

Kris on Mastodon: http://mastodon.social/@krisajenkins

Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

0:00 Intro

Transcript

The Massive Cost of Systems Integration

00:00:00
Speaker
Apparently, the systems integration market is worth $500 billion a year. $500 billion, just spent connecting different systems together meaningfully.
00:00:13
Speaker
And I don't know whether I should be impressed by that figure or depressed by it, because on the one hand, it's great that there's so much value in connecting different computer systems together that that money is worth spending.
00:00:26
Speaker
On the other, it's depressing that it's so hard that that's what it costs. And I can imagine plenty of senior managers out there with their heads in their hands, wondering why it's this difficult, but having to write the checks anyway.
00:00:41
Speaker
But I think we know why it's so difficult. As programmers, you figure that out the first time you try and connect two systems together. You end up saying, sorry, but it's taking longer than we thought because we've always used JSON, but they've standardized on protobuf.
00:00:58
Speaker
Or they use REST and we've been busily moving over to RabbitMQ and the things don't line up. It's the way systems disagree that bites you. And nearly always, every system's design decisions made sense in isolation.
00:01:14
Speaker
It's just when you bring the things together that you've got this translation problem. And then you get deeper problems. What about the decisions that aren't just technical? Like all our user IDs are numbers and we went to this other system and they don't have a user ID.
00:01:30
Speaker
They do have a customer ID, but it's a string and it seems to be our user ID prefixed with the letter C, but we're not 100% sure yet. And how do you reconcile those kinds of data problems?
00:01:42
Speaker
You can spend a lot of time and money trying to stitch things together and still go mad. And you'll have to judge the madness of this week's guest because he thinks he's got a solution.

Marty Pitt and TaxiQL: A New Approach

00:01:54
Speaker
I'm joined this week by Marty Pitt, who clearly went to one data integration meeting too many and decided to create TaxiQL, which is a query language that doesn't care about serialization formats or which storage system you're using, but it does care very deeply about what your data means.
00:02:15
Speaker
And then it works its way outwards from there. It's an idea that's definitely related to domain-driven design and this concept of ubiquitous language, but it's a software solution, it's open source, and most importantly, it doesn't need everyone to agree on ubiquitous language up front before it starts being useful.
00:02:37
Speaker
So, what is TaxiQL and how does it work? Very conveniently, we only need to query one data source to get the answer to that. So let's get going. I'm your host, Kris Jenkins. This is Developer Voices.
00:02:50
Speaker
And today's voice is Marty Pitt.
00:03:05
Speaker
Marty Pitt, how are you doing? Very well, thank you. How are you, Kris? I'm very well, and glad to see you here. I last saw you on the other side of the planet. That's right. We were there for the YOW conference tour, right?
00:03:17
Speaker
That's right. Three conferences, three different cities. Yeah, it was great. That was very good. Hopefully we get to do that again this year. Yeah, if you're listening now, we'd love another go.
00:03:32
Speaker
Yeah, if we manage to get through the next hour and sound good, maybe this is our application to come back. Absolutely. Right, well then. So I want to talk about what you've built, but I want to start with the familiar and very thorny place where you found the problems worth tackling.
00:03:51
Speaker
Because you've worked in banking, right? Yeah, so my background is software development. For a long time I was the tech lead, a kind of CTO-level director, at a tech consultancy, and we did a lot of work with large banks, and a lot of those projects turned out to be fundamentally data integration projects, right? So we'd be pulling data from databases or from APIs or from CSV files.

Manual Mapping vs. Automation in Integration

00:04:25
Speaker
And so often we would end up with these massive mapping spreadsheets that would say, you know, this field in this file needs to map to that field in that API, and then needs to map to some other thing, and some other thing.
00:04:43
Speaker
And these BAs would spend literally months before a project started curating these spreadsheets, tracking down all the different data sources that you needed for a project. Then they would hand them over to the dev team to say, right, for this project we need to stitch together all of this data. So that would be the thing they handed over. Prior to that, I'd been doing a huge amount of stuff in open source, and I'd done a bunch of work with what at the time was called Swagger. These days we know it as OpenAPI. Yeah.
00:05:22
Speaker
I'd written a bunch of Spring, Java-ish tooling for Swagger. And so I was in the headspace of automated, machine-readable API specs.
00:05:37
Speaker
That was at the time when, for those of us old enough to remember, it was just super cool that with a little bit of tooling you could get this UI that looked great over an API, and it gave you this great way of browsing your API and interacting with it.
00:05:53
Speaker
Swagger was really cool at the time. Yeah, it still is now. It would make your front-end people happy. Exactly, they finally had a way to discover this stuff, right? And we were doing these projects, we had these mapping spreadsheets, and then inevitably someone would go and track down the Swagger spec or the OpenAPI spec. Then we'd look at that, and we'd look at the mapping spreadsheet, and we'd generate our clients, and we'd wire together these integrations, mapping fields together, then write some code to go and fetch data from another API and map that together. It was just this string of field mappings and integrations.
00:06:34
Speaker
And it just felt nuts, right? The question I was asking myself was: we have these machine-readable specs, these machine-readable specs that describe where all of this data is.
00:06:46
Speaker
And yet the very first thing we're doing is handing those machine-readable specs off to a person to look at and start coding up integration. I'm like, this is kind of broken, right? Why can't the machines just read the specs and work out how to stitch everything together? And so the idea formed of this amazing nirvana where...
00:07:11
Speaker
if everything has an OpenAPI spec or an Avro spec or a protobuf spec, some kind of machine-readable spec, not particularly fussy about what kind, and software was able to read them all, would software be able to build all of the glue for us?

The Burden of Maintaining Integration Code

00:07:30
Speaker
Because it's not just the time spent finding the APIs and doing all of those mapping exercises, it's the coding. It's boilerplate code, right, to stitch the stuff together.
00:07:44
Speaker
And then there's a huge amount of maintenance that goes on after you ship it, when an API changes or someone needs to move a database. They're like, you can't talk to the database anymore, you've got to go through a REST API. So everyone has to refactor to the REST API.
00:07:58
Speaker
Then they go, you can't go through a REST API anymore because now we have Snowflake, so everyone goes back to the database. And as organizations change, all this glue code has to be repaired and maintained. It's really burdensome.
00:08:14
Speaker
Can I ask you a deliberately challenging question on that? If you're one of those companies that supplies banks that keep getting into that same situation, isn't that bread and butter? Isn't boring integration, constantly stitching things together, how you make money?
00:08:31
Speaker
Well, I think it breaks down into two parts, right? For third-party APIs, I think that's a really nicely solved problem. Tools like Fivetran or Airbyte, these tools come out, and Airbyte's open-sourced 50 kajillion different connectors for connecting Slack to Google Sheets to HubSpot.
00:09:00
Speaker
So for off-the-shelf third-party API connectors, it feels like there are platforms out there doing a really good job in that space.
00:09:12
Speaker
I think the reality is different in mid- to large-size organizations, organizations where software developers sit and work, that are building Kafka streams and microservices, that are building bespoke internal software.

Challenges in Custom Software Integration

00:09:27
Speaker
Microservices are a great example. If you work in an organization that has 100, 200, 1,000 microservices, you can't go to Fivetran or Airbyte and get a connector for the microservice that Tim, who sits next to you, wrote, right? There's no off-the-shelf connector for that.
00:09:46
Speaker
And so then you fall back to their generic REST connector, which actually isn't giving you a huge amount of value, right? You still end up having to do all of the tracking down of the fields and the mapping of the data, and that gets encoded in your integration. It's lost institutional knowledge. All of that knowledge of how this API connects to that API is held in the head of the developer who solved that problem at that point in time, and then they move on, and all that's left behind is the code that they wrote.
00:10:23
Speaker
So I think that's a much more interesting, much more difficult problem: how do we connect bespoke software? And if you want to find a large number of bespoke data services, a bank or an insurance broker is where you go, right? Yeah.
00:10:40
Speaker
Well, and not just them. Microservices are all the rage. Lambdas these days, you know, serverless functions, they're the same thing.
00:10:51
Speaker
You know, Kafka events, dropping messages onto a Kafka topic, that's essentially a bespoke API. Our organizations, I think, are more and more a blend of third parties, like cloud-based SaaS products that we're pulling in,
00:11:09
Speaker
and then the internal logic that makes our stuff unique and makes it tick. And often that is microservices or Lambdas or, you know, NestJS apps or something like that.
00:11:22
Speaker
Yeah. And they need to talk to each other. So did you have a moment where you had some big insight that made you say, I'm going to jump ship and build this as my own company? I had this idea that it shouldn't be this hard. Right, fair enough. Surely there's a simpler way of

Critique of Enterprise Domain Models

00:11:42
Speaker
doing this. And so I started noodling on it as an idea: how can we add information into these API specs, into these machine-readable specs, so that we can work out how they relate?
00:12:00
Speaker
And we spent a lot of time looking at prior art in the space that I think hasn't worked out as well as people had thought. So that client in particular had gone, and we'd seen this a lot, down the big road of enterprise domain models, where you have one representation: thou shalt only represent a customer in this way. So there is one way of representing customers, and all systems that exchange data about a customer must adhere to the same contract.
00:12:34
Speaker
And we looked at that and we thought, well, that's a really bad idea, because if you ever want to change it, now every single system has to change. And if you ever have a piece of information that doesn't neatly conform to the golden view of what a customer is, you end up with extraData23, and you start shoving business-critical information into extraData23. Yeah, because you have to ship.
00:13:04
Speaker
You've got to stick it somewhere, right? And so then you ping the other team on Slack and go, look, the customer's account balance, we've stuck that in extra data number 23. And they're like, cool, great, and they go and pull it. And it sounds like you're joking, but this kind of stuff actually does happen, right? Yeah, genuinely.
00:13:25
Speaker
The customer was a bank, and they were trading an entire financial product based on extra-data fields, because the products couldn't be represented neatly inside the existing pre-thought-of concepts.
00:13:38
Speaker
So we knew we didn't want to do that. And we knew from talking to different teams that engineering teams will argue deeply and care passionately about how their domain models an idea. And I think that's right. I think that two different teams should have different representations of different things.
00:14:04
Speaker
Sorry, of the same thing. So two different teams that are modeling the idea of a customer or an account will name those differently and have different attributes that make sense to them in their domain.
00:14:14
Speaker
And so we knew we didn't want to align around field names, because trying to get teams to align around field names is just a battle. Yeah, the recipe for a holy war, right? Yeah, exactly. But the aha moment was the spreadsheet, this awful spreadsheet that had... Sentences that have never been uttered in history: the aha moment was the spreadsheet.
00:14:41
Speaker
I was looking at it and I was like, this spreadsheet has

Embedding Business Concepts in APIs

00:14:44
Speaker
got like a thousand rows, and they'd numbered everything. They'd created this catalog and said, okay, well, this is business concept one and this is business concept two. They gave everything a numeric concept ID and then they went into their spec documents, and they did a great job of this. It was a little bit
00:15:03
Speaker
hieroglyphic, because they used this weird numbering scheme, but they did a great job of saying these business rules can be expressed in terms of business concept one and business concept two. Then you'd go and look it up on the spreadsheet and go, okay, business concept one comes from that system, so I've got to go and get the API doc for that system. So they had this huge spreadsheet that, for this really large project, described all the data, all the systems it came from, how it related, and how to do entity resolution, because one team uses one set of industry standards and another team uses a different set, and so you've got to resolve those. And all this information was in the spreadsheet, right?
00:15:44
Speaker
And I was like, man, why don't we take those business concept ideas, those catalogs, and just embed them directly in the API specs? Don't say this field is the same as that field from that other system, because that couples the two systems together.
00:16:00
Speaker
Instead, just say this field is this business concept. This field is an ISIN, or this field is a first name, or a customer ID, or a date of birth. And if you do it at the field level, not at the object level, but at the scalar field level,
00:16:17
Speaker
then you don't have a holy war around what fields belong in the concept of customer, because each team is deciding what a customer is for themselves. But what you do agree on is that this thing that I've called first name and this thing that you've called given name are the same business idea. They're interchangeable. The semantics of these fields are interchangeable.
00:16:39
Speaker
It doesn't really matter what the field names are, but the semantics are interchangeable. And that was really what the spreadsheet was trying to show, right? These are all the business concepts that relate to this huge project, and this is how each of these different systems represents those business concepts.
00:16:54
Speaker
And this is how those business concepts are used in our rules and our requirements. And so, yeah, the idea was, why don't we just take those and embed them in the specs?
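The field-level idea Marty describes can be sketched in Taxi syntax. This is an illustrative sketch only; the type and model names here are invented, not taken from the episode:

```taxi
// Shared semantic types: the only thing teams agree on.
type FirstName inherits String
type CustomerId inherits String

// Each team keeps its own model and field names;
// the semantic types mark the fields as interchangeable.
model CrmCustomer {
   givenName : FirstName
   custRef : CustomerId
}

model BillingCustomer {
   firstName : FirstName
   customerId : CustomerId
}
```

Because `givenName` and `firstName` share the semantic type `FirstName`, tooling can treat them as the same business idea without either team renaming anything.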
00:17:06
Speaker
Yeah, the big question there is surely: how? Because the centralized spreadsheet, that's equivalent to one department deciding the domain model for the entire organization.
00:17:19
Speaker
Are you pushing that down to individual teams? And surely then you still have an agreement problem? Well, the agreement problem is a little bit different. You agree on semantics, you agree on the definition of a business concept, right?
00:17:35
Speaker
And that's a little bit harder to be contentious about, because think about the way that we build integration: an engineer sits down and they map
00:17:47
Speaker
two different fields from two different systems together, and they say, this thing connects to that thing. Maybe they'll do it by firing up Postman, looking at the results coming in, and eyeballing the data: this field and that field are the same thing, so I'm going to write some code to glue them together.
00:18:03
Speaker
That's not contentious, because it's just a fact. This thing and that thing are the same thing, or they're not. And that's also something you should capture, right? If this thing and that thing look the same, maybe you're missing some context around them. They both look like dates, but maybe one is the date that you moved into a house and the other is the date that you started paying the electricity for that house.
00:18:27
Speaker
Right. These are things that are either the same idea or they're not. And it's not as contentious as, well, how do I model the concept of an account holder for the house? That is a contentious topic.
00:18:39
Speaker
But the semantics of the data are not contentious. And a lot of organizations are already thinking about this by having things like data catalogs, or, if you go back even further, things like your ubiquitous language, right? Ubiquitous language is not a new idea.
00:18:57
Speaker
And it's just a way of saying, hey, these are the ideas of our business.

Semantic Taxonomies in Data Integration

00:19:02
Speaker
And it's important that we understand what these are. Right. Like the account start date: the date that you moved in versus the date that you started paying the bills.
00:19:11
Speaker
It's important that we understand the difference between those things. And it's not contentious, it's factual. OK, tell me how you actually do that. Let's say we've got five departments and they all have different names for my social security number.
00:19:25
Speaker
Yep. Pick something universal. They all have different names, but you recognize that it's one semantic concept you could potentially join on. How do you get there? Do you have to have those five departments agree and say, this is semantic concept number 12?
00:19:44
Speaker
I think ultimately you need to get to agreement, right? But I also happen to think that if you try to get five departments to come together and say, right, write down all of the ideas of your business, that's a recipe for failure. So I think, like any good project, the way you get there is incrementally and iteratively.
00:20:03
Speaker
And generally what happens is, yeah, you are introducing a new library of ideas. In our software, we call that a taxonomy. It's independent of any given system. It is just a collection of nouns that describe ideas from your business.
00:20:21
Speaker
First name, or in your case, social security number. And you just assign a name to it. And it doesn't have to match anyone's field name, because what we're going to do is add it as additional metadata on top of your domain model. So what you call SSN, and what I call social security number, and what someone else calls SSID, we're going to say, yeah, fundamentally that is this piece of information. So I'm just going to go to the taxonomy and look up,
00:20:54
Speaker
what is the name in our ubiquitous language for this? And I'm just going to say this field is a social security number, that idea from our central business catalog.
00:21:05
Speaker
And you just do that incrementally and iteratively. But who's taking responsibility for that? Is it all of those five departments, or is it the sixth department that wants to use the information?
00:21:16
Speaker
I think it's a question of maturity, right? The first bank that we did this with had a team that was directly responsible for curating a data catalog of ideas. That was the team that had pulled together the spreadsheet, and that was the team who said, here's 100 different ideas that describe our business.
00:21:40
Speaker
That's kind of an ultimate level of maturity, but I think you can get a long way by starting much simpler than that. You have a shared code repo. For us, all of this stuff lives in Git.
00:21:51
Speaker
You have a shared code repo and a nice browsable catalog that sits over it. So when you want to start sharing your data semantically, you have a catalog that you can go to and look it up: right, what is the name that we're using for social security numbers? What is the concept in the taxonomy for that?
00:22:12
Speaker
And browsing that, finding the tag, and then adding that into your stack. What does that look like then? If I start writing this, just take me through the files that will actually get written to describe this. So this is where we get into Taxi.
00:22:30
Speaker
So Taxi is a type system for describing semantic types in a system-agnostic way. It's a very simple type system that lets you say, hey, we have this idea of first name, of last name; first name is a string, and last name is a string, and social security number is...
00:22:51
Speaker
a string, I think, isn't it? Yeah, it's formatted digits, isn't it? Yeah, so maybe it's a bit stringy. And so you just create a Taxi project that declares those ideas, and you add some nice markdown documentation to say this is what this idea is.
00:23:09
Speaker
And because these are not systems, because they're the semantic terms of our business, it's generally a thing that you work on collaboratively, but it doesn't change a lot. We don't often change it.
00:23:24
Speaker
There's a bit of activity when it's first getting going, but then it settles down pretty quickly, because the semantics of our business don't really evolve a huge amount. So you have a Taxi project that has those ideas sitting inside of it.
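A taxonomy project along those lines might look something like this. An invented sketch, not from the episode; Taxi's `[[ ]]` blocks hold the markdown documentation Marty mentions:

```taxi
namespace demo.taxonomy

[[ A person's given name. ]]
type FirstName inherits String

[[ A person's family name. ]]
type LastName inherits String

[[ A US Social Security Number, e.g. "078-05-1120".
Formatted digits, so modelled as a string. ]]
type SocialSecurityNumber inherits String
```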
00:23:36
Speaker
And then each of the individual teams that want to add the semantic metadata into their systems do it with whatever format they're already using. So teams that are using Avro, they add...
00:23:48
Speaker
metadata into their Avro specs. Teams that are using Protobuf can add metadata into their Protobuf. Teams that are using OpenAPI, likewise. All it is is a little bit of additional metadata that says this field is not just any old string, it is a social security number as defined in this taxonomy.
00:24:05
Speaker
It's just semantic metadata. So it's everything that our specs were before, plus a little bit more, a little bit more that talks about what this information is and how it relates to other ideas across the organization.
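For a team describing its API in Taxi directly, rather than tagging an existing Avro or OpenAPI spec, the producer side might be sketched like this. The model, service, and URL here are invented for illustration:

```taxi
import demo.taxonomy.FirstName
import demo.taxonomy.SocialSecurityNumber

// The team's own contract; the field names are theirs alone.
model Customer {
   givenName : FirstName
   ssn : SocialSecurityNumber
}

// Declares where this data is served from, so consumers
// never need to know this address or these field names.
service CustomerService {
   @HttpOperation(method = "GET", url = "http://customers.internal/api/customers")
   operation listCustomers() : Customer[]
}
```

In Avro, Protobuf, or OpenAPI, the equivalent is a small piece of custom metadata attached to each field of the existing spec rather than a separate file.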
00:24:21
Speaker
Okay, I've got a couple of questions on that. The first one is: let's say I'm doing that, and I persuade four out of my five departments to publish this data fairly quickly.

Decoupling Data with Semantic Metadata

00:24:33
Speaker
Am I locked out of querying the fifth? Or can I just put a stub in on my side until the fifth one is ready? Yeah, you can do that on their behalf, right? If the fifth team is just not ready. And a good example is external data. If you're pulling in external data, you're not going to go up to Stripe and say, hey, would you mind embedding these tags that describe our bank's esoteric concept of social security? That's not going to happen. So you can just polyfill it yourself. Okay.
00:25:05
Speaker
You polyfill it yourself. But there's a bunch of reasons why producers don't mind doing it. It's not actually particularly burdensome.
00:25:17
Speaker
And for teams that are connecting to their APIs semantically, it actually decouples producers and consumers, which, for the producers of the system, frees them up.
00:25:31
Speaker
Because the whole idea around this, the whole why, was not to create some massive data catalog that tells you what every field in every different system means.
00:25:42
Speaker
That's a benefit that you get from it, but that's not the real reason. The real reason is to decouple producers and consumers, so that consumers can ask for data semantically without really having to be aware of which producer is serving it, and the middleware becomes responsible for it. And the benefit for the producers is they can start evolving their APIs without breaking their consumers.
00:26:06
Speaker
Today, when we build integration traditionally, you give me your OpenAPI spec, I chuck it into my IDE and generate a client, and we are tightly coupled.
00:26:17
Speaker
Your spec and my client must evolve in lockstep. Or you can go through the gymnastics of versioning a REST API.
00:26:29
Speaker
But these things have to evolve together very, very carefully. If you can decouple them, though, and the consumers are consuming data semantically and the producers are producing data semantically, then you become coupled to the semantic concepts,
00:26:47
Speaker
which actually don't change very much, right? The idea of a social security number doesn't change in a business nearly as frequently as the name of the field that models it, or where you stick that field in your JSON structure.
00:27:01
Speaker
That changes all the time. But a social security number, semantically, as an idea, just doesn't change very much. So as a consumer you say, hey, I want to know about Kris, and I want to know his name, age, date of birth, and social security number. Where that comes from, how I get that data, and which fields the producer wrote it in are not super interesting to me as a consumer. I just want my data vended.
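A consumer request like that one can be sketched as a TaxiQL query. This is a hedged sketch: the `Person` model and the `DateOfBirth` type are invented here and assumed to be declared in the taxonomy:

```taxi
// Ask for data by meaning; the query engine resolves which
// producers, endpoints, and field names supply each value.
find { Person( FirstName == "Kris" ) }
as {
   name        : FirstName
   dateOfBirth : DateOfBirth
   ssn         : SocialSecurityNumber
}
```

Nothing in the query names a service, a URL, or a producer's field, which is exactly the decoupling being described.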
00:27:26
Speaker
Yeah. So I'm thinking about the unit of decoupling here. There are a lot of people in the Kafka world who say: decouple these producers and consumers by putting the data in one format in one place.
00:27:40
Speaker
You know, let's stick everything in Avro in a long log. What you're saying is, let's decouple by focusing on the individual meanings of fields. Well, I would argue, with all the love in the world, that in that Kafka example they're not decoupled, right?
00:27:57
Speaker
They're very tightly coupled. They're coupled to the Avro spec. So you might not be coupled to the actual producing system, but you're coupled to the contract that the producing system evolved. You can't rename a field in Avro without breaking the consumers, right?
00:28:14
Speaker
If yesterday I wrote my data to a field called SSN and today I write it to a field called social security number, my consumers need to be aware of that change and they need to evolve with it.
00:28:25
Speaker
So they are tightly coupled. Yeah, I think there might be support in Avro for straight renames, but it quickly gets complicated. There are cases where it's more difficult.
00:28:38
Speaker
Yeah, or if you want to decompose data, if you want to reshape it, put it into different locations, if you want to nest it in different ways, right? The consumers need to be aware of those changes, because they need to know where they're reading this data from.
00:28:57
Speaker
Yeah, yeah. And certainly, if Avro has support for renaming fields, they are definitely thinking at the field level, about the meaning of individual fields. Right?
00:29:09
Speaker
Yeah. In this semantic sense: oh, we've renamed it, but it still means the same thing. And that's what you're getting at. Yeah, exactly. The meaning of the field, the meaning of the data that's sitting in the field, hasn't changed.
00:29:22
Speaker
Okay. So I write, or get other departments to write, all these files that ascribe meaning to particular fields. They ascribe... so teams publish a taxonomy, which says: these ideas exist in our business.
00:29:40
Speaker
And then publishing teams add that metadata into their data contracts, right? They say, this field is a social security number, this field is a first name, this one is a last name.
00:29:51
Speaker
So they become consumers of the taxonomy. Yeah, okay. And it would end up being something not unlike Swagger, but machine readable, in that we're annotating all our APIs with it.

Optimizing Data Retrieval with TaxiQL

00:30:05
Speaker
Yeah, I mean, it is Swagger. If OpenAPI is what you're using today, you carry on using OpenAPI; OpenAPI has support for custom metadata.
00:30:13
Speaker
So you just add additional tags in there. You're not migrating away from OpenAPI. You're still using OpenAPI or Avro or Protobuf or whatever. You're just adding additional metadata in there.
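As a concrete illustration, OpenAPI's specification extensions (keys prefixed with `x-`) let you attach exactly this kind of metadata to an ordinary schema. The fragment below is a hedged sketch: the `x-taxi-type` key and its shape are an assumption for illustration, not a guaranteed rendering of Taxi's real syntax.

```yaml
# Hypothetical OpenAPI fragment: the schema stays ordinary OpenAPI, with
# semantic metadata added via standard "x-" specification extensions.
components:
  schemas:
    Customer:
      type: object
      properties:
        fname:
          type: string
          x-taxi-type:
            name: people.FirstName   # ties this field to the shared taxonomy
        ssn:
          type: string
          x-taxi-type:
            name: people.SocialSecurityNumber
```

Nothing about the API contract changes for existing consumers; semantically aware tooling simply gains an extra lookup from field to concept.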
00:30:26
Speaker
Okay, then I think the next place we have to go is how you use it. There's some kind of query execution layer to this, right? Exactly, yeah.
00:30:38
Speaker
So the way that it works is: consumers of data write a query, right? And that query is basically a data contract. So they say, right, okay,
00:30:49
Speaker
given this thing I know, say Chris's customer ID, for example, this is the data contract that I want satisfied, and I define the shape of the data that I care about.
00:31:03
Speaker
And so I publish that up to this query execution layer, and it's that layer's job to go, okay, how do I satisfy that? Which systems do I need to talk to? And how do I transform the data that's coming from those systems to build an object that satisfies that specific data contract?
00:31:21
Speaker
And in that regard, it looks and feels remarkably like GraphQL. GraphQL already has this idea of: here is a schema that is potentially composed from lots of different data sources,
00:31:36
Speaker
and consumers of data get to, in the case of GraphQL, cherry-pick the fields that they are interested in. So here is one graph, and you get to pick a subset of fields that you care about.
00:31:51
Speaker
We take that another step further, where the consumers don't have to become tightly coupled to this one global schema. Consumers just publish a data contract that they care about.
00:32:02
Speaker
So: this is the data that I care about. I publish it up to some kind of GraphQL-y kind of thing. It's a Taxi query language... sorry, Taxi query engine. So they publish it to the Taxi query engine, and that satisfies the data contract. And that might mean calling a couple of REST APIs and querying a database and, you know, doing some gRPC calls, and then it stitches the data back together and gives it back to the client.
00:32:27
Speaker
So it's looking more like GraphQL than, say, SQL, where I not only specify what I want, but where I expect that data to be found. Exactly. The whole point is to decouple the consumers of data from having to know where the producers are, which, to your point about Kafka, is exactly the same goal there, right? Producers are dropping messages onto a queue or onto a topic, and the consumers don't really need to know from whence it came.
00:32:55
Speaker
They just know where they can go to get it. So, I have to ask, how does that resolve if I say something like: here's Chris's social security number, and I want his first name and last name.
00:33:12
Speaker
And there are several competing systems in the bank that will offer up that information given the right social security number. How do you choose which one to resolve it from? So typically...
00:33:25
Speaker
It depends, right? There are lots of different strategies. Just like with any kind of query, you can control that. So there are hooks for you to say, right, I want data specifically from this system.
00:33:39
Speaker
That's kind of a trade-off, because you are then saying the consumers have become aware of the producing systems, but it's not the end of the world, right? So you can say, I want social security data, but I only want it from the golden source, for example. Mm-hmm.
00:33:54
Speaker
You can also add constraints into the data contract. So you can say, I want Chris's social security number, but I want it where the age of the data is less than 24 hours, for example, in case you're rapidly changing your social security number.
00:34:16
Speaker
Anyway, I'll stay ahead of the system. I wish we hadn't picked that as an example, because this whole thing feels like an identity fraud podcast now. And now by saying those words, you've flagged us in a database. Thanks very much, Martin.
00:34:33
Speaker
Yeah, so out of the gate, given all other constraints being equal, the query engine will optimize for the fewest number of hops, right? So: what is the fastest way of resolving this data product, on a field-by-field basis?
00:34:52
Speaker
And it applies sensible things like caching, and, you know, if one given route doesn't work, then it will fail over to the next. So if you're happy to just leave it entirely up to the query engine, it will, by default, go for the shortest possible path.
00:35:11
Speaker
When we built the query engine, we built it so that those heuristics were pluggable, because we thought someone would come along and say, well, actually, I want... you know, these consumers get access to high-quality data.
00:35:25
Speaker
And this is a real thing, right? Especially in banks, you go: right, these consumers get access to this data, which is up to the second, but every read costs us a lot of money.
00:35:37
Speaker
These consumers don't need it to be as fresh, and so they get access to this other stuff. So the intention is that those kinds of algorithms are tunable. No one's ever done it, so we don't know how well that's going to work when that requirement comes along. But the intent is that those heuristics are tunable, so you can massage the execution engine.
00:35:58
Speaker
Right, yeah. But, you know, even without that, you can say: I want to restrict it down to these sources, or I want it from any source but not these sources, or I want it where the data satisfies these specific constraints.
00:36:11
Speaker
So you get a lot of levers that you can pull. So another thorny thing in that, I'm thinking, is data cardinality. Right, so I'm trying to come up with an example here. Imagine you are querying a system and you want each customer and their address.
00:36:28
Speaker
That's fine. Customer, nested address. Easy. One day we add in the high-net-worth investors database, and now they want to return all the different addresses you have, because you're very rich and you live at several places around the world, and in their world one person maps to many different addresses.
00:36:51
Speaker
How do you resolve something like that? Probably not as well as we could, I think, is the honest answer at the moment. I like an honest answer. And I think...
00:37:02
Speaker
you know, as we've been evolving the platform, we have seen different places where the language has struggled a little bit. And I do think that cardinality, especially the idea of going from one-to-one to one-to-many or many-to-one... it's fine for saying, give me a collection of addresses. But where a consumer has previously expected there to be one and now there are many, you know,
00:37:32
Speaker
that would at the moment result in an error.

Handling Data Cardinality Changes

00:37:34
Speaker
And I definitely think that's a space that we would want to evolve, and work out what better looks like. I think the challenge there is: where there are many, where previously there was one, which one do you pick?
00:37:51
Speaker
And we don't have a great way of knowing that. There are ways of expressing that, to say, you know, I want to filter this down to only pick the primary residence. You can express that in the query language.
00:38:03
Speaker
But that requires you to know ahead of time that it was a one-to-many relationship. And I think in your example, it was one-to-one and then changed. So yeah, I definitely think it's a space that we could evolve and do better with.
00:38:17
Speaker
I'm going to pick on this, even though you've admitted that it's perhaps not a strong point. What a bully! I'm sorry, but I'm curious. My curiosity leads me in these things.
00:38:30
Speaker
If I wrote a query that only touched two of the regular systems, and I got back the single address for each customer, and then I added one field that I didn't realize is now going to tap into the high-net-worth database... Yeah.
00:38:46
Speaker
Will it suddenly change the address field just because I've added in one column, without realizing that it matters? No, because the query engine is basically doing a graph navigation, right, to work out which data sources to go to.
00:39:01
Speaker
So adding in a field to pull in data from another source doesn't pull in all of the other data that was available at that source, right? But going back to the earlier point: if, across the graph, there was historically only one place to get address from and it was a single value, and now there is another place that you can get addresses from and it's a collection, when that kind of change gets introduced, then yeah, the graph doesn't behave as well as it should.
00:39:36
Speaker
Okay. So what does this language look like? Does it look more like GraphQL than SQL? Or is it its own thing? Yeah, the query language feels a bit like a GraphQL object. Taxi has an expression language that sits inside of it.
00:39:57
Speaker
I think this is something that GraphQL kind of has, but it's a little bit awkward, in that everything's a decorator or an annotation. Whereas in Taxi, we have a very rich expression language that lets you do arbitrary things. For example, you can say: give me the customer's name, but I want the customer name transformed to uppercase.
00:40:25
Speaker
So you can embed transformation logic inside your data contract, and then you get the benefit of that transformation logic running on the query engine rather than on your consuming client.

TaxiQL and GraphQL: A Comparative Insight

00:40:37
Speaker
And that logic can become really quite complex as well. So you can define a concept that says: the profit for a given instrument, or for a given thing, is the wholesale price plus the cost of goods minus the apportioned warehouse cost,
00:40:59
Speaker
I don't know, something like that. That's the concept of profit. So you can define that as a semantic concept, and then you can say: for this product, I want to know what the profit is, and you get the value back. But the semantic layer has described all of the logic behind it, and so you get consistent evaluation.
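The shape of a derived semantic concept, defined once as a formula over other concepts so every consumer evaluates it the same way, can be sketched like this. A hypothetical Python illustration; the profit formula is illustrative arithmetic, not from any real taxonomy.

```python
# Hypothetical sketch: a derived concept is a formula over other concepts.
# Defining it once in the semantic layer gives every consumer the same answer.

# Derived concepts: name -> (input concepts, formula over those inputs)
DERIVED = {
    "Profit": (("WholesalePrice", "CostOfGoods", "ApportionedWarehouseCost"),
               lambda wholesale, cogs, warehouse: wholesale - cogs - warehouse),
}

def evaluate(concept: str, facts: dict[str, float]) -> float:
    """Resolve a concept directly from known facts, or via its formula."""
    if concept in facts:
        return facts[concept]
    inputs, formula = DERIVED[concept]
    # In a real engine, each input might itself trigger a call to another system.
    return formula(*(evaluate(i, facts) for i in inputs))

facts = {"WholesalePrice": 100.0, "CostOfGoods": 60.0,
         "ApportionedWarehouseCost": 15.0}
assert evaluate("Profit", facts) == 25.0
```

Note the recursion: asking for `Profit` fans out to its inputs, which mirrors the later point in the conversation that adding one field to a query can pull several more systems into the plan.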
00:41:18
Speaker
Right, yeah. So it looks and feels a lot like GraphQL, but the underlying language is quite a bit richer.
00:41:29
Speaker
Yeah. Just to check I understand this: if I defined profit the way you've described, and I queried one system and got the profit back, and then queried another system which had all the same fields with different names, would that automatically resolve correctly?
00:41:46
Speaker
Well, the resolution logic would be applied consistently across the graph, which is slightly different from what you're asking. Because when you ask the system, you're not saying,
00:42:03
Speaker
I want to know the profit as defined in this Postgres database. You're saying: across the graph, here is the semantic definition of profit, and for a given product, profit is calculated using this formula.
00:42:17
Speaker
So when I ask for information about a product, if I say I want to know the profit, adding that one field into my query might mean all of a sudden we need to talk to seven other systems, because we need to go and get the cost of the warehouse rental and whatever the different things are that come into it.
00:42:34
Speaker
So that evaluation will get applied consistently, every time you ask the question. Because the semantic concepts are not about reading a specific field from a specific table, for example. It's about saying, semantically, this is how this concept exists.
00:42:54
Speaker
And then when you furnish that idea, it's up to the graph to say, well, these are the places that you can go and fetch that data. Okay, yeah. So that would suggest to me... let me think of horrible problems I've faced in the past.
00:43:08
Speaker
This is so fun. Yeah. This is like a game show that will be watched by very few people: we ask the guests difficult questions against the clock. Okay, so you can imagine a company saying, oh, we're all moving to SAP or whatever, and we're building a new system that contains all that data in one place.
00:43:30
Speaker
The day we switch that on and switch the old systems off, can I expect my Taxi queries to just magically carry across? Yeah, that's the way it's supposed to work. Assuming that the data is available, it doesn't have to come from the same place, but assuming that the data is available, then yeah, it will switch across.
00:43:53
Speaker
And all of these questions are super fun, right? They come up all the time when we start working with new teams. It's things like: what if we tag that value wrong?
00:44:10
Speaker
Like, what if a team tags a field as the date the customer moved in, and it's actually the date that they became responsible for the account, right? And they get that wrong. Yeah. That's a really common question that comes up when we start working with new teams.
00:44:25
Speaker
And it's like, yeah, what if you do get that wrong? That's a bad thing, right? We can agree that there are real problems if you use this data incorrectly.
00:44:37
Speaker
But if we compare that to the way we build integration today, the people who make those decisions today are not the people who know the system; they're the people consuming the data from the system, right? Because every time you build an integration, you start from zero.
00:44:52
Speaker
You go: right, I've got to pull in the data from the account system. There are these two different date fields. Which one am I going to read? And the person building the integration goes, well, I'm going to read that one. And what if they get it wrong, right?
00:45:06
Speaker
When we build integration today, we make that decision over and over and over. Every team that builds that integration goes, well, I've got to pick a field, and I'm going to pick that one, and you just hope they get the right one.
00:45:19
Speaker
And what this does is invert it. It says: the producer becomes responsible for saying, hey, this is my semantic contract, this field relates to this business concept. So as long as you know the business concept that you're asking for, it doesn't really matter what the field name is.
00:45:35
Speaker
And we see it in legacy systems. A lot of the time they have completely nonsensical names, right? Yeah. Depending on how far you go back... back to the mainframes, it's like three-character codes, right? Yeah, exactly.
00:45:49
Speaker
And so, in the example of switching off one system and switching on another: yeah, assuming that you got all your semantic modelling done correctly, it will just switch across.

The Role of Semantic Contracts in Integration

00:46:03
Speaker
But that feels like such an important decision that it should be left to the teams that author the systems, and know the systems, to make. Those should be decisions that we make once. Categorizing our data, defining products, defining contracts for our data and our API specs, is a thing that should be done once, by the teams that publish the API specs, right? They're the ones who know the data. It's the same reason that OpenAPI exists: so that teams authoring these API specs can say, this is the contract of my API. Yeah, yeah. And then we don't leave it to the consuming teams to eyeball the JSON and go, oh, I guess I can reverse engineer the contract, right?
00:46:45
Speaker
Yeah. So that same responsibility sits with those teams. It sits with the teams that know the data the best; they're the best placed to describe the data.
00:46:58
Speaker
That makes sense. And I've already asked you this question, but I'm going to ask it again and labour the point, because I want to be doubly sure. If I'm building up this system, and I find that the producing team is busy, overloaded, reluctant, can't get their act together, and I as the consumer write this Taxi definition,
00:47:17
Speaker
I can then, down the line, say to them when they're less busy: would you like to adopt my Taxi definition? Exactly, and it can evolve in that way. Okay, so I'm not blocked on them defining the semantics. I can just benefit from it.
00:47:29
Speaker
Yeah, exactly. Okay. Then I think we have to get into how you actually execute one of these definitions. It's not always obvious how to separate the query from the query execution, but here it seems like a very natural partitioning.
00:47:46
Speaker
When I give you a query that says, give me these four fields, how do you actually execute it? Okay, so it's super fun. Before that happens, we've consumed all of these different API specs with their semantics. So a Taxi server has the system-agnostic taxonomy. That's just your collection of nouns, defined in Taxi: first name is a string, last name is a string.
00:48:15
Speaker
It's read that, probably, from a Git repository somewhere. And then it's got a bunch of different systems that have published their API specs with these semantics inside of them.
00:48:25
Speaker
Right. And is that just a list of endpoints where it can find the latest definition? It depends, right? It's a combination: endpoints that we're reading from, Avro files chucked in a schema repository somewhere, Git repositories that we're reading. It's all of those different things. You know, one of the goals of Taxi, when we set out, was that we really wanted to meet developers where they are, right? Whatever your toolset and whatever your way of working is,
00:48:56
Speaker
you should be able to carry on doing that. You shouldn't have to rebuild your stack or adopt a new way of working. Yeah, that's both nice and the reality of, like, you can't get banks to change just for you, right? Well, yeah, and it isn't just banks; large organisations have...
00:49:11
Speaker
different ways of doing stuff. Different teams follow different Git branching strategies. They've got different technologies, so they follow different conventions. The C# team will not follow the same conventions as the Go team or the Java team, right? They just won't. And the tools are different. So you've got to meet them there.
00:49:31
Speaker
And I guess this is one of my challenges with the way GraphQL has approached this: the producing team is going to have to adopt one standard for this to work, and that feels like quite a bit of friction.
00:49:44
Speaker
Whereas if you're an Avro team, you should carry on being an Avro team. If you're a Protobuf team, we don't care, carry on being a Protobuf team. Our job is to work within that stack. So, okay: you have these specs, and the query engine has gobbled them all up, and internally we've built this massive graph. The API specs tell us where a system is, like what's its...
00:50:08
Speaker
you know, what's its IP address, what port is it on. They tell us what protocol to speak: is it JSON, or is it Protobuf, or is it Avro? So we know where you are, and how to talk to you, through your API specs.
00:50:21
Speaker
And then the semantics tell us what that means, right? An example that I use all the time is: given a customer ID, I want to know their account balance, and that is maybe split across three different systems. There's a team that has customer information, where I can look up a customer from a customer ID; then there's a cards team that has the card information; and there's another transaction system that has balances. So we've got these three different specs that talk about this data and how it connects.
00:51:00
Speaker
And so we gobble all that stuff up, and then we build this big, massive graph, and then we can do relatively unexciting graph traversal, right? To say: given the set of things that I know, like Chris's customer number, this is the data that I want to find out; I want to know these 20 different attributes.
00:51:21
Speaker
And we can just do graph traversal from the thing that I know to the thing that I'm trying to get to. And if we traverse that semantically, along the way we can also find out that in order to get from Chris's customer number to his account balance, we have to first call a system to get his card number, and then call another system to get his account balance. And that might be a database lookup along the way, or, you know, a SOAP request that we do along the way. You're not allowed to swear on the podcast. Sorry.
00:51:55
Speaker
Yeah, and so the process of how do I execute this query... it is just graph traversal, right?

Graph Traversal Techniques in TaxiQL

00:52:03
Speaker
It's like... Yeah, yeah. Those algorithms we were supposed to have learned at university. Yeah, it's just plain old shortest path.
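The "plain old shortest path" idea can be sketched directly: model each API operation as an edge from an input concept to an output concept, then search from what the consumer knows to what they asked for. This is a hypothetical Python illustration using the episode's customer-to-balance example; none of it is Taxi's actual implementation.

```python
# Hypothetical sketch: each API operation is an edge "input -> output",
# and planning a query is a shortest-path search over the concept graph.
from heapq import heappush, heappop

# (input concept, output concept, cost): e.g. the cards service maps
# CustomerId -> CardNumber.
EDGES = [
    ("CustomerId", "CustomerName", 1),   # customer service
    ("CustomerId", "CardNumber", 1),     # cards service
    ("CardNumber", "AccountBalance", 1), # transactions service
]

def plan(known: str, wanted: str) -> list[tuple[str, str]]:
    """Dijkstra over the concept graph; returns the sequence of calls."""
    queue = [(0, known, [])]
    seen = set()
    while queue:
        cost, concept, path = heappop(queue)
        if concept == wanted:
            return path
        if concept in seen:
            continue
        seen.add(concept)
        for src, dst, c in EDGES:
            if src == concept:
                heappush(queue, (cost + c, dst, path + [(src, dst)]))
    raise LookupError(f"no path from {known} to {wanted}")

# CustomerId -> CardNumber -> AccountBalance: the two-hop chain from the
# conversation, discovered rather than hand-coded.
assert plan("CustomerId", "AccountBalance") == [
    ("CustomerId", "CardNumber"), ("CardNumber", "AccountBalance")]
```

Swapping the uniform costs for real per-edge weights is what turns this from "fewest hops" into the tunable heuristics discussed earlier.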
00:52:09
Speaker
Can you do things like... you must end up with situations where you gather all this data together, and you find that you can get this field from three different systems, and you try to optimize the result to say two systems would be enough, so let's just go to two.
00:52:27
Speaker
Yeah, so we are doing the shortest possible path. Across the query, we are looking to optimize. Again, assuming that both pieces of information are semantically equivalent, which is really important, and there are no constraints that say you must read from this system, or it must satisfy these other attributes...
00:52:48
Speaker
If there are genuinely two candidates to pick from, it will optimize for the smallest number of network hops. Right, network hops specifically. Well, yeah.
00:53:00
Speaker
I mean, like in any good graph, we apply a cost across each of the different transitions across the graph.
00:53:11
Speaker
And we apply a higher cost to a network hop than we do to, say, reading a field from a JSON blob that we've already got in memory. Okay, so that implies costing models and caching. Tell me about costing models. Is it just number of network hops, or do you do things like: you mentioned this is an expensive system to talk to, or this is an overloaded system, try not to go to that one, go to one of the read-only copies?
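That weighting, a network hop costing more than reading a field already held in memory, amounts to a pluggable cost function over edge kinds. A hypothetical Python sketch; the specific weights are illustrative, and the point is only that they are swappable.

```python
# Hypothetical sketch: pluggable edge costs, so a network hop is priced
# higher than reading a field from a response we already hold in memory.
from typing import Callable

Edge = tuple[str, str, str]  # (input concept, output concept, kind)

def default_cost(kind: str) -> int:
    # Illustrative weights; a deployment could plug in its own function
    # (e.g. penalising expensive or overloaded systems).
    return {"in_memory_read": 1, "network_call": 100}[kind]

def path_cost(path: list[Edge],
              cost: Callable[[str], int] = default_cost) -> int:
    return sum(cost(kind) for _, _, kind in path)

# Two candidate plans for the same data contract:
reuse_response = [("CustomerId", "CardNumber", "network_call"),
                  ("CardNumber", "AccountBalance", "in_memory_read")]
extra_hop = [("CustomerId", "CardNumber", "network_call"),
             ("CardNumber", "AccountBalance", "network_call")]

# The engine would prefer the plan that avoids the second network hop.
assert path_cost(reuse_response) < path_cost(extra_hop)
```

Making `cost` a parameter is the "pluggable heuristics" idea from earlier in the conversation: per-consumer freshness, rate limits, or monetary cost can all be folded into the same search.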
00:53:41
Speaker
Yeah, so we want to do that. And like I say, it's built with the expectation of doing that kind of stuff, because it's a cool problem. We just want a use case to come along.
00:53:55
Speaker
So if you're out there and have a use case like that, we want to solve for it. The intent, too, is to be able to do this dynamically: to say, over time, there are two different systems that I can go to, and I've noticed that performance on system A is slower than performance on system B,
00:54:11
Speaker
and so I will, for a while, gradually send more traffic to system B, because I'm getting better performance results from it. So, dynamically adjusting the performance of the graph as you're going. It's designed to handle that kind of stuff.
00:54:27
Speaker
We haven't built those implementations, because we want a real person to work with to do it, you know, a real use case. But they are really fun problems.
00:54:39
Speaker
That's kind of the heart of it. And getting to a point where consumers can say: I want this data, and I am happy for my request to take an extra 30 seconds, because it doesn't really matter,
00:54:57
Speaker
versus: I want this data, and I need my answer as fast as possible. So consumers a lot of the time have a say in that, right? That's part of the data contract they're submitting. And the teams that build and think about data contracts today... data contracts is a really exciting space; there's a lot of stuff going on in there.
00:55:17
Speaker
And they are thinking about these data quality attributes, like timeliness. And response time is another one of those, right? I'm going to ask this thing, and I want this data back as fast as possible,
00:55:30
Speaker
and so, you know, you can cache this aggressively; or, I want it as up to date as possible, and I'm happy for that call to take longer. So those are the kinds of quality attributes that, in theory, a consumer should be able to state. And then likewise, a producer might want to say, well, there are rate limits for calling me, or you get rate limited at a different rate:
00:55:51
Speaker
team A gets rate limited at a different rate than team B. And so a query engine should be able to consider those kinds of heuristics and tune its graph accordingly. Yeah. Okay.
00:56:02
Speaker
Where is the query engine running? Is it one centralized query engine, or am I running a query engine locally, in my process that's doing the consumption? You can do it in lots of different ways.
00:56:16
Speaker
So the query engine itself is written on the JVM. If you want to, you can actually embed it in your microservice, or in your consumer, if you're built on the JVM.
00:56:30
Speaker
So you can entirely distribute this stuff. You still have the problem of, I need a central registry to go and fetch all of those schemas from, but that's a solvable problem. So you can embed it, or you can run this on a Lambda.
00:56:46
Speaker
So you can have a Lambda that scales out infinitely to handle the request execution. Or we have a product that we build, which is Orbital; that's our commercial version of the TaxiQL query engine.
00:57:01
Speaker
And that's a thing that you can scale out as well, so you can send queries to that. So there are lots of different query execution models. You should definitely use Orbital, because that's the one that pays the bills and keeps the lights on.
00:57:13
Speaker
At least I'm honest! But if you want to, yeah, you can embed a Taxi query engine directly inside your Spring Boot app, for example. Right. And that is open source.
00:57:23
Speaker
Yeah, yeah. Apache 2. Okay. JVM, written in... I think I've heard you say this before. It's written in Kotlin, right? It is, yeah.
00:57:34
Speaker
Why Kotlin? We're Kotlin OGs. We've been building on Kotlin since before it was cool, man.
00:57:43
Speaker
Why Kotlin? Well, I'm a JVM guy, have been for a really long time. Kotlin has just beautiful ergonomics; it's a super productive language.
00:57:59
Speaker
It gets out of your way, it's really functional. It's just a fun language to program in. I wish there was a better answer than that, but... you know, the JVM is really fast. It gets a lot of flak, but man, it's fast. And there's some really cool stuff around native images, which means memory consumption and startup time are not what they used to be.
00:58:20
Speaker
So it's a really cool ecosystem to be in. Yeah. Okay, just love of a language on that platform. Fair enough. That's a good enough reason. And it really is a great language. A lot of people think of Kotlin as an Android language, but it was a server-side JVM language first and foremost. And you see the Spring team embracing Kotlin really heavily. It is just a great language.
00:58:50
Speaker
I heart Kotlin. What do you like about it? Give me your favorite thing about it. Like I say, it's very succinct. And you get all the performance of the JVM, right?

The Choice of Kotlin for TaxiQL

00:59:00
Speaker
So it's fast.
00:59:01
Speaker
But it's very ergonomic, very succinct. And I will say that Java has caught up a lot, right? I think we were on maybe Java 11 or Java 17 when we started working on this stuff, which is a lot of releases ago.
00:59:17
Speaker
Yeah. um But Kotlin has this idea of of data classes. think none of these are um None of these are like brand new ideas, but they just kind of came along at a timely way. So data classes, which is a way of just saying, hey, this is immutable and um and you have the equals and hash code, all the kind of boilerplate stuff, um the system will worry about that for you. like The compiler will deal with that for you.
00:59:42
Speaker
So it's very succinct in that way. It's got strong ideas around immutability, strong ideas about nullability. It's got extension functions that let you write things that become very fluent: you can take a class and add an extension function to it later on, which lets you do nice things with collections. It's just a great language.
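To make those features concrete, here's a small, hypothetical Kotlin sketch (not from the Taxi codebase) of the ergonomics being described: a data class gets structural equality and copy for free, and an extension function adds behaviour to an existing type after the fact.

```kotlin
// Hypothetical example of the Kotlin ergonomics discussed:
// data classes and extension functions.
data class Customer(val id: String, val firstName: String, val lastName: String)

// Extension function: adds fullName() to Customer without editing the class.
fun Customer.fullName(): String = "$firstName $lastName"

fun main() {
    val ada = Customer("c-1", "Ada", "Lovelace")
    // copy() gives an immutable update; equals/hashCode are generated.
    val grace = ada.copy(firstName = "Grace")
    println(ada == Customer("c-1", "Ada", "Lovelace")) // structural equality
    println(grace.fullName())
}
```

The compiler generates `equals`, `hashCode`, `toString`, and `copy` for the data class, which is exactly the boilerplate being referred to.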
01:00:10
Speaker
Okay, I feel that. And I think you have to give Kotlin and Scala credit for getting Java out of its doldrums and innovating again, right? Yeah, absolutely. I think there was this big dearth of innovation where Java didn't move on from 8. And I actually also think... I don't know, I've got no evidence to support this.
01:00:36
Speaker
But I think the JavaScript and TypeScript community deserves some praise here as well, because JavaScript and TypeScript, as another general-purpose, easy-access programming language, were adding things and really helping drive functional programming into the mainstream at a time when Java was still all anonymous classes and lots and lots of boilerplate.
01:01:01
Speaker
So you saw JavaScript and TypeScript really driving functional programming as a mainstream concept. And then Kotlin came along, and so did Scala, and they embraced these concepts on the JVM with varying degrees of ceremony. And I think what Kotlin did really nicely was make functional concepts really easy to do well.
01:01:26
Speaker
It looks and feels a lot like writing JavaScript, except you have types and a compiler that will help you if you make a mistake. So I think JavaScript sort of re-energized things. At the time, we saw Node starting to get traction as a server-side language.
01:01:44
Speaker
Yeah, they must have felt the threat from that. Because for ages you would just laugh and say, well, there's no way JavaScript can really be fast. And then all of a sudden it was. And there's no way you'd build a serious application in JavaScript on the back end.
01:01:59
Speaker
Then all of a sudden people did, and they did it really well, and it was good. So, I don't know. I have no proof that it has anything to do with the revitalization of Java, but I do think people looked at what was going on and went,
01:02:17
Speaker
there are some great ideas out there in the mainstream, and we should adopt them and bring them in. Yeah, there was definitely a sense that Java thought it was finished, and then various things conspired to make it realize there's actually more innovation left to find.
01:02:29
Speaker
Yeah. Anyway. Right, that's your brief respite from being grilled over the query engine. Because, let me give you some more hard things to deal with in the query engine world. Right.
01:02:45
Speaker
You can easily imagine, we've touched on this with Avro, right? Imagine you've got a system where... actually, no, I'm going to skip back. There's another question I've got to ask you first.
01:03:00
Speaker
How do you deal with streaming data? Because I know you've got Kafka integration. Yeah, we do a lot with Kafka, yeah. And it's a very different thing to go to a semi-static database and query stuff versus a log which has the past billion records all streaming in real time. So how do you handle those differences?

Managing Streaming Data in TaxiQL

01:03:22
Speaker
We don't at the moment support the idea of, I want to read a specific offset from Kafka. There are people who use Kafka purely as a log and apply compaction, but also a lot of people are just using it as a pure event stream.
01:03:42
Speaker
So if you want to use it as a pure event stream, that's fine. We can consume an event stream, and you can say, right, I want to stream this data. But again, consumers of the data get to say, I want to stream the data, but this is the data contract that I want to consume.
01:04:01
Speaker
And I think in the world of streaming data and event-based architectures, teams building these systems often have a choice to make around the way they design their events, right? So producers have to go, well, am I going to have anemic events that carry the bare minimum pieces of information that describe this event? So, you know, a user clicked a link, so we might want to know the user ID and the
01:04:35
Speaker
I don't know, four or five fields that describe that event. Yeah, who clicked what and when, and that's the whole event. That's the whole thing, yeah. And then consumers of that data need richer information. They don't just want to know the customer ID; they want to know first name, last name, what their customer status is, that kind of stuff, right? So consumers, in order to consume that data, have this burden of enrichment, and different consumers will need to enrich in different ways. So they will consume the data, and then in order to make use of it, they need to query a bunch of other systems, or make a bunch of other REST API calls, or join a bunch of other streams together, in order to get the data that's meaningful to them.
01:05:13
Speaker
And so if you have these anemic events, the burden shifts to consumers to do the enrichment. Or you can flip that on its head and have these really fat events, where producers go, okay, well, in order for this event data to be useful, I'm not just going to tell you the bare minimum three things. I'm going to pump it full of lots of information which I think you guys might want to know. And then the producer has to wear the burden of enrichment, and so they become coupled to all the other enrichment services that they need to talk to before they can even emit the data.
01:05:46
Speaker
So it's a trade-off. And I think what's really nice about the way Taxi lets you approach this is that it's neither of those things, right? The consumer defines the data contract that they want.
01:05:58
Speaker
The producer defines the data contract that they're going to satisfy. Where those two things don't line up, the query engine sitting in the middle has the job of enrichment. So the consumer doesn't have to know that in order to enrich this event, they have to talk to three other microservices and query an application database.
01:06:15
Speaker
They just publish a data contract that says, I want a stream of customer signup events, and I want these 20 fields on my stream.
01:06:29
Speaker
And they publish that as a data contract. And then the query engine goes, okay, well, in order to get that, first of all I need to find which system is streaming these events, and it's RabbitMQ or it's Kafka or it's SQS or something.
01:06:44
Speaker
So I'm going to subscribe to that stream. And then as each event comes in, I'm going to enrich it to satisfy the consumer contract. And so the producer and the consumer remain decoupled from each other, but also decoupled from all the enrichment services.
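The trade-off being described — anemic events, consumer contracts, and a mediator doing the enrichment — can be sketched in Kotlin. All the names here are hypothetical; in Taxi the wiring below would be derived from semantic types rather than hand-written:

```kotlin
// Sketch of the trade-off discussed (all names hypothetical): the
// producer emits an anemic event, the consumer declares a richer
// contract, and a mediator in the middle performs the enrichment so
// neither side couples to the lookup services.
data class UserClickedLink(val userId: String, val linkUrl: String) // anemic producer event

data class EnrichedClick( // the consumer's declared contract
    val userId: String,
    val firstName: String,
    val customerStatus: String,
    val linkUrl: String
)

// Stand-in for the query engine's enrichment step; in Taxi/Orbital this
// resolution is derived from semantic types rather than hand-wired.
class Enricher(private val lookupCustomer: (String) -> Pair<String, String>) {
    fun enrich(event: UserClickedLink): EnrichedClick {
        val (firstName, status) = lookupCustomer(event.userId)
        return EnrichedClick(event.userId, firstName, status, event.linkUrl)
    }
}
```

The point of the sketch is where the coupling lives: neither the producer type nor the consumer type references the lookup service; only the mediator does.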
01:06:57
Speaker
So when those services change, they don't have to go and fix their integration. But the consumer is getting the data they want, and the producer is publishing the data in a way that makes sense to them. Okay, that raises the question of how you deal with time and versioning. Because if I query a Kafka stream and Taxi is enriching it with address data today...
01:07:20
Speaker
A month later, they've changed their address. I go and re-query the same data. Do I get the old events with the new address? Or do I get the old events with the old address?
01:07:31
Speaker
I mean, that's a great question. We don't solve that problem out of the box. But again, think about the way the consumer would have to solve that problem; there's no magic that magically solves it for them either. But if a consumer wanted to be able to say, hey, in a bitemporal system, I want...
01:07:57
Speaker
you know, a customer-signed-up event, and I want their address at the time of that event being broadcast. If they publish those requirements in the data contract, then when the data contract is fulfilled, we will query the system to satisfy those constraints.
01:08:13
Speaker
But if you don't specify those constraints, then you just get whichever system is publishing address data. And so what it does is force consumers of data to be really explicit about what their requirements actually are.
01:08:27
Speaker
Do you want... okay. And again, that's the right place: consumers should be saying, this is the contract of data that I need satisfied. I see, so there you're specifying the event date as a join condition on the address.
01:08:43
Speaker
I look at it a different way. I don't want to have to think about how this query gets executed. For me as a consumer, I don't want to have to specify join conditions.
01:08:58
Speaker
What I do want to specify is constraints on data. So I want a customer-signed-up event and I want their address, but I don't just want any address. I want the address that was true at a specific point in time.
01:09:12
Speaker
So if I can express that as a constraint against the address object, and Taxi has this expression language for defining constraints, then the query engine will go, okay, well, I need to fetch address data, but I can't just fetch it from any system.
01:09:28
Speaker
I have to fetch it from a system that has published that it can satisfy these constraints. So it can give me address data, but also it can give me address data at a specific point in time.
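One way to picture that selection step, loosely and with invented names, is sources advertising their capabilities and the engine filtering on the consumer's declared constraint:

```kotlin
// Hypothetical sketch: each source advertises whether it can answer
// "address as of a point in time" queries; the engine only considers
// sources whose capabilities satisfy the consumer's declared constraint.
data class AddressSource(val name: String, val supportsAsOf: Boolean)

fun selectSource(sources: List<AddressSource>, requireAsOf: Boolean): AddressSource? =
    sources.firstOrNull { !requireAsOf || it.supportsAsOf }
```

A null result here corresponds to the unsatisfiable-constraint case Marty describes: there is simply no source whose published capabilities meet the contract.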
01:09:39
Speaker
And so then the query engine can go, great: of all of these different places that I can get address data from, for this data contract, given this set of requirements, I will go and talk to that system. What kind of error do you get when that constraint isn't satisfiable?
01:09:54
Speaker
You get an error that says... I think it says there was no data available, or no sources available, to satisfy this data with the constraints provided. Okay.
01:10:06
Speaker
So then I would have to relax the constraint or get a new producer to supply it. Yeah. Which is not really a query engine problem, right? There just is no data available to satisfy the constraints that you've specified.
01:10:22
Speaker
Yeah, okay. That's fair. But do you have things like, and this I suppose relates to the optimization and costing thing, but do you have things like: the query plan comes back and says, go to this system.
01:10:38
Speaker
You go to that system and it's throwing a 500 error and you can't get it. But, you know, there's another system you could have gone to; you just chose not to use that one. What happens there? We fail over and we try the next one.
01:10:51
Speaker
Transparently? Well, we also give you back full lineage. So one of the nice side effects of having a query engine do all of this is that in addition to the data, we will give you end-to-end lineage that says not just which systems did I speak to, but, for every given field in the response that was served,
01:11:14
Speaker
show me all of the computations and all of the different systems that we spoke to, including the systems that we tried and failed, for that specific field. So you get real soup-to-nuts lineage on an attribute level, because in a data contract, each attribute might have come from a different system, or it might be the result of the composition of seven different systems working together, like in the example of profit and loss that we talked about before.

Lineage Tracking in Data Queries

01:11:39
Speaker
So you can trace all of those different attributes back, all of those different calls back, on a per-attribute level. So yeah, if there is more than one way of satisfying it and the initial path fails, then we will fail over and try the next path.
01:11:56
Speaker
You know, if there are retry policies on the system, we'll retry according to those. And if that fails, then we'll go, okay, what else can we do? Where else can we look to satisfy the data contract?
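The failover-with-lineage behaviour being described might be sketched like this (illustrative names, not Orbital's actual API): try candidate sources in order and record every attempt, so failures show up in the per-field lineage too.

```kotlin
// Illustrative sketch of failover with lineage capture: try each
// candidate source in order, recording every attempt, so the caller
// gets both the value and the full trail of systems consulted,
// including the ones that failed.
data class Attempt(val source: String, val succeeded: Boolean)
data class Resolved<T>(val value: T?, val lineage: List<Attempt>)

fun <T> resolveWithFailover(sources: List<Pair<String, () -> T>>): Resolved<T> {
    val attempts = mutableListOf<Attempt>()
    for ((name, fetch) in sources) {
        try {
            val value = fetch()
            attempts += Attempt(name, true)
            return Resolved(value, attempts)
        } catch (e: Exception) {
            attempts += Attempt(name, false) // failed sources still appear in the lineage
        }
    }
    return Resolved(null, attempts) // nothing could satisfy the contract
}
```

Keeping the failed attempts in the result is what turns this from simple failover into per-attribute lineage: the consumer can see every system that was consulted, not just the one that answered.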
01:12:09
Speaker
Okay, that's interesting. The lineage stuff, I'm guessing that's the kind of thing that gets really popular with large enterprises. Yeah, yeah. I mean, lineage in the enterprise is this really weird thing, right? Like,
01:12:25
Speaker
I think... I've got to be careful here. It's the kind of thing that large enterprises like to buy as kind of an insurance policy.
01:12:37
Speaker
So they have an often regulated obligation to capture and store lineage. And oftentimes what happens is lineage gets written down into a system, and they say, right, at design time we're going to build this integration, and we're going to write down all of the different systems that it speaks to, and we're going to stick it in our lineage system. And then we're going to lock it down and make it really hard to evolve and change, because if that lineage is wrong, we're in trouble with the regulators.
01:13:08
Speaker
But over time, things happen and they drift, right? And so design-time lineage is often... it's like that old adage, right? The one thing worse than no documentation is out-of-date documentation.
01:13:26
Speaker
And the same is kind of true of lineage. If it's a design-time artifact, not a runtime artifact, how do you know that it's still correct? And the thing that's ironic about that is the companies that spend huge amounts of money on lineage systems, which are captured at design time,
01:13:46
Speaker
the companies that face these massive fines if they get it wrong, don't really have a great answer to the question of, well, how do you know that it matches the logic written inside the Java application that is actually running and generating this data? How do you know that it's correct? And a lot of times they don't, because with design-time lineage there's a certain degree of hope mixed in.
01:14:15
Speaker
I just hope that things haven't drifted. Yeah, I hope things match the spec document we wrote 18 months ago. Good luck with that. Yeah, exactly. And, you know, how do you know that it hasn't drifted?
01:14:26
Speaker
Yeah, almost certainly you know that it must have done. Because that's life, right? Yeah, okay. And, you know, in large, complex organizations, they often go as far as column lineage, right? So the values in this column are theoretically computed like this.
01:14:47
Speaker
But if you have two rows in a table, and one represents an FX trade and another represents an interest rate swap, the way that those two fields were calculated will often be different depending on the type of product that you're trading. Right, in finance anyway.
01:15:06
Speaker
And so column-level lineage often isn't enough. You really need to be able to say this specific value was calculated in this specific way, and here are all of the inputs that we used in order to calculate that value. Not just, you know,
01:15:21
Speaker
in theory, to calculate this value we spoke to these systems. No: for this value of seven, we got there by getting four from this system and three from that system. And we got the three by feeding in this ID, and we got that ID by talking to another system.
01:15:33
Speaker
So really tracing it all the way back. If you care about lineage, that's really the stuff that matters. Yeah, it's the only way to do it accurately. And it's not something you can bolt on later very easily.
01:15:46
Speaker
Yeah. Okay, I can see how this all begins to hang together. Let's talk about sticking it into the real, horrible, thorny world. If this seemed like a good thing and I was using it at a different bank that you'd never heard of... I'm saying bank too much. It's just my go-to shorthand for a large organization that doesn't talk to itself.
01:16:11
Speaker
But inevitably, I'm going to come with a new system that not only hasn't annotated its API, but has a kind of API you haven't written a connector for yet.
01:16:23
Speaker
How much work is it to say, here's this weird mainframe, and I want to use that as a new query source? Yeah. I mean, we've gotten it down to a fine art, so we can crank out a new connector within a day or two.
01:16:40
Speaker
But it's all open source, and all of the code is up there. So if you wanted to go and build a mainframe connector that speaks some esoteric mainframe protocol, yeah, you could go and implement one yourself.
01:16:54
Speaker
What does it look like? I mean, is there an interface that I write, and how large is that interface? Yes, you implement... we call them invokers. So you implement an invoker that understands how to speak to Kafka, or understands how to speak to MongoDB, or understands how to speak to, you know, a weird mainframe.
01:17:14
Speaker
So there are always two parts to it, and you touched on both of them. One is, how do I describe the semantics of the contract? And that is different from, how do I interact with that system and send messages backwards and forwards over the wire? So one is structural: what's the DNS name, how do I send data across the wire? And one is semantic: what's the meaning of the data that I get back, and how do I interpret it?
01:17:44
Speaker
And so those are implemented as separate concerns. A lot of the time, and OpenAPI is a great example, those two things travel as one, right? But sometimes they will be separate. So you could implement an invoker that knows how to speak to a mainframe, and then you can have a schema translator that understands, you know, our weird binary protocol language.
01:18:13
Speaker
And so you need to build those two different things. Both of them are pretty quick to build. And then all of a sudden the Taxi engine knows how to talk to your system. It will call that invoker at the right time, and it will go off and get data, and it will give data back.
01:18:28
Speaker
And you can just magically join it into other stuff. Yeah. Once the query engine understands how to speak to that system, and there's a schema that describes what data that system can give back, yeah, we'll call it at the right time.
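As a rough sketch of the two concerns just described, with entirely hypothetical interface names (the real taxilang extension points may differ), a connector splits into an invoker for wire-level interaction and a schema translator for semantics:

```kotlin
// Hypothetical connector SPI sketch; names are illustrative only, not
// the actual taxilang API.
interface Invoker {
    fun canHandle(connectionType: String): Boolean
    fun invoke(operation: String, parameters: Map<String, Any?>): Any?
}

interface SchemaTranslator {
    // Turn a system-specific schema document into Taxi type definitions.
    fun translate(sourceSchema: String): List<String>
}

// A toy implementation for some esoteric mainframe protocol.
class MainframeInvoker : Invoker {
    override fun canHandle(connectionType: String) = connectionType == "mainframe"
    override fun invoke(operation: String, parameters: Map<String, Any?>): Any? =
        "stub-response-for-$operation" // a real invoker would speak the wire protocol here
}
```

The split matters because the two halves evolve independently: the invoker changes when the transport does, the translator when the schema format does.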
01:18:42
Speaker
How many kinds of system can you connect to these days? I mean, it's pretty modest at the moment. I think we've probably got 20 or so different combinations in there.

Beyond Connectors: System-Level Integration

01:18:56
Speaker
But given that we're not trying to solve the application problem... like, you can't come to Orbital or Taxi and get a schema for HubSpot.
01:19:07
Speaker
We're not trying to solve the problem of, how do I speak to that application? Because again, I think other people have done a great job of solving that. So we really only have to solve: how do I speak to Mongo, how do I speak to Kafka, how do I speak to RabbitMQ? You get a lot more surface area with a lot fewer connectors.
01:19:28
Speaker
And again, we have support for SOAP, gRPC, Avro, JSON, CSV, XML. That's most of the enterprise systems covered there, right? Yeah, you've covered a vast swathe with just that bunch.
01:19:45
Speaker
Yeah, and we can support speaking to Kafka and Rabbit. We can invoke Lambda functions. We can read from blob stores. We can stream data from an S3 bucket.
01:19:58
Speaker
Again, you pretty quickly get coverage of a lot of the surface area you need to build something that's useful, because our goal isn't to build for HubSpot or for Google Sheets or those kinds of things. It's to build at the system level, for system-level integration. We're used by developers; we're not used by marketing teams that want to do citizen integration. We're used by developers who are comfortable and just want to spend less time on the boilerplate stuff. Yeah, okay, that makes sense. So what's left to do? Where do you need to take it next? What's missing?
01:20:40
Speaker
I mean, I guess lots of stuff. There are always different optimizations that we want to be able to add into the query engine.
01:20:53
Speaker
I think, you know, TaxiQL is a fun thing to build, but we are always having customers come to us and say, hey, how do I express this query? How do I query this kind of stuff?
01:21:05
Speaker
And so we're continually evolving the language as new requirements come in. That also in turn makes us look back at some of the decisions that we made a couple of years ago, like, I'm sure that was a good idea at the time. Can you give me an example?
01:21:24
Speaker
I think, I mean, we touched on one with cardinality before, and how you express going from an object of one to a collection of many.
01:21:38
Speaker
And I feel like we got that very wrong in the early days. We've improved the syntax a little bit now, to talk about how you navigate an object and then how you iterate an object.
01:21:52
Speaker
And Taxi is supposed to be very declarative, so you don't define a for-each loop, right? Instead you're just saying, hey, I want a collection of address objects. But I don't want just any selection of address objects; I want it from this aspect of the source data.
01:22:16
Speaker
That was the thing that we didn't really have great semantics for in the early days. We kind of bolted it on, and I think we screwed it up a bit the first time. And I feel like just in the last few months we've landed on something that is a little bit more expressive and feels a bit more natural in the way that you can write it.
01:22:35
Speaker
Yeah, so we're doing a bunch of work around SDKs as well. One of the really interesting things is, we spent a bunch of time working on Taxi and working on the query engine and on the tooling ecosystem, the compiler and all that kind of stuff, and making it very hackable and very pluggable.
01:23:00
Speaker
We didn't put a huge amount of time and effort into building SDKs. I mean, we built some, but they were kind of rubbish. And so we've been doing a bunch of work around that. And I think this is a thing that GraphQL does phenomenally well. The GraphQL tooling for JavaScript developers is just gold standard. It's beautiful.

Catering to Diverse Programming Ecosystems

01:23:20
Speaker
And they've absolutely nailed it. And so we've looked at that and gone, yeah, if I'm building a backend application and I happen to be coding in JavaScript or in TypeScript, I'm
01:23:32
Speaker
like, I would want to be able to write a Taxi query that says, you know, stream data from an S3 bucket, and enrich it against a Mongo thing, and call a bunch of other REST APIs, and then write it into my Postgres database, right? And that's a thing that, as a TypeScript engineer, I want to be able to do, but I don't want to have to pull in all of those other drivers. And so Taxi is a really nice way of solving that problem.
01:23:57
Speaker
It's just that our SDK tools were gross for that. So we've spent a bunch of time looking at what good looks like, and I do think that in that space, what GraphQL does is beautiful. We've borrowed heavily from that ecosystem and been inspired by it. They have this great tooling: you can define a query inside your JavaScript, and there's phenomenal tooling that's parsing your JavaScript in real time and going, right, here's the shape of that object, so I'm going to run a compiler in the background to generate some classes for you, so you get nice strongly typed stuff as you're typing. The GraphQL ecosystem does that really well, and so we've gone, we should be able to do that as well as that.
01:24:40
Speaker
You know, we've got some stuff coming out in that space. And the same again: we want to have great support across the languages that developers are building in. For us, that's obviously Kotlin and JVM languages, you know, C# and .NET languages.
01:24:58
Speaker
So right now our focus is on TypeScript, JavaScript and the JVM, but again, we want to meet developers where they are. So we want tooling for .NET developers, and tooling for Go developers, and tooling for, you know, line-of-business languages.
01:25:17
Speaker
Yeah, I imagine that's one of those things where, sure, there's an overhead for each language, but it's the design up front, figuring out what tools you want to supply, that's the really tricky part. And also the idioms, right? The way that TypeScript developers work and their expectations of their iteration cycle are different from the way that a Java developer works.
01:25:43
Speaker
And you can't take something that works really well for a JavaScript developer and just transpile it into Java and dump it into the Java ecosystem, because those developers work differently.
01:25:56
Speaker
A JavaScript developer will quite happily have a Node process that's transpiling code in the background as they're typing, whereas a Java developer is going to be running a Maven build to do the transpiling, and their code gen is part of, you know, an active thing that they run.
01:26:13
Speaker
And so the way that those developers work, the idioms and the practices... you have to do something that works well within that tool set, within that ecosystem. And so we're picking the ones that we know, because we feel like we can solve for that.
01:26:26
Speaker
We're not Go developers, so we don't know what the Go

The Future of Declarative Integration

01:26:28
Speaker
idioms are, and what good looks like in that ecosystem. Yeah, so eventually you're going to have to hire native speakers. You're going to have to hire Go developers. Or they'll come and find us on GitHub and maybe tell us what good would look like, and we can, you know... Knowing the internet, they'll tell you what bad looks like first, and then they might tell you what good looks like. We won't tell you what good looks like, but we'll tell you why all of the decisions that you've made were wrong.
01:26:50
Speaker
Okay, well, that's a starting point. We'll move on from there. I'm going to circle back because there's one thing you mentioned that we didn't actually cover. You briefly hinted at it, which is writing.
01:27:03
Speaker
We've talked about querying this whole way, but you mentioned writing the data into Postgres. So it's not just that I'm taking these queries programmatically; I can also define a sink to dump the data into. Yeah, absolutely. And the syntax is a little bit different.
01:27:20
Speaker
So I think, philosophically, the query engine will read from wherever it can to satisfy the data contract, given the constraints that it's got, right? So we'll look at all the different sources.
01:27:33
Speaker
But when you're writing, you do want to be a little bit more explicit, right? You do want to say, I want to write this to that specific database, or I want to call that specific API. So the syntax is a little bit more deliberate in that space.
01:27:46
Speaker
But yeah, common use cases for the people who are using us are: I want to read data from an S3 bucket, enrich it, transform it, and write it onto Kafka, for example; or write it into a database; or cache it and then serve it over HTTP. Those kinds of things.
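That read-enrich-write shape can be reduced to a tiny generic sketch; the lambdas below stand in for the S3 reader, the enrichment step, and the Kafka writer (all hypothetical here, since Taxi expresses this declaratively rather than in code):

```kotlin
// Generic read → transform → write pipeline; sources and sinks are
// supplied as functions, standing in for connectors.
fun <A, B> runPipeline(
    read: () -> Sequence<A>,
    transform: (A) -> B,
    write: (B) -> Unit
) {
    read().map(transform).forEach(write)
}
```

Using a `Sequence` keeps the pipeline lazy, so records flow through one at a time — the same streaming shape as reading from a bucket and writing to a topic, rather than loading everything into memory first.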
01:28:07
Speaker
So yeah, reads and writes are part of the language. I think there's some really interesting stuff in the background, some cool experiments, around going beyond explicitly saying, I want to call this service, towards, I want to perform this action.
01:28:30
Speaker
And I think there are some really nice abstractions that you can apply in that space around intent, or action, to say, I want to read some data and then I want to charge the customer.
01:28:42
Speaker
And I don't necessarily know which API I want to call to charge the customer. I just want to satisfy an intent. Yeah. And I think that's interesting. It's nothing more than an idea at the moment. But our goal is to be declarative around the way that you write this integration, rather than imperative. And so moving more towards, I want to satisfy this intent, feels more natural in our space.
01:29:09
Speaker
You've been semantically tagging nouns. You're ready to try some verbs. Yeah, yeah, yeah. We'll get you back on the show when you get to adjectives. However that will look.

Conclusion and Reflections on AI Synergies

01:29:23
Speaker
Oh, I know. That's the cost optimiser: when you can do it quickly. Yeah, I want this data cheap. Awesome, I'll look forward to that. In the meantime, best of luck with the next version.
01:29:36
Speaker
Thank you very much. Marty, thanks for joining me. Thank you so much for having me, Chris. It's been fantastic. Real fun. Cheers. Thanks. Thank you, Marty. If you want to give Taxi a closer look, you'll find links in the show notes as always.
01:29:50
Speaker
Personally, I would start with the link to the playground because it's a nice visual way to see what it looks like and get a feel for how it's going to work in practice. And then there are links to the docs and the GitHub repo and all that from there.
01:30:02
Speaker
Since we recorded that episode, I have to say there's one more question I wish I'd asked Marty, which is about agentic AI, MCP support and all that, because I'm fairly sure that AI is going to change the integration space, especially in that "we left it to the last minute" angle of integration.
01:30:23
Speaker
because AI is actually really good at duct taping things together in a hurry. It's integration that's good enough in a pinch. But then it gets much, much better the more relevant context you can feed it up front.
01:30:38
Speaker
So having a machine-readable description of how to meaningfully relate different systems will give you better results. I think there might be a really nice synergy between something like Taxi and AI.
01:30:51
Speaker
I wish we'd explored that. Maybe we'll get Marty back for a bonus episode someday. If you'd like to hear that, then let me know by leaving a comment or liking this episode or sharing it with someone who's currently stuck in a systems integration meeting, slowly gnawing their arm off.
01:31:09
Speaker
And tell them to use their remaining arm to click subscribe, because we'll be back soon with another look at the software world. Until then, I've been your host, Chris Jenkins. This has been Developer Voices with Marty Pitt.
01:31:21
Speaker
Thanks for listening.