
Eric Broda & JGP: Implementing Data Mesh and Beyond

Straight Data Talk

Eric and Jean-Georges Perrin are talking about their upcoming book Implementing Data Mesh, but also about actually executing Data Mesh through an open-source project. 

Transcript

Introduction to the Podcast

00:00:01
Speaker
Hi everyone, it's Straight Data Talk, a bi-weekly podcast about data and interesting stuff in data. Let's go. I want to let you know that we missed you. I know you didn't miss us. Okay, one of our guests just said that he didn't.

Meet the Guests: Jean-Georges Perrin and Eric Broda

00:00:25
Speaker
I'm sorry to acknowledge it, but what's important today is that I have two very smart and fabulous guests joining me to talk about their upcoming book: JGP, also known as the Beyonce of Data, and Eric Broda. So, you two, please introduce yourselves briefly, and afterwards I'll jump in with questions.
00:00:57
Speaker
Okay, let me start. So, I'm Jean-Georges Perrin, or JGP, or the Beyonce of Data. I think we need to make sure your show becomes a little more popular, because people don't know that yet. It's the second time I'm on your show. Thank you, Yulia. It's been years and years of being the Beyonce of Data, right? I think we need to push that a little bit more, because I really think it deserves to be better known.
00:01:29
Speaker
But more seriously, I was chief innovation officer at a company called AbeaData. I've been working with what I would define as modern data engineering for quite some time now, and data mesh, for me, is part of this modern data engineering paradigm.
00:01:51
Speaker
And Eric, go ahead. Fantastic. I had never heard you called the Beyonce of Data, but I will acknowledge that JGP is an expert in all things data. So there you go. As for my background: 35-plus years, give or take, in the industry, almost all of it in technology, as an executive at large banks and insurance companies and as a consultant. For roughly the last seven years, I've had my own company. What we do is build ecosystems, in particular data ecosystems. And most recently, I was working with JGP to write this book.
00:02:43
Speaker
Okay. Thank you for the intro, folks.

Why Write 'Implementing Data Mesh'?

00:02:46
Speaker
I'm super thrilled to have you, and to talk with you about the book, Implementing Data Mesh. So, first of all,
00:02:58
Speaker
why did you guys decide to write the book?
00:03:05
Speaker
Well, I can take a stab at it, though I know JGP probably has some ideas of his own. Everybody knows that Zhamak Dehghani wrote a fantastic book on data mesh; she kicked it all off. The key question we had, or at least I had in all of my travels, was: this is great stuff, but where do I start? How do I actually go about implementing some of these capabilities? That was the genesis of the book. I was writing a number of articles on data mesh, then ran into JGP, and we had what we thought were complementary ideas. That led us to put in a proposal to O'Reilly. JGP, what about you? No, I think you summed it up pretty well.
00:03:58
Speaker
For people who know me, it started as a joke, right? I said something like, "Hey Eric, why don't we write a book about it?" It was kind of a half joke, okay? Not really a joke, but not really serious either. And Eric said something along the lines of, "Yeah, let's do it," or "Chicken if you don't," you know. And then we started, a little bit over a year ago. Yeah. So, how big is the book? Is it 12 chapters? It is 16 chapters, almost 300 pages.
00:04:42
Speaker
So it took you about a year, a little bit more, including editing. And it's about to come out, right? It will be available October 8th; I just checked on Amazon the other day. And I think it'll be available a little earlier on the O'Reilly portal. It's super impressive, folks. How did you manage to write the book this quickly?
00:05:12
Speaker
I think I owe it to Eric. Eric was definitely not a pushover. He drove this project and kept reminding us of the deadlines, and we made it work. But it was an ambitious project.
00:05:39
Speaker
Yeah, I think we started in July, and the book was printed about a week ago, around there. So we are really in the middle of the logistics of bringing the book to the retailers. This morning we had a request from our publisher to confirm our mailing address so we can get our personal copies.
00:06:03
Speaker
So it's really hot off the press.

The Collaborative Writing Process

00:06:08
Speaker
This sounds so exciting. Talking about the book itself, can you please tell me, first of all, how you divided the writing, and also what your favorite parts of the book were, and why?
00:06:28
Speaker
JGP, why don't you start? So, we divided it by chapters, okay? And we keep that confidential between Eric, me, and our editors. So you will have to read the book and see who wrote which chapter. We were told that we have somewhat different styles, which is not completely surprising.
00:06:54
Speaker
But overall, the book is consistent. And of course we did a lot of cross-review: I wrote something, Eric reviewed it, then our editor reviewed it.
00:07:14
Speaker
Same thing when Eric wrote something. There is really just one of the first chapters where we combined a lot of both our work; otherwise, each chapter was pretty much written by one person. So that's how we divided the work.
00:07:39
Speaker
I think it was a fair way to work it out. It kept all our ideas pretty much alive, so we did not have to distort ourselves too much on what we wanted to implement, and we were able to touch on all the topics we wanted to talk about. So I think it gives us a pretty
00:08:09
Speaker
good sense of where we wanted it to be. Yeah. For me, first, I echo everything JGP mentioned. It's always fun to try to figure out how to talk about implementing such a broad topic. And I give JGP credit: he was the one who organized how the chapters would be structured,
00:08:39
Speaker
for the most part, and I think it led to what is, and hopefully everybody else thinks so too, a fantastic book. As to my favorite part: it was always a pleasure to interact and write with JGP; that was a great experience. But the part of the book I enjoyed the most is our use case. It was a client we called Climate Quantum,
00:09:06
Speaker
where we tried to show, in each chapter, how to apply the things we talked about in that chapter, whether it's the architecture, data contracts, the operating model, or how to apply generative AI
00:09:22
Speaker
to data products. Every one of those was bound together through our use case. And by the way, the use case is a real one, for the most part.

Generative AI in Data Quality and Products

00:09:33
Speaker
There was a little bit of artistic liberty, I suppose, but my team is actually building this capability with a global team right now. So it's based on some of the work we're doing right now, implementing a global data mesh in the climate landscape, and being able to define how to use the book to deliver real results. Having a real use case was probably the most fun part for me, because it was very tangible and I could actually see the results.
00:10:04
Speaker
Great. No, absolutely. I think you are the right people for the job, because both of you have implemented data mesh, and you have a lot to say about it. To be specific, Eric, you kind of stole the question from me: you mentioned how to enable generative AI.
00:10:34
Speaker
Literally everyone listening to us is wondering where generative AI comes in. Yeah, so let me give you a few examples of how we're using it today. There's the climate example I can talk about, because it's open source and public. Beyond that, I'll try to give examples, but I can't mention names, so I'll have to keep them a little anonymous. I've implemented data mesh several times now, and in each of them we've used generative AI capabilities to
00:11:08
Speaker
improve data quality: enabling faster decision-making, building data contracts faster, creating artifacts or data products automatically. Those are things we are actually doing right now. And we don't just define the data product; we build the data product using copilots.
00:11:32
Speaker
Everybody does that type of stuff today; there's no rocket science and nothing new there. What we did that I think was relatively unique for my clients was to elevate models: generative AI models, and also the older machine learning type models, if you will.
00:11:51
Speaker
We elevated those to primary artifacts surfaced in a data product. So, all of a sudden, the models did not stand alone, apart from the data; they were bound to it in a logical fashion. Generative AI models and other models became primary artifacts in a data product.
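To make Eric's point concrete, here is a minimal Python sketch of a data product descriptor in which trained models sit alongside datasets as first-class artifacts. The class names, field names, artifact kinds, and URIs are hypothetical illustrations, not the structure used in the book or in OS Climate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Artifact:
    """One artifact exposed by a data product (hypothetical structure)."""
    name: str
    kind: str  # hypothetical kinds: "dataset", "ml_model", "genai_model"
    uri: str   # placeholder location; no real storage scheme is implied


@dataclass
class DataProduct:
    """Hypothetical data product descriptor in which models are peers of datasets."""
    name: str
    owner: str
    artifacts: List[Artifact] = field(default_factory=list)

    def add(self, artifact: Artifact) -> None:
        self.artifacts.append(artifact)

    def by_kind(self, kind: str) -> List[Artifact]:
        # Models are discovered through the same descriptor as the data,
        # rather than standing alone, apart from it.
        return [a for a in self.artifacts if a.kind == kind]


flood = DataProduct(name="eu-flood-risk", owner="climate-hazards-team")
flood.add(Artifact("flood_depth_grid", "dataset", "s3://example-bucket/flood/depth/"))
flood.add(Artifact("flood_risk_classifier", "ml_model", "models://example/flood-risk/v1"))
flood.add(Artifact("flood_summary_assistant", "genai_model", "models://example/flood-llm/v1"))

print([a.name for a in flood.by_kind("ml_model")])  # ['flood_risk_classifier']
```

The design choice is simply that a consumer who finds the data product also finds the models bound to it, through one descriptor.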
00:12:12
Speaker
We also used them to consume data within data products and drive unique insights. And we also used generative AI to complement some of the capabilities. A simple example is our data marketplace, when somebody wants to search for, in our case, climate data.
00:12:33
Speaker
There's a fair number of data products, so how do you make that easy? Well, we use natural-language search, which goes through generative AI, a vector database, all that kind of stuff, to create a more natural way of interacting with and finding data within a data mesh. There are a bunch more, but those are the top uses of generative AI. That's a good example, by the way; I love it. When it comes to user interaction, especially with data, generative AI is definitely the right tool to simplify it and make it smoother. But JGP, I can't stop myself from flipping this question to you. How do you feel about generative AI and data contracts? Are they born to be together or not?
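Eric's marketplace example, natural-language search over data product descriptions backed by a vector database, can be sketched in a few lines. This is a toy under stated assumptions: a bag-of-words term-count vector stands in for a real embedding model, and an in-memory dictionary stands in for the vector database. The product IDs and descriptions are made up.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorIndex:
    """In-memory stand-in for a vector database of data product descriptions."""

    def __init__(self) -> None:
        self._vectors: Dict[str, Counter] = {}

    def upsert(self, product_id: str, description: str) -> None:
        # Embed once, when the data product is published to the marketplace.
        self._vectors[product_id] = embed(description)

    def search(self, query: str, k: int = 3) -> List[Tuple[str, float]]:
        # Embed the user's natural-language query and rank products by similarity.
        q = embed(query)
        scored = [(pid, cosine(q, vec)) for pid, vec in self._vectors.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


index = ToyVectorIndex()
index.upsert("eu-flood-risk", "River and coastal flood depth maps for Europe")
index.upsert("global-precip", "Daily precipitation totals from satellite observations")
index.upsert("corp-emissions", "Company greenhouse gas emissions disclosures")

print(index.search("Where can I find flood maps for Europe?", k=1))
```

Swapping `embed` for a real embedding model and `ToyVectorIndex` for an actual vector store keeps the same shape: embed descriptions at publish time, embed the query at search time, and rank by similarity.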
00:13:23
Speaker
We could spend the rest of the talk on that as well. But first, I realized I didn't answer your question about my favorite part of the book. Honestly, it was really working with Eric. Even if he talks a lot, probably too much.
00:13:43
Speaker
He's a super nice guy, okay? And what we hear about Canadians being nice is an understatement: he's the nicest guy I've worked with in a long time. So that was my favorite part. As for Gen AI,
00:14:07
Speaker
I don't think there is any opposition at all between data contracts and Gen AI. I think it's quite the opposite: they almost complement each other. First, they operate at different levels. For example, say you want to build a model, or enrich your model with a vector database, and you want to ingest your own data into it. You've got to trust the data you're bringing in. One way to trust the data is through data contracts, data products, and eventually data mesh. All of that is described in the book.
00:14:46
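JGP's point, that you have to trust data before ingesting it into a model or a vector database, can be illustrated with a minimal contract check. The sketch assumes a deliberately tiny, hypothetical contract format (column name, expected type, required flag, optional minimum); it is not any particular data contract standard, and real contracts also cover ownership, SLAs, and semantics.

```python
from typing import Any, Dict, List

# Hypothetical, minimal data contract: column -> expected type and rules.
CONTRACT: Dict[str, Dict[str, Any]] = {
    "station_id": {"type": str, "required": True},
    "precip_mm": {"type": float, "required": True, "min": 0.0},
    "observed_at": {"type": str, "required": True},
}


def violations(record: Dict[str, Any], contract: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return human-readable contract violations for one record (empty list = OK)."""
    problems: List[str] = []
    for column, rule in contract.items():
        if column not in record or record[column] is None:
            if rule.get("required"):
                problems.append(f"{column}: missing")
            continue
        value = record[column]
        if not isinstance(value, rule["type"]):
            problems.append(f"{column}: expected {rule['type'].__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            problems.append(f"{column}: below minimum {rule['min']}")
    return problems


def trusted_only(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Only records that honor the contract are passed on for ingestion
    # (for example, into a vector database or a fine-tuning set).
    return [r for r in records if not violations(r, CONTRACT)]


batch = [
    {"station_id": "ST-001", "precip_mm": 4.2, "observed_at": "2024-06-01"},
    {"station_id": "ST-002", "precip_mm": -1.0, "observed_at": "2024-06-01"},
    {"station_id": "ST-003", "observed_at": "2024-06-01"},
]
print(len(trusted_only(batch)))  # 1
```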
Speaker
Like Eric, I'm using it too, and I'm seeing a lot of people giving me recommendations about how I can enhance my data contracts.

Generative AI and Data Infrastructure

00:14:58
Speaker
One thing is, when I talk about modern data engineering compared to how we used to do data engineering, one of the most important things is to think product. And as you think product, you can think evolution; you can think iteration. You're not stuck with a pipeline you built 20 years ago that hasn't moved in 20 years.
00:15:29
Speaker
You can evolve; you can make your architecture evolve, you can make your product evolve. And that's where the iteration can be helped by GenAI: enhancing documentation, comparing data contracts, making recommendations for the evolution of your data infrastructure, all of that. Because we finally have a trustworthy source of metadata
00:16:07
Speaker
through data contracts, data products, and data mesh. So all of that really fits together. Yeah, I want to echo what JGP mentioned, but I also want to give you another use case.
00:16:19
Speaker
A data mesh is, obviously, an ecosystem of data products, and we want to build data products as quickly as possible. So here's a real-life example of how we ended up accelerating that process. A lot of a data product has to be defined in simple text and documentation that people can actually understand and find.
00:16:41
Speaker
In the climate space, when we create a climate data product, we point, for example, to the NASA site or a particular climate data site. We have processes that read that site using BeautifulSoup or whatever, ingest the information on it, and generate the data product summary.
00:17:02
Speaker
We also have templates for things like the data product artifacts, even the data contract. We provide those templates and ask generative AI to create a first draft of some of those materials. What ends up happening is that we can create the scaffolding, and arguably a reasonably well-documented data product, in five minutes or less. Then you apply the real thinking: what are the core artifacts, how do you access them, what is the data contract that governs their use, and so on. But again, it lets you get started very quickly, and getting started is one of the obstacles many folks have: where do I start? How do I do this? Well, in five minutes or less, we use generative AI
00:17:50
Speaker
to ingest a whole bunch of information and create a wonderful starting point for a data product. That's a huge accelerator. Yeah, but it also depends on the data products you need, right? It's not a one-size-fits-all approach for every use case. But I genuinely like how you see it. The way you define and see opportunities is beautiful, because lots of folks in the data space have been skeptical about how
00:18:28
Speaker
Gen AI can help us extract value from data. And I totally see all the changes there. It's just that, when it comes to implementation, I tend to think about Gen AI as a flavor for data products, in the sense that
00:18:55
Speaker
it gives them different shapes. And I think that's very much what you're saying. Yeah. Generative AI is distinct from data products, but it's another tool in your tool belt that will help you with agility, speed, or otherwise. Yeah. That's a nice way to think about it; I like it. Okay, that was one question.
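The scaffolding workflow Eric describes (read a source page, fill templates, ask generative AI for a first draft) might look roughly like the sketch below. Two substitutions are assumptions made to keep it self-contained: Python's standard-library `html.parser` replaces BeautifulSoup, and the generative AI call is a hypothetical placeholder that simply returns the first paragraph. The template fields and example page are illustrative only.

```python
from html.parser import HTMLParser
from string import Template
from typing import Callable, List, Optional


class PageText(HTMLParser):
    """Collects <title> and <p> text from a source page (stdlib stand-in for BeautifulSoup)."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs: List[str] = []
        self._inside: Optional[str] = None  # "title", "p", or None

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._inside = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._inside:
            self._inside = None

    def handle_data(self, data):
        if self._inside == "title":
            self.title += data.strip()
        elif self._inside == "p":
            self.paragraphs[-1] += data


# Hypothetical data product summary template; real templates would also
# cover artifacts, access methods, and the data contract.
SUMMARY_TEMPLATE = Template(
    "Data product: $name\n"
    "Source: $source_url\n"
    "Summary (DRAFT, needs owner review): $summary\n"
)


def placeholder_llm_summary(paragraphs: List[str]) -> str:
    # Hypothetical placeholder for the generative AI step. A real pipeline
    # would send the scraped text and the template to a model; this just
    # returns the first paragraph so the sketch runs offline.
    return paragraphs[0].strip() if paragraphs else "(no description found)"


def scaffold(html: str, source_url: str,
             summarize: Callable[[List[str]], str] = placeholder_llm_summary) -> str:
    page = PageText()
    page.feed(html)
    page.close()
    return SUMMARY_TEMPLATE.substitute(
        name=page.title or "unnamed",
        source_url=source_url,
        summary=summarize(page.paragraphs),
    )


html = ("<html><head><title>Global Precipitation</title></head>"
        "<body><p>Daily rainfall estimates.</p></body></html>")
print(scaffold(html, "https://example.org/precip"))
```

In practice the `summarize` argument would wrap whatever model you use, and the result stays a draft until the data product owner applies "the real thinking" Eric mentions.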
00:19:24
Speaker
I also wanted to highlight what JGP said so nicely: think product. That is where we start. I really enjoy it, because it would eliminate so many problems: you start thinking about data as a product and basically work backwards. We set what we need and then think about the architecture, and I had never seen that direction. So, who would you recommend this book to? Is it for junior, beginning data professionals, or for fairly seasoned professionals? How do you feel about it?
00:20:20
Speaker
I can start. Go ahead. Obviously I'm going to try to give you an answer that covers the largest audience, but I truly believe the book does address a wide audience. First and foremost, practitioners: if you have to figure out how to implement a data product, data contracts,
00:20:43
Speaker
and capabilities related to that, this is the place to go. There are hands-on examples: how to define a data contract, the key components of a data product, literally down to its interfaces. That's in the book. So practitioners will definitely gain value from it.
00:21:02
Speaker
There's also a fair amount of architecture defined, whether it's architecture to support development, to support runtime, or even the forgotten realm of operability. My experience is heavily in that area, and we try to bring it to the forefront too. So as an architect,
00:21:23
Speaker
this is a book that I think will be invaluable if you are implementing data products, data contracts, or even the broader data mesh.

Target Audience for the Book

00:21:30
Speaker
But there's also some significant thinking that JGP and I did around team structure, operating models, and roadmaps for implementing your data mesh, which I think is targeted at the manager and executive level as well.
00:21:51
Speaker
So I think there's a pretty broad audience that can benefit from this. JGP, I don't know if you would like to add to or refine that answer a bit. Yeah, it's definitely a wide audience.
00:22:15
Speaker
Around the time we decided to write a book, Eric and I were both having discussions with a lot of people, and a lot of them came to us because we have some experience.
00:22:33
Speaker
They were asking questions whose answers they didn't find in Zhamak Dehghani's book. She did a fantastic job, and I have an awful lot of respect for the work she's done and for her book, because it was the foundation. When we went to O'Reilly, this is, in a way, what we told our acquisitions editor.
00:23:07
Speaker
The thing is, we wanted to write the second volume of what Zhamak was writing. So if you read Zhamak's book and loved it, like many people, but had a lot of questions at the end of it, we wanted a book that answers those questions.
00:23:34
Speaker
The challenge for us was that the questions were very technical at times, and also very sociological at times. And as Eric was saying: are we talking to engineers, or to data product owners?
00:24:09
Speaker
It's probably not a book for VPs, but maybe for directors, or at the level where you still have to make the kind of decisions that are a bit more practical than financial.
00:24:24
Speaker
That was kind of what we wanted. And overall, the common thread of what we delivered through the book is the consistency of the Climate Quantum example.
00:24:45
Speaker
Whether it was in Eric's chapters or mine, we stuck to that. And even though Eric is already working in this field, both of us felt really motivated by using an example that also makes a difference. It was climate change: climate data to fight climate change, or rather not to fight it, but to understand it better.
00:25:23
Speaker
So it's something that is, in a way, almost not-for-profit. That also motivated us quite a bit. That's very interesting. Listen, you mentioned that you see this book as a second volume after Zhamak's book.
00:25:51
Speaker
And I have a question. Did she give you her blessing?
00:26:00
Speaker
I would say we have a good relationship with Zhamak. We did not look for a blessing for the book.
00:26:15
Speaker
Well, maybe we should have, but we partnered with another celebrity of the data mesh community,
00:26:29
Speaker
who is actually... I don't know, maybe he's just ignoring you these days. I don't know what's going on with you. Oh, you mean that bald guy?
00:26:43
Speaker
Well, not Eric. The other bald guy. Eric is here. I'm like, one of the celebrities is already here. So, you know, Eric and I, and you, Yulia, actually, the three of us got to know each other thanks to Scott Hirleman, okay? And I think Scott built an impressive community and an impressive show with his podcast around the idea of data mesh.
00:27:25
Speaker
That's why we wanted to have him involved in the project, and that's why he wrote the foreword of our book, which is a great foreword. I don't know if he's giving us his blessing for the book; I haven't asked him. But making sure he was part of the book was also a way to thank him for all the work he did for the community.
00:27:56
Speaker
You know what, if you put it in numbers, Scott recorded 300 conversations about data mesh. Those are just the recorded ones, and there's a fair number he didn't record, right? He just talked with people.
00:28:14
Speaker
And I think it translates into professionalism around the topic of data mesh, because he heard it all, he brainstormed, so he is a real professional in data mesh. That is for sure. But when I said blessing, I was wondering,
00:28:40
Speaker
not the blessing version, but yeah, I'm glad Scott is involved. Okay. My next question is about Climate Quantum. This is really fascinating. Eric, you mentioned that this is open source, right?
00:29:04
Speaker
Is there any way folks can poke around and see what's in there? Would you be able to share that? Yeah, sure. It's an open-source project, run by a group called OS Climate, OS being open source.

OS Climate Project Overview

00:29:23
Speaker
You can Google them and you'll find them. What they're trying to do is a huge vision.
00:29:34
Speaker
They want to make climate data easier to find, consume, share, and trust. And because climate data is so federated, with no governing body and thousands of owners, the only way we believed we could tackle that was by creating a federated data environment.
00:29:59
Speaker
That's what data mesh is. So data mesh was the logical choice, and that's why they asked me to participate. There's a variety of projects within OS Climate related to climate change; I'm running two of them, and we welcome any and all folks who want to participate.
00:30:23
Speaker
Tell me if I'm mistaken here, but OS Climate is part of the Linux Foundation? Absolutely correct. So the work in the book is modeled on some of the stuff we're doing with OS Climate. And like I said, we welcome any and all volunteers.
00:30:44
Speaker
I'm asking because, when I do webinars and invite speakers on this topic, I find it so helpful to have a tangible case to set the stage and have a scenario. And you have this Climate Quantum use case going through the entire book. That's a huge asset for showcasing what you do and how to actually implement data mesh. That's really interesting. Yeah, I don't want to take too much time on this, but as JGP mentioned, this is an opportunity for us to give back. Here's why we selected Climate Quantum as an example.
00:31:34
Speaker
It would be difficult to find a data domain anywhere that is more voluminous, more complex, and less governed. Take all the challenges you have with enterprise data and magnify them ten- or a hundredfold: that's what we're talking about.
00:31:52
Speaker
Which is why data mesh is so compelling. Take volume: satellite images are gigabytes; flood data for Europe alone is about 30 gigabytes, and there are many, many of these datasets around. Take complexity: everybody's formats are a little bit different. Normalized, not normalized; public and private formats; no easy versioning of any of these things. There's no governing body, and the rules are inconsistent across the regions we're dealing with. Transparency is mandatory but very, very difficult to achieve. Data mesh is actually the answer, because it provides a mechanism to manage federated data, and this is the most federated example I could possibly think of. It's what we're applying data mesh to today. So it is literally the perfect use case, and like I said, the fact that we can give a little bit back is...
00:32:49
Speaker
No, it's beautiful. I can see how your eyes are shining, and it's very inspiring. Did you want to add anything? I was just thinking about an example. You know, I've lived in a few countries, Eric as well, and we are always confronted with all these units, okay? I was kind of new to climate data before that, so I did some research and looked at datasets,
00:33:35
Speaker
and, as Eric said, it's a mess. Even basic things: am I going to measure rain in millimeters, inches, centimeters, whatever? And pressure units are all different as well. So we've really got to have a very wide,
00:34:00
Speaker
you know,
00:34:04
Speaker
you need to be open-minded just to understand the variety of the data itself. And we try to simplify that as well. So, one audience we didn't think of: even if you're just interested in that kind of data and want to understand it a little better, it can be worth buying the book just for that purpose. No, it's highly interesting.
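The unit problem JGP describes (rain in millimeters, inches, or centimeters; pressure in assorted units) is commonly handled by converting every reading to one canonical unit per quantity at ingestion. Here is a minimal sketch, assuming millimeters for precipitation and hectopascals for pressure as the canonical units. The conversion factors are standard; the table and function names are hypothetical.

```python
# Canonical units (an assumption for this sketch): millimeters for
# precipitation, hectopascals for pressure. Each factor converts 1 unit -> canonical.
TO_MM = {"mm": 1.0, "cm": 10.0, "in": 25.4}
TO_HPA = {"hPa": 1.0, "mbar": 1.0, "kPa": 10.0, "Pa": 0.01,
          "inHg": 33.86389, "mmHg": 1.33322}


def normalize(value: float, unit: str, factors: dict) -> float:
    """Convert a reading to the canonical unit described by `factors`."""
    try:
        return value * factors[unit]
    except KeyError:
        # Refuse to guess: an unknown unit should fail loudly at ingestion.
        raise ValueError(f"unknown unit {unit!r}") from None


readings = [(0.5, "in"), (1.2, "cm"), (7.0, "mm")]
print([round(normalize(v, u, TO_MM), 2) for v, u in readings])  # [12.7, 12.0, 7.0]
print(round(normalize(29.92, "inHg", TO_HPA), 1))
```

Keeping the factors in plain tables makes it easy to record, in a data contract or product description, which canonical unit each column uses.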
00:34:27
Speaker
This is very interesting, and I can see the use case and how nicely data mesh fits in. But here's something I was thinking about before jumping on the call with you guys. I don't know how it's going to come across, but it feels to me, and this is just my gut feeling, that
00:34:55
Speaker
the share of data mesh talks in the data community has started to decrease, while the share of talks about data products as a topic has increased two- or threefold. It's all based on my gut feeling; I have no data to back this up. So how do you feel: is data mesh going away? I don't believe it's going away, because there are use cases like the one you just told us about, where data mesh is the answer. But it's still a very complex, I want to say, concept of a plan.
00:35:44
Speaker
It made it to Spain! Of course it made it to Spain. Well, it's definitely hard to implement.
00:36:01
Speaker
And I have lots of appreciation for the people who are doing it. But for some reason, the number of talks has decreased. How do you feel about it? I don't know if that's even a question. Yeah, I can take a little stab at that one, because it's kind of one of my favorite topics.
00:36:20
Speaker
oh
00:36:24
Speaker
I don't think you're wrong. I think you're actually pretty right: the overall interest in data mesh seems to be a little lower than a year ago.
00:36:43
Speaker
However, what has risen dramatically in terms of popularity is data products, and in a way data contracts as well. One thing is that people are oversimplifying data mesh. If you decide to go with data products and build data products... I've met people, and Yulia, you've met them too, who already have more than a thousand data products. So are you not just transferring the mess from one level to another?
00:37:38
Speaker
And you're just changing the paradigm. Like: oh, a data warehouse is too complicated, so I'm going to do a data lake because it seems easier. Oh, I'm not going to do datasets anymore; I'm going to do data products because it's easier.
00:37:54
Speaker
But the thing is, are you not just transferring the problem to something else? I think that's kind of what's going on, because the reasoning is: oh, data mesh is a little bit too complicated, so I'm going to do something a little simpler, which is managing data products, with no real definition of what a data product is anyway, so I can do a bit of whatever I want. And to add to that,
00:38:22
Speaker
I think people have a tendency not to see the link between the three. That's why the subtitle of our book is really about data contracts, data products, and data mesh. And depending on your level of maturity,
00:38:49
Speaker
you can evolve from one to another. A simple use case: I'm starting out, and I've got problems with too many pipelines, blah, blah, blah. Okay, well, have a look at data contracts.
00:39:05
Speaker
Then you can already start increasing your data quality. With a good set of data contracts, you can already see your situation improve. Then the thing is, okay, data contracts are a little cryptic, so they don't speak to my end users. Let's build data products on top of them.
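To make the contract-first step a little more concrete, here is a minimal sketch in Python. It assumes a data contract boils down to an expected schema plus a few quality rules; `DataContract`, `validate`, and the `orders` example are hypothetical names for illustration only. This is not the book's approach and not the YAML format of the Open Data Contract Standard (ODCS), which is far richer.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DataContract:
    """Hypothetical minimal contract: an expected schema plus 'required' columns."""
    name: str
    version: str
    schema: dict[str, type]                          # column name -> expected Python type
    required: set[str] = field(default_factory=set)  # columns that must be present and non-null

    def validate(self, records: list[dict[str, Any]]) -> list[str]:
        """Return human-readable violations; an empty list means the batch honors the contract."""
        violations: list[str] = []
        for i, row in enumerate(records):
            for column, expected in self.schema.items():
                value = row.get(column)
                if value is None:
                    # Optional columns may be absent or null; required ones may not.
                    if column in self.required:
                        violations.append(f"row {i}: '{column}' is required but missing")
                    continue
                if not isinstance(value, expected):
                    violations.append(
                        f"row {i}: '{column}' expected {expected.__name__}, "
                        f"got {type(value).__name__}"
                    )
            # Columns the producer never promised are also flagged.
            for column in sorted(row.keys() - self.schema.keys()):
                violations.append(f"row {i}: unexpected column '{column}'")
        return violations


orders_contract = DataContract(
    name="orders",
    version="1.0.0",
    schema={"order_id": str, "amount": float, "customer_id": str},
    required={"order_id", "amount"},
)

batch = [
    {"order_id": "A-1", "amount": 19.99, "customer_id": "C-7"},
    {"order_id": "A-2", "amount": "oops"},     # wrong type for 'amount'
    {"amount": 5.0, "coupon": "SPRING"},       # missing 'order_id', undeclared 'coupon'
]

for problem in orders_contract.validate(batch):
    print(problem)
```

Violations are plain strings here; a real setup would more likely run checks like these inside the pipeline and block or flag any batch that breaks the contract.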
00:39:26
Speaker
Okay, then I've got data products, and then I've got 1,000, 2,000, 3,000 data products. How am I going to handle all of them? Well, that's where data mesh helps, because it gives you an overarching view of all of that. So I think it's all linked. You cannot oppose one to the other. For me, it's just a question of maturity.
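The layering JGP describes (contracts underneath, business-facing data products on top, and a mesh-wide view over all of them) can be sketched as a toy catalog. This is an illustrative assumption, not a description of any specific catalog tool: `DataProduct`, `MeshCatalog`, and the sample products are hypothetical, and each product only references its contract by an identifier string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor: the business-facing wrapper around a data contract."""
    name: str
    domain: str        # owning business domain, e.g. "sales"
    owner: str         # accountable team
    description: str   # plain-language summary for end users
    contract_id: str   # reference to the more technical contract it publishes


class MeshCatalog:
    """Hypothetical single catalog giving a mesh-wide view over many data products."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # One name per product across the whole mesh keeps the catalog authoritative.
        if product.name in self._products:
            raise ValueError(f"data product '{product.name}' is already registered")
        self._products[product.name] = product

    def by_domain(self, domain: str) -> list[str]:
        return sorted(p.name for p in self._products.values() if p.domain == domain)

    def search(self, keyword: str) -> list[str]:
        # Naive case-insensitive match on name and description only.
        kw = keyword.lower()
        return sorted(
            p.name for p in self._products.values()
            if kw in p.name.lower() or kw in p.description.lower()
        )

    def __len__(self) -> int:
        return len(self._products)


catalog = MeshCatalog()
catalog.register(DataProduct("orders-daily", "sales", "sales-data-team",
                             "Daily orders with amounts per customer", "orders@1.0.0"))
catalog.register(DataProduct("customer-360", "marketing", "marketing-data-team",
                             "Unified customer profile", "customers@2.1.0"))
catalog.register(DataProduct("refunds", "sales", "sales-data-team",
                             "Refunded orders and reasons", "refunds@1.0.0"))

print(len(catalog))                 # 3
print(catalog.by_domain("sales"))   # ['orders-daily', 'refunds']
print(catalog.search("customer"))   # ['customer-360', 'orders-daily']
```

Rejecting duplicate names is one simple way to keep a single catalog the source of truth, which is precisely what tends to break down when an organization ends up with several competing catalogs.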
00:39:51
Speaker
Yeah, I would echo what JGP mentioned. Here's the deal: I think data mesh, in Zhamak's grand vision, which I strongly support, is about bringing agile to data. We want to make data more agile.

The Role of Data Mesh in Agile Data Products

00:40:08
Speaker
She advocates a federated architecture and federated organizational structures that give you faster decision-making, which leads to faster time to market, and delegated ownership to those closest to the data, which is what gives you that faster decision-making, et cetera. Nobody's against that. Everybody wants their organization to go better and faster, and if you go faster, maybe even cheaper.
00:40:33
Speaker
Data mesh gives you a way to look at that. If you boil it all down, it is all about going from centralized to federated. And the question you have is: how do you do that? Data mesh has a recipe, and we've articulated it in our book. Here's the challenge everybody has: you can think about data mesh up front and recognize that you're not building an enterprise data mesh. You build it one data product at a time. The problem is, if you don't think about team structure, operating model, and some semblance of a roadmap, what you'll find is that the first one to five data products are easy to manage.
00:41:12
Speaker
The next five to 20? Not so easy. That's where you start to get the so-called data mess, as everybody calls it. Past 20, it is nearly impossible to manage these things without actually thinking about an operating model, team structure, common interfaces, data products, and data contracts.
00:41:31
Speaker
All of a sudden you realize that data mesh is the thing that brings all of these together. It may involve the infrastructure, whether that's based on Airflow, data warehouses, dbt, or something else, that provides the data movement.
00:41:48
Speaker
But it's a way of organizing your teams to make data agile. That's the key here. So people call it data mesh, and then they evolve that to, well, data mesh is just... I'm just building a lot of data products. Then, if I have data products and I do them well, I have data contracts and maybe data factories to build them at scale.
00:42:08
Speaker
You're still doing data mesh; you just may not call things by the same names. But when you get to anything resembling data products at scale, you have to think about all the things Zhamak laid out in her book, which we have articulated further, with very specific guidance on how to actually do this. Our book is really about data products and data contracts at scale.
00:42:32
Speaker
It just happens to sit in a framework called data mesh that makes it easier for you to find, consume, share, and trust data. That's what this is all about. So maybe it's called something else; it doesn't bother me one way or another. But if you don't think about these concepts, which we call data mesh, up front, you're going to think about them at your 21st data product, when you have that proverbial mess.
00:42:56
Speaker
And that's when you really have a problem. So that's my perspective: things are just evolving. Eric, this is one of the most fantastic and easiest-to-follow explanations of data mesh I've ever heard. Sorry, JGP, it wasn't you. I have to credit Eric. There we go. It's fine. Don't make it too easy, Eric, because I need to sell my book. So.
00:43:23
Speaker
No, no, no, this is fantastic, but I have a question, like always, you know. This actually helps me understand lots of things, like Eric's explanation about data products at scale. Of course, you can have data products, but how are they interconnected, and how are they governed?
00:43:45
Speaker
How do you manage that at scale? So, you mentioned, and I wrote it down precisely, that data mesh is team structure (beautiful, love it), operating model, and common interfaces. Here's a question.
00:44:07
Speaker
What does it all have to do with the data platform? Because more and more organizations out there have started to have data platform teams. So basically they have a team, they make architecture decisions, you know, which tools they use and everything. They have started to have an operating model, budgets for sure, and responsibility for the cost of the data platform.
00:44:37
Speaker
So what is the difference between data mesh and data platform teams, if there is any? I think there's a distinct difference between the two, but they collaborate. When we look at the operating model and the team structure for data mesh, and for data products in particular, we rely heavily on the Team Topologies book by Matthew Skelton and Manuel Pais.
00:45:06
Speaker
I recommend that book to anybody doing any semblance of data work. It is probably the seminal book on how you should structure your teams. Zhamak highlighted it in her book, and we use it extensively in ours too. So, as they say, we're standing on the shoulders of giants as we talk about how to lay this out.
00:45:30
Speaker
So here's how they have laid it out, and I would argue this is probably the best way of looking at it. They have a variety of different team types. One is the stream-aligned team: the folks who do the primary work the business actually wants done. They're supported by platform teams, which provide the infrastructure. In our case, that may be the data platform: your database, your Airflow, your dbt.
00:45:55
Speaker
If you're in a heavy analytics environment, it may be your Dask environment, all those types of capabilities. That's what the platform team does. What that means is that they have some, but relatively little, exposure to the business. In fact, the best platform teams have as much knowledge about the business as possible.
00:46:15
Speaker
But that's not their job. Their job is to make sure that Airflow and dbt work at scale in the way the consumers actually want. So their job is very different. The data product teams are the ones that do the business work. Those are the folks who ingest the data, do some transformations, build some models, and then make the data available to consuming teams, which in turn may form a pipeline of such data products.
00:46:40
Speaker
The data product team is close to the business. In fact, the best data product teams know the business. They're supported by a data platform team, and also by a set of enabling teams. There could be a training team.
00:46:54
Speaker
These are other teams that may not make sense to put inside each individual data product team; rather, you want some economies of scale, if you will. So maybe you have a training team. You have a governance team that provides the enterprise or global standards, for privacy, security, or otherwise, that every data product needs to adhere to, but those standards get implemented in the data product team. So there's a variety of teams that need to interact to make this work.
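One way to picture the split Eric describes between the governance team and the data product teams is "define once centrally, apply locally". The sketch below assumes policies can be expressed as simple predicates over a data product's metadata; `Policy`, `GLOBAL_POLICIES`, `failed_policies`, and the metadata keys are all hypothetical and only illustrate the division of responsibilities, not how any particular governance tool works.

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any

Metadata = dict[str, Any]


@dataclass(frozen=True)
class Policy:
    """A global standard, written once by a (hypothetical) governance team."""
    name: str
    check: Callable[[Metadata], bool]  # returns True when the product complies


# Defined centrally: the same rules apply to every data product in the mesh.
GLOBAL_POLICIES: tuple[Policy, ...] = (
    Policy("has-owner", lambda m: bool(m.get("owner"))),
    Policy("pii-masked", lambda m: not m.get("contains_pii") or bool(m.get("pii_masked"))),
    Policy("classified", lambda m: m.get("classification") in {"public", "internal", "confidential"}),
)


def failed_policies(product: Metadata, policies: tuple[Policy, ...] = GLOBAL_POLICIES) -> list[str]:
    """Applied locally: each data product team runs the checks against its own product."""
    return [p.name for p in policies if not p.check(product)]


orders = {"name": "orders-daily", "owner": "sales-data-team",
          "contains_pii": True, "pii_masked": True, "classification": "internal"}
leads = {"name": "raw-leads", "owner": "", "contains_pii": True, "classification": "secret"}

print(failed_policies(orders))  # []
print(failed_policies(leads))   # ['has-owner', 'pii-masked', 'classified']
```

The governance team owns the contents of `GLOBAL_POLICIES`, while each data product team decides how to satisfy them in its own product, which mirrors the "provided centrally, implemented in the data product team" idea above.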
00:47:21
Speaker
The core of data mesh is the data product team, the stream-aligned team, the one that's closest to the business and that makes data easy to find, share, consume, and trust. They're supported by the data platform team. There is a very clear distinction. The challenge Zhamak tried to address is that, for a variety of reasons,
00:47:41
Speaker
we tended to centralize all this capability in one place, and what ended up happening was that things got really slow, got gummed up, got very, very expensive, and didn't scale. So we use the Team Topologies structure to outline how an organization can actually scale on top of a technical platform capability that also scales. And I think that's the key difference. If all you have is a data platform team, what you have is maybe a very fast, streamlined Airflow or dbt environment, but it doesn't deliver value until you have a data product team that actually surfaces that value and makes use of it. Thank you, Eric. That was a really insightful explanation.
00:48:26
Speaker
JGP, do you want to add anything? But I still have a question, if you allow me. No, no, I think let's go to the next question. Yeah, but you know, that's why Eric and I partnered on this project. I think we share a lot. On some details we might have a different perspective, but it's really minor compared to the overall alignment we have. And that's why we're still talking to each other after writing a book. Yeah,
00:49:06
Speaker
But yeah, this is interesting. Before jumping on the call, I asked you how writing the book together affected your relationship, and you answered that it got better as you got to know each other. I'm going to say something: I think I found a new friend as a result of doing it. So there's not only mutual respect; JGP is just a cool guy. And it's true, I'm glad to have my name in the same book as his.
00:49:44
Speaker
Well, it's definitely a shared feeling. And we still have to get together, because we haven't had the chance to do that yet. One day I will be able to travel a bit more. Wait, you guys still haven't met in person? We have not. Without going into great detail, I have some challenges right now that preclude travel, and JGP's been so kind and accommodating.
00:50:13
Speaker
But online has worked, and we will meet in person, and then, you know, we'll finally have it. Yeah. Okay, that's interesting to know. No, really inspiring, guys. You met online; you know, some people have romantic relationships, but you did a book together.
00:50:39
Speaker
Yulia, you cannot talk like that to everybody, okay? No, it just reminded me of Scott's remarks on the previous show with you, you know. That was not me, that was Scott talking with me. Okay, so this is really interesting. Folks, here's a question that bothers me about data mesh.
00:51:06
Speaker
I can see why everyone is starting with data products, because the concept is easier to understand. And even with data contracts, which I actually think are a little more advanced than just data products, but it's, you know,
00:51:25
Speaker
kind of layers, right? You have data products and data contracts, and then, with a lot of them, you have data mesh. So the question is: when do you start thinking about data mesh? How many data products? Eric kind of mentioned it, but is it just about the number of data products, or what other dimensions do you need to consider before moving toward data mesh?
00:51:56
Speaker
JGP, why don't you start? I don't think it's a question of quantity. You've got to understand what services you are expecting from the mesh. Just one example is the data catalog: are you really going to capture all your data products in one data catalog, or in multiple data catalogs? I've been in an organization where they had multiple data catalogs, and none of them was really the source of truth.
00:52:42
Speaker
You know: mergers and acquisitions, new business units coming in, ways of working, political strategies, et cetera. So it's not as obvious as saying, hey, we've got 100 data products.
00:52:57
Speaker
It also depends on the richness of what you already have. So in that way, I think it's also very interesting to see what is happening here. I think it all comes down to the maturity of the company.
00:53:22
Speaker
One thing we did not do in the book, and I don't think it's a regret, because I think it would be too early to do it at that level of detail, is a maturity model. I kind of wanted to think about it as a maturity model, but the real question is: do we want a detailed maturity model, or do we want to give you the tools to build your own maturity model?

Creating Maturity Models for Data Mesh

00:53:47
Speaker
And I think we chose the second option (Eric, you can tell me if I'm completely wrong here): to give you the tools to define the maturity model for your company. Otherwise it would be too much in a box, too canned in a way. So.
00:54:04
Speaker
Yeah, I would agree. How do tools define your maturity level? Well, you know, a maturity level is really about: okay, am I ready? I've got this knowledge of data products; when am I going to think about data mesh? Is it a maturity question, a quantity question, or a skills question? I think it's a little bit of all of that. And that's what we tried to put in the book, rather than saying, okay,
00:54:44
Speaker
this is where you are: find your position on the maturity model; you want to go to this step, so this is what you've got to do. That is not what we wanted to do with the book. So, yeah.
00:55:00
Speaker
I just wanted to build on what JGP mentioned. There are a lot of maturity models out there. One of the most famous, I suppose, is the CMM, Carnegie Mellon's Capability Maturity Model for software. A maturity model makes sense when there's been a body of practice from which you can actually define what it means to be, for example, at level one through level five.
00:55:25
Speaker
We're not at that point yet in the data mesh journey. People are still talking about data products. It hasn't gone through the trough of disillusionment, or whatever Gartner calls it, yet, but that's inevitable. It usually comes back out when people start to realize that there is real value, and that the problems of scale need to be addressed before you can actually realize that value. So a maturity model, as JGP highlighted, is something that might be a great book a year from now.
00:55:54
Speaker
So who knows, JGP, maybe that's our second book together. You never know. But I think it's a good problem to have. I wanted to come back to one of the things that I find will drive you to the point where you can start to think about data products at scale.
00:56:13
Speaker
At the end of the day, when you start your journey, you're not building a data mesh. You're not. In fact, if you start building a data mesh and all that infrastructure and capability beyond what you already have, you're wasting your money. When you're building your data mesh, you may not even know it, but you're building your first set of data products. And that first set of data products will eventually become your data mesh. When you have a certain number of those, you'll realize that you need to standardize certain things. You need to introduce data contracts. So data mesh is actually solving a problem of scale. And this comes back, I think, to the previous question: why is data mesh declining? Because people are starting to realize that you can't build your intergalactic data mesh, your enterprise data mesh, in a meaningful way until you've built a bunch of data products and understand the common capabilities you actually need. So I think the key here is recognizing that you're building the first of, hopefully, many data products, but that you need to think about the things that prepare you for scale. There's no point in getting to your 21st data product, realizing that you can't scale, and now having to spend a million bucks.
00:57:31
Speaker
You want to prepare for that, which is where our book comes in. This book is for those who are starting their data mesh journey and want to make sure they can build their 21st data product effectively and plan accordingly. And even if you're already at your 21st, we hopefully give you some insight to fix some of the challenges you may have. But this is really
00:57:54
Speaker
a question of a longer-term journey. And data mesh has, like I said, the socio-technical capability that will help you address this and actually allow you to build data contracts and data products at scale. I think that's the fundamental value.
00:58:12
Speaker
Eric, today was your day to shine. Absolutely. I loved every one of your explanations. Well, the Beyonce of data is being,
00:58:23
Speaker
you know, boring today, unfortunately. So I think we need to come up with a celebrity name for Eric. What do you think, Beyonce? Sure. I'm sorry I bored you, Yulia. Well, you know, you have to keep up. There were certain expectations I had. Well, I'm sorry, to be honest here, because Eric was, you know, kind of,
00:58:53
Speaker
all shining and beaming. Eric, it's so nice to see this side of you. I'm happy I let Eric shine today. You know, it's teamwork. I can't always be the sun.
00:59:13
Speaker
Well, that's all the sun is supposed to do. Just let me remind you of a different dimension in your life. But anyway, whatever feels right to you. So folks, I have a suggestion for you, and I know I'm going to catch you off guard right now. Why don't we, and if you're against it, I can sponsor it, give away your book? Do you want to give away your book?
00:59:42
Speaker
Give away? What do you mean? No, no, not a performance. We can ask the audience to come up with a celebrity name for Eric, and the best one he picks wins. Oh, you know, I think that's a great idea. Yeah, let's do that.
01:00:12
Speaker
We might not be able to give a physical copy outside of the Americas, okay?
01:00:24
Speaker
But definitely an ebook for the rest of the world. And if you're in the US or Canada, we can definitely send you a copy of the book. Yeah. Okay. So is it a contest, or a giveaway? How do you say it?
01:00:41
Speaker
Yeah, it's a contest. Okay, the contest. So you have to let us know your celebrity name for Eric, but to do so, you obviously need to listen to the podcast and hear how smart and intelligent Eric is. And the prize for the best nickname for Eric: can it be a physical copy, an ebook, or both? I don't know. But I'm excited. I don't know how you feel about it. Eric, are you excited about it? You know what? I'm very open-minded. I don't know if we can top the Beyonce of data, though. I don't know if anything can beat that one. That's a formidable term.
01:01:31
Speaker
I am not sure there is anything that could be better than the Beyonce of data. What about Taylor Swift? I don't know if I could call myself the Taylor Swift of data. She is far too fantastic for me to be associated with her.
01:01:58
Speaker
Well, folks, as you can hear, the Taylor Swift of data is taken. But yeah, thank you so much to everyone who listened to us. I was so glad to host both of you, and thank you so much for stopping by and sharing insights about your book. Yulia, thank you very much for having us, okay? It's always a pleasure to have a little teasing session, a mutual teasing session. I look forward to the next one. Eric, it was great to see you today. Well, thanks, guys.
01:02:43
Speaker
Yeah, likewise. Yulia, thank you very much for having us here. Great questions. And thank you very much for giving us the opportunity to talk about our book. I look forward to hopefully many more discussions. Absolutely. My pleasure. And stay tuned.