Andrew Jones - Reliable Data Platform

Straight Data Talk
43 Plays, 6 months ago

Andrew Jones, principal engineer at GoCardless, is the author of the book "Driving Data Quality with Data Contracts." During this session, we talked a lot about what a data platform is, who data platform engineers are, what it takes to make a data platform reliable, and, most importantly, how Andrew and his team managed to build a reliable platform at GoCardless. Sure enough, we touched a little on data contracts, their implementation, and the possibility of vendors doing the same as Andrew's team did.

Andrew's LinkedIn - https://www.linkedin.com/in/andrewrhysjones/

Transcript

Introducing 'Straight Data Talk' and Its Purpose

00:00:03
Speaker
Hi, I'm Yuliia Tkachova, CEO and co-founder at Masthead Data. Hi, I'm Scott Hirleman. I'm a data industry analyst and consultant and the host of Data Mesh Radio. We're launching a podcast called Straight Data Talk, and it's all about the data field and how the hype actually meets reality. We invite interesting guests who are, first of all, data practitioners to tell us their stories: how they are putting data into action and extracting value from it. But we also want to learn about their wins and struggles, as a matter of fact. And as Yuliia said, we're talking with these really interesting folks that a lot of people don't necessarily have access to, and these awesome conversations are typically happening behind closed doors. So we want to take those wins and losses, those struggles, as well as the big value that they're getting, and bring those to light so that others can learn from them.

Guest Introduction: Andrew Jones and Data Contracts

00:00:53
Speaker
And we're going to work to kind of distill those down into those insights so that you can apply
00:00:58
Speaker
these amazing learnings from these really interesting and fun people to your own organizations to drive significant value from data. Yeah, so every conversation is unscripted, in a very friendly and casual way. So yes, this is us. Meet our next guest, and yeah, I'm very excited. Hi, everyone. It's Straight Data Talk, and today we have the one and only. We are so thrilled to have you here, together with Scott. And just a quick reminder: Andrew Jones, the author of the book "Driving Data Quality with Data Contracts."
00:01:49
Speaker
And the one person, the one and only, who coined the term data contracts. Andrew, I'm so excited to have you here. And I am so thrilled to talk with you about quite a new topic, other than data contracts. So yeah, go ahead and introduce yourself briefly, and then we will lead into the topic we're going to have today. Yeah, sure. So I'm mostly known for talking about data contracts and coining data contracts. But my job really is building data platforms. And that's where the ideas around data contracts came from: ideas around building data platforms, how data platforms should be built to enable quality data, enable people to deliver quality data, and enable people to build value on top of that quality data. So yeah, known for data contracts, but looking forward to talking a bit more about data platforms today as well.

Data Platform Reliability vs. Quality

00:02:40
Speaker
Yeah, so this is basically the reason why we have this call with you, and I'm so excited about it: we're going to be talking more about data platform reliability and data platforms overall. And, you know, the transition should be smoother here, but you know what, we have this common friend, the Beyonce of Data, also known as GGP or Jean-Jacques. This is exactly what the reaction that she found.
00:03:19
Speaker
So I talked to him before contacting you and asked him if he would be okay to have a podcast together with you, because I was so sure that whenever we talked to you, you'd be saying, okay, let's talk about data contracts. And then he told me, oh no, I want to be talking about the data platform and its reliability. And I was like, gosh, this is like an exclusive. I am so excited because I think it's most real in topic. So sorry, GG video, not here, but there is a reason for that.
00:03:52
Speaker
Okay, so first of all, you know what I did?

Journey and Challenges in Developing Data Contracts

00:03:56
Speaker
I read all your Medium posts, from the earliest one to the latest one. And one of the things that sticks with me is that, I don't remember the date, sorry about that, but you started talking about data contracts and how they improve data quality. But over time, and when I say over time, within three or four hours, you started to include data reliability in there as well. So data contracts improve not only data quality, but also data reliability.
00:04:29
Speaker
And that leads me to my question: what is the difference between data quality and data reliability? Yeah, that's a good question. I think data quality is one of those terms that covers quite a lot of things, and people use it to cover a lot of different things. Sometimes they mean just: is it timely enough? Is it complete enough? Those kinds of data quality dimensions. And sometimes they use it to mean anything about data, like is it well-designed, well-modeled? Does it meet requirements? Those could also be part of data quality. Data quality is one of those terms; it's quite broad. But most people, when they talk about data quality, think about those kinds of dimensions: timeliness, completeness, those kinds of things. With reliability, I think that's more about
00:05:14
Speaker
whether it is reliable enough for me to build what I need to build on top of it. Does it have SLOs around it? Does it meet those expectations? Does it have good change management

Significance of Reliable Data in Business Applications

00:05:24
Speaker
around it? So it's not changing overnight and breaking my things unexpectedly, but has some sort of change management, with some sort of migration path for breaking changes. All those kinds of things: is it reliable enough, is it dependable enough for me to use it to build this process I want to build, to build this product feature I want to build, whatever it is I'm trying to do? I think that's mostly what I mean by reliability. And actually, for us, when we started thinking about data contracts and the problems we wanted to solve, most of them were around reliability. So why is
00:05:57
Speaker
the data structure changing overnight and breaking things and affecting important processes or revenue-generating features? It wasn't really about, like, is it not accurate? Is it not complete enough? Is it not timely enough? Those kinds of things. That's still important, but the things that were causing most incidents where I worked, when I was thinking about data contracts and the problems I wanted to solve, were mostly around reliability and whether we could build on it with confidence. And I think that with reliability, the root of that word is to rely, right? Rely on. And so many people are doing data work for the sake of data work, right? They're doing data work because they think data work inherently has value. And it's like,
00:06:42
Speaker
You know, "can somebody use it?" is one question, but can you get somebody to rely on it? And how do you communicate that reliability? So exactly those SLOs, and monitoring for those SLOs, and being like, hey, what does this mean? I think it's such an important topic. So I know, Yuliia, you've got all the questions, so I'm going to kick to you for questions, but I think that "can I rely on this?" is the business partner question that data people don't focus on enough. And that's the root of exactly what you were talking about. It's not just, am I building something that's sound? It's, am I building something that someone can rely on?
00:07:18
Speaker
Yeah, that's right. And it's particularly important now, when you're using data for more and more important things that need to be more reliable. So is it driving a product feature, maybe directly or maybe via AI or whatever? Is it driving some internal process that's really critical for the business? Maybe it's around fraud detection, or around movement of money or something like that, or something about regulatory issues. Those need to be reliable, and the data that helps power those features or those applications also needs to be reliable.
00:07:50
Speaker
But I have so many questions, okay? Before we jump further into, you know, why this connection happens, and we can talk for hours about that: the thing that you were talking about, what you refer to as data reliability, actually resonated with what I would mean by data platform reliability. So basically the question is: is data reliability the same as data platform reliability?

Platform vs. Data Product Reliability

00:08:27
Speaker
I would say not quite the same. I'd say the platform itself needs to be reliable. It needs to meet a certain level of reliability. The platform also needs to provide the features, the components, to allow you to build reliable data products on top of it, or data sets, or data applications, or whatever you want to be building. But when you're designing your, let's call it a data product, when you're designing a data product that needs to meet a reliability goal, you need a platform that allows you to build that, but you also need to think about how you're going to build it and architect that product itself in a way that is reliable. So I think there are two different things there. There's a platform that needs to be reliable itself and needs to provide capabilities to build reliable things on top of it. But then the actual thing you're building, the data product you're building, needs to be architected in a way that is also reliable, using those components from the platform in the right way.
00:09:22
Speaker
And how do you think about communicating that reliability? Because I think that's really important to, again, engender that trust to get people to rely on it. Does the platform have to do that? People have talked about whether you have standards for measuring your data quality so that everything is measured in the same way. Yuliia and I have talked about this many times, but how do you ensure that you're telling someone when something has broken, or that you're monitoring, that you're doing actual reliability engineering practices, you know, data reliability engineering? Some people think of it as engineering for data reliability versus reliability engineering for data, and those are two kind of separate practices. So how do you think about
00:10:10
Speaker
what has to communicate what, and who is getting those alerts? Because you don't want to alert your consumer every single time there may be an issue that you're looking into, versus when we have actually breached. You don't want to have constant alerts, because if there isn't any immediate action once there's an alert, then you're just creating noise. So how do you think about that? Yeah, that's a really good question, because it really is about the communication of that reliability. In the absence of that communication, typically, people who depend on something, the data, the platform, whatever it is, if the reliability of it is not well communicated, tend to assume the best, and they tend to be very optimistic. They look at what's happened in the last
00:10:53
Speaker
day or week, or however long they've been looking at the data: it always seems to be there within a couple of minutes and it's never gone down, so I'll assume that's always the case, and I'll build my application with those assumptions in mind, and that should be good. But then in practice, when you realize that those assumptions were incorrect, things start breaking, things start going down, and your application itself becomes unreliable. So your users are not happy and they start losing trust in your application. Then you go upstream and you're not happy with the team providing the data, and it tends to
00:11:30
Speaker
go wrong. People get unhappy, people start losing trust in each other, and that tends to be a big incident. So I think that communication is very important. The way I like to do that is by having the provider, of the data in this case, or of the platform if you're looking at the platform, set expectations for the thing they're providing. They need to be clear about what they are setting and what they are committing to. So are they committing to a particular SLO around timeliness or completeness? Are they going to be on call for it, office hours or not? Those kinds of things. And they can define those
00:12:05
Speaker
in the data contract, and that's what we do. So they can define various kinds of SLIs, their expectations, in the data contract, so it's codified. And then you can expose that to anyone using the data. You search for data in the data catalog or something, and you can see the SLA, and you can make an informed decision as a user: is this reliable enough for my needs? Oftentimes you'd like it to be more reliable, but at least if you know no one is on call for it, you can design applications around that. So as long as you know, as long as it's well communicated, as long as it's explicit, you can, as a user of the data or the platform, whatever you're depending on, work with that; you can make informed choices.
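To make the idea of codifying expectations in a contract more concrete, here is a minimal Python sketch. It is not GoCardless's actual contract format, which the conversation does not show; the class names, the fields (`max_delay_minutes`, `min_completeness`, `on_call`) and the `payments` example are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ServiceLevelObjectives:
    """Expectations the data owner commits to (hypothetical fields)."""
    max_delay_minutes: int    # timeliness: data lands within this many minutes
    min_completeness: float   # completeness: fraction of expected records, 0.0 to 1.0
    on_call: str              # support level, e.g. "office-hours" or "24x7"


@dataclass(frozen=True)
class DataContract:
    """A codified agreement between a data producer and its consumers."""
    name: str
    owner: str
    slos: ServiceLevelObjectives
    schema: Dict[str, str] = field(default_factory=dict)  # column name -> type


def meets_needs(contract: DataContract,
                required_delay_minutes: int,
                required_completeness: float) -> bool:
    """Let a consumer check up front whether the published SLOs are enough."""
    return (
        contract.slos.max_delay_minutes <= required_delay_minutes
        and contract.slos.min_completeness >= required_completeness
    )


# Hypothetical contract, as a producer team might publish it to a catalog.
payments = DataContract(
    name="payments",
    owner="payments-engineering",
    slos=ServiceLevelObjectives(
        max_delay_minutes=15,
        min_completeness=0.99,
        on_call="office-hours",
    ),
    schema={"payment_id": "string", "amount": "integer", "created_at": "timestamp"},
)

# A consumer that tolerates 30 minutes of delay can rely on this data;
# one that needs data within 5 minutes knows to design around the gap.
relaxed_consumer_ok = meets_needs(payments, 30, 0.95)
strict_consumer_ok = meets_needs(payments, 5, 0.95)
```

The point of the sketch is only that the expectations are explicit and machine-readable, so a consumer can make the informed choice described above before building on the data.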
00:12:46
Speaker
Listen, I have a question, and I realized that maybe there is a misalignment in how I understand a data platform, and it could be different.

Data Platform Components and SLOs

00:12:56
Speaker
What I just understood, you know, while you were talking: so basically there is a data platform which collects the streams of data, then models it, and then gives data consumers the ability to create their products. And so when you say data platform, do you mean the back part, basically the backend, which includes, let's say at your organization, Google BigQuery, Pub/Sub, and then Dataflow, whatever you guys use there on the backend? But there is also a UI and an interface for data consumers. So when you say data platform, do you mean the entire system, or are you talking about the front end?
00:13:35
Speaker
Front end, you know, not really the front end, but the data-consumer-facing part. Yeah. So what I'm really talking about here is the backend, like the components, the parts you're using from the cloud provider, for example. So in our case, like I said, it's Google-based: it's BigQuery and Pub/Sub, things like that. So how do you use those from your cloud platform? How do you use those in a way to provide tooling to your users so they can build data applications? So really a backend. Some people, when they talk about a data platform, like you're saying, talk about the whole thing, which includes modeling all the data and things like that. I see that as different. Really, I'm talking about the foundational platform that people can then build on. Yeah. Okay. Okay. That makes sense to me. One of the things that I had, for you
00:14:24
Speaker
questions, right? You mentioned that you have an SLO implemented in the data contract. What are the actual metrics of that SLO? So what we do is the owner of the data contract, the data producer, can define the SLOs, and they can say it's going to be this timely, this complete, and things like that. They can define that in there. We do a couple of things with that, and we're still in the early days of the implementation, so it's a bit
00:14:56
Speaker
What I'm talking about here is a bit of what we're doing and a bit aspirational still, but this is what we plan to do as well. So the one thing, once they're defined, is we make that visible in the data catalog, so that people who search for data can see what's been defined. We're also looking at how we can monitor those SLOs in real time and send alerts to the data owner. So deploying what we call a sidecar alongside the infrastructure, which is basically a small service within the data platform that's deployed for every data contract. It kind of listens in to the data, watches it as it goes through, collects metrics about how it's performing, about how timely it is, for example, and
00:15:37
Speaker
checks that over time. And if that falls below a certain threshold, below the SLO, we'll send alerts to the data owner. And those alerts will go through exactly as they'd get any other kind of platform-level alert. So in that case, it might be via something like Sentry, it doesn't really matter, but the same way that any alert goes to, in that case, a software engineering team, they'll get an alert when their data is falling behind the SLO that they have defined. That's kind of cool, in a mature way, but it also sounds like a lot of heavy lifting. I can understand, in your case, it's very much necessary for your business because you're dealing with fraud. It's kind of a financial institution, as far as I understood. One thing that I have
00:16:28
Speaker
a question for you, because you also mentioned in your Medium blog post that it took you two years to actually pull the data into the data platform. And then you spent another year to make it reliable, to have all the things implemented in case of data loss, like, what's that feature called in Pub/Sub, data retention time, to implement all of those things, which is a lot of work. I assume it took a long time to implement data contracts as well, to collaborate with all the teams external to the data team. My question to you: do you envision a third-party vendor, and there are some out there, you obviously know them, that provides data contract coverage coming in and performing the same miracle as you did in your organization, tailored to your business?

Implementing and Overcoming Data Contract Challenges

00:17:21
Speaker
Maybe not 100%, but what portion of that could they cover out of the box?
00:17:27
Speaker
Yes, I think the reason why it took time to implement wasn't really on the tech side and the platform side. That was pretty easy for us to implement. All we're really doing is kind of gluing together the Google Cloud stuff and deploying small services in Kubernetes that listen to the data and track it in something like Prometheus or whatever. It's all fairly simple stuff. The platform itself, in fact, isn't where the time was spent. It's mainly on the people side. So what we're saying is, as data owners, you now need to start defining your SLOs, you need to start being explicit about what you're providing. That was something they weren't doing before. It's mainly the culture side, I think, that takes the time.
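The sidecar idea described earlier (a small service per data contract that watches records go by, tracks how timely they are, and alerts the data owner only when the SLO is breached) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the team's implementation: the class `TimelinessMonitor`, the function `check_and_alert`, and the thresholds are hypothetical, and an in-memory list stands in for a metrics system such as Prometheus.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class TimelinessMonitor:
    """Tracks how many records arrive within the delay the contract allows."""
    max_delay: timedelta     # per-record delay allowed by the contract
    target_ratio: float      # SLO: fraction of records that must be on time
    delays: List[timedelta] = field(default_factory=list)

    def observe(self, event_time: datetime, arrival_time: datetime) -> None:
        """Record the delay of one message as it passes through."""
        self.delays.append(arrival_time - event_time)

    def on_time_ratio(self) -> float:
        """Fraction of observed records that met the allowed delay."""
        if not self.delays:
            return 1.0  # nothing observed yet, so nothing has breached
        on_time = sum(1 for d in self.delays if d <= self.max_delay)
        return on_time / len(self.delays)

    def breached(self) -> bool:
        return self.on_time_ratio() < self.target_ratio


def check_and_alert(monitor: TimelinessMonitor,
                    send_alert: Callable[[str], None]) -> bool:
    """Alert the data owner only on an actual SLO breach, to avoid noise."""
    if monitor.breached():
        send_alert(
            f"Timeliness SLO breached: {monitor.on_time_ratio():.0%} on time, "
            f"target {monitor.target_ratio:.0%}"
        )
        return True
    return False


# Usage: three records arrive within 5 minutes, one arrives 30 minutes late.
monitor = TimelinessMonitor(max_delay=timedelta(minutes=5), target_ratio=0.9)
produced_at = datetime(2024, 1, 1, 12, 0)
for delay_minutes in (1, 2, 3, 30):
    monitor.observe(produced_at, produced_at + timedelta(minutes=delay_minutes))

alerts: List[str] = []            # stand-in for the team's usual alert channel
alert_sent = check_and_alert(monitor, alerts.append)
```

Alerting on the aggregate ratio rather than on each late record reflects the earlier point about noise: one slow message is not an incident, but falling below the committed target is.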
00:18:08
Speaker
It does help that we have a fairly mature platform already. We call it a core platform, which is basically, yeah, it's got things like Kubernetes, it's got infrastructure as code for those kinds of things already. And we just built our data platform on top of that. So we can benefit from all of that investment already. And that one, for us, is actually built in-house, but open source ones exist as well. So you can use open source versions. so ah well like friday versions
00:18:39
Speaker
So building on top of that helps; you don't have to build it all yourself, and you stitch together components from your cloud as well. A lot of the time is spent on the people side. But in terms of what a vendor could do to make it easier, what a third-party provider could do to make it easier: a lot of people are trying to do something like this in this space, I think. I think they're interesting. I keep an eye on many of them. And not just in the data space; on the platform engineering side, there's a lot of innovation happening in terms of how to make it easy to build these platforms. Similar to what we do, stitching things together, but making that easier and providing an abstraction above that. I think they're all very interesting. I think the ones that are most likely to succeed are the ones that have a clear idea of what they are providing and who they are providing it to. A lot of people, for example, are just focusing on data contracts tooling.
00:19:25
Speaker
It could be that they are trying to sell it to data people and get data people to use it. But a lot of the data you're using isn't provided by data people. It's provided by engineering teams, for example, or by Salesforce teams, or by whoever else, not data people, at the source, right? As far left as you can go, where the data actually comes from. If the tooling isn't designed for them to use, then they're not going to use it. So yeah, if I was a vendor, if I was going to try to build something like this myself, or if I was going to engage with a vendor, I'd be like, okay, these are the problems I want to solve, and these are the people I want to use your tool. Are you building it for them, or are you building it for someone else? And that's where I think a lot of... This is really important. I love it. And one of the things, I don't know, it's good to remember, when we were talking to as well from the level, while there are data reliability platforms that they are building in-house.
00:20:21
Speaker
In-house. The funny thing that he mentioned is that business stakeholders do not care about reliability, or about what she did to make data reliable. They care about data trust, being able to trust the data, and data accuracy. But how you achieve that piece, they do not care about. He specifically mentioned the data observability piece. Yeah, so she was like, they don't care about that. They care about the outcome, about what kind of data they receive, basically. And this is an interesting idea because, you know, I'm the founder of a solution that helps teams achieve reliable data platforms. And I don't see myself going to a marketing team or a sales team explaining how I'm going to help them have more reliable data. It's not happening. In theory it sounds great, but it's just not going to happen. I still have to convince the data team, who will tell me, oh, Yuliia, but we have these dashboards. Literally, I had this call today.
00:21:32
Speaker
He was showing me dashboards and I was like, do you guys use it? Do you really see something happening in there, to catch anomalies? He was like, but we have it. And this is me as a vendor, and to be honest, we cannot cover 100% of the issues they have. But my question is, is it really viable to be a vendor out there? Because you have to convince the business that basically they shouldn't be investing, or should be investing not that much money, to build the same in-house.

Vendor Solutions for Data Contracts: Potential and Limitations

00:22:14
Speaker
And actually, what you did, you guys built that in-house.
00:22:18
Speaker
But my question is, and obviously we're not doing the same as what you built, we are in a different space, but do you really believe that there is a vendor that can come in? There are lots of technical challenges as well. It's easy in theory, again, but for a vendor it's going to be a different game to come in and do the same as you did, or at least 40%. I'm not even talking about 50. Is it possible? And can I augment the question just a little bit? Because I have this phrase of data people building data shit for data people. And that's kind of like: do you see it as, can a vendor come in and sell it to only the data team,
00:23:02
Speaker
versus someone like you, who is integrating into the platform thinking of the organization? Because, is the data platform going to be separate from the platform going forward, or is it all going to be one? Can a vendor come in and actually do this if they were pointing it only at the users being the data team? And then can they do it from kind of your perspective? Because those are two different perspectives, I think. Yuliia, is that okay to augment the question? Yes, of course. And I'm trying to look at myself from the side, Andrew. I'm not trying to put you on the train. No, don't look at me like this. I just understand how much work you've done.
00:23:42
Speaker
And being a certified vendor, I also know the hurdle of how to come in and do something for the organization. I just think it's super difficult, and I would love to get your opinion on that, as you have done that miracle in-house. Yeah, so I think the best vendors, when they come in, are working with you to understand how that tool solves problems you have. Not just trying to sell the tool to you, not just trying to solve a symptom of the problem, but really trying to understand the deep problem you're trying to solve and how that tool helps solve that problem.
00:24:18
Speaker
I think we work with vendors like this, and I mean, obviously they are trying to sell, but they are deeply committed to solving problems that we have identified, where we've gone to them and said, these are our problems, can you solve them for us? And they're honest when they can solve problems, and they're honest about problems that are slightly outside their reach or not part of their remit. They're not going to change the whole tool just to meet our needs in that particular case, but they are willing to understand it deeply, and they are honest about whether they think they can add value, and we have a conversation about that. And yeah, there are a couple of vendors I work with that are like that, and that's very much the kind of relationship I want with a vendor. So I think they can, I think, although we've built it all in-house. I think,
00:25:06
Speaker
in terms of whether it has to be built in-house or whether you can just go and buy something like this, or how much it might cover, that's an interesting question, because
00:25:15
Speaker
I've got two opinions, really. On one hand, I'm thinking, we all start from the same problems. The only difference is our data. And most of our companies are not unique. There are a few huge companies, like Meta and people like that, who are dealing with data at a different scale than almost everyone else: the FAANG members, Google and the like. But for most of us, our data is not massive. It's not that complicated. The issues we face are all very similar. So in theory, you should be able to find tooling that would solve many of these use cases. On the other hand, I think every organization is slightly different in terms of how it's set up and the culture they've got that's evolved over many years. Changing a culture is going to take years, probably, if you want to try and change it, like we have tried to do: to change our culture to be a bit more disciplined on data upfront, and we're trying to shift things left and take more care than we had done. That's a culture change, and that's taken a while. And
00:26:13
Speaker
although our problems are the same as someone else's, it takes commitment, it takes a long-term view, and constant talking and a constant change of culture to push it through. And that's a bit difficult to sell as a vendor, trying to sell that in a couple of calls with a solutions engineer. It's a bit harder to do, I think. So yeah, I'm in two minds, really. I think our problems are very similar across all the different companies that are dealing with data. And what we want to achieve with data is very much the same. So in theory, the solution should be the same, but in practice, every organization is structured differently, and they all need a bit of help to work out how to apply those solutions to their particular culture, to their particular organization.
00:26:56
Speaker
I love how you highlighted it, because there are lots of moving parts in organizations. You know, it's actually a living organism. Yeah, in a way, people can be not in more, people can be about to leave the organization, and all of that influences how the data culture is being adopted. And I just genuinely love how you look at data contracts as a big culture shift rather than a tool shift. And you emphasize that in many of your blog posts. I really have so much admiration for you because of that, just how you see data contracts. It's not about the solution. It's the same as
00:27:46
Speaker
saying that once we implemented Jira, we are an agile organization. It's literally the same. So yeah, I do agree with your point. There is no one size that fits all, and we can all use some kind of help. It depends how much help the organization needs. I also was wondering, because you talk so much about data platforms. And you know, I also stumbled upon Tinkai meme today about data platform engineers and other data folks in one t-shirt, like two kids in one t-shirt, and the t-shirt has a sign, a get-along shirt, like two really upset kids in one big t-shirt.
00:28:38
Speaker
Yeah, it was a t-shirt of getting along, right? It was two kids that had been fighting, and it was a two-headed t-shirt. Then he put, I think, data platform team and data product developers, or data product owners, or something like that. I would need to pull up this meme here, but my question to you is, who are those data platform engineers? Who are those people? Tell us. Yeah, I think they are different to data engineers. I think, at least in my team, and this has worked really well for us, they tend to have more of a software engineering background, or more of a platform engineering background, or a DevOps background. And they build platforms, and they want to build the foundational platforms that enable other people to do great work on top of them, and do so with autonomy, and do so within certain guardrails, which is good for data governance and things like that. They kind of want to provide those platforms. and
00:29:34
Speaker
can have a backend for additional platforms. I think it's more difficult. It's not impossible. We do have people moving in; for example, someone recently moved from data engineering into our team, and she's going to be learning all about the platform stuff, but she brings with her a lot of knowledge about exactly how the data teams are building these data products and how they want to build these data products in the future. So she's bringing a lot of great context with her, and she's bringing more data knowledge into our team. So that's really good. And she can learn all of the platform side. It's not impossible to learn if you come from that background, maybe starting on the analytics side, moving towards data engineering, and now moving towards platform engineering and software engineering. It's not impossible. But you need to have the right skills in that team to allow you to build these data platforms that you want to build. So make sure you've got those right skills.
00:30:27
Speaker
Do you think data platforms are more data or more platform? I think that's the most loaded question here, but I interviewed some folks on Data Mesh Radio a long time ago from NAV, which is in Norway. I think their product manager was from the data side, but their platform engineer was from platform engineering and moved into data. And the product manager said, "I could make them build the most amazing data platform that no software engineer will ever want to touch."
00:31:05
Speaker
It would be the most performant, the most beautiful. So how do you think about this? Because I've been having this problem with a lot of people: again, data people building data shit for data people. You said it earlier: who is your user? How do you think about building a platform that people want to use, instead of just exposing the tools to them, which might give them some ability to better manage the tools, versus that concept of what are they trying to accomplish and how do we make them accomplish that better? What have you read on that, or who are you looking into? Who should people be looking to if they're thinking about it that way? Who is your user of the platform, and why are you building this?
00:31:49
Speaker
Yeah, that's a good question. I think it is more platform than data. You need to partner with the people using your platform, and a lot of the time they will be data engineers. But our data platform is used by lots of engineers, and it powers a lot of our communication of data between services in our services-based architecture. That's all powered by our data platform, and data contracts as well. So we built data contracts not just to solve the issue of moving data from software engineering to data teams, but also from software engineering teams to other software engineering teams, because they move data around a lot too. And they have the same issues around making sure it's reliable and making sure it's got SLOs and things like that. But yeah, I would say it's more platform than data, for sure. You can learn a lot about how users are using data. You can talk to them, they can tell you the requirements, but to build something that
00:32:40
Speaker
is a good platform that people want to use, that solves their problems, and that influences behaviour in the right ways for the organisation's goals, that is a platform engineering problem, a platform problem. And you asked about people I follow around this. I do read a lot about platform engineering, probably more so than I do about data. The Team Topologies book is a really good one for anyone building platforms or even working in some sort of enablement team. I also went to a meetup in London with a talk by Gregor Hohpe, and that's a really good talk about abstractions and platforms and behaviour and things like that. And I think he's got a new book out as well. So those kinds of people, those platform engineering people, people who are thinking about how platforms influence teams and behaviours and what's being built on those platforms. That's something I think about a lot. And yeah, those are two good places to start if you're looking to learn more about this area. Andrew, I have a question.
00:33:39
Speaker
How do you guys measure the return on investment? Basically, how do you measure, you mentioned a lot about the influence, right? How you guys help the team, but do you do any kind of attribution in terms of return on investment from the data products?

Measuring ROI and Aligning Data with Business Goals

00:33:57
Speaker
Different teams consume from the data platform. And if you do that, I would be thrilled to hear about it. I thought my question was loaded. But yeah, that's a really good question and a really hard one. It always is. When you're working in an enablement team or a platform team like this, it's often hard to measure your return on investment. But I spent years trying to work out how best to do this. Not just me, but working with my PM and people like that. We have spent a long time trying to do this.
00:34:23
Speaker
And I'm going to let you down: I haven't got a perfect answer for you that everyone is going to be able to use. Don't expect one. But it is important, right? We need to measure our impact if we want to ask for more people or more resources, to buy something, or whatever it is. We need to be able to measure that. Otherwise, why would anyone give us what we ask for? One measurement we have found quite useful is the number of data incidents that we have, and therefore the impact that has on maybe a key process, maybe a critical process, or the time it takes for people to recover from incidents, so that we can convert that time into people-hours and dollars. So looking at the negative impact of what happens when things go wrong,
00:35:04
Speaker
and can we invest in making sure things go wrong less often, or that we recover from them a lot quicker, or that the impact is a lot smaller. And we do that by encouraging every data issue to be treated as an incident and to follow the same incident process that our software engineers follow. We do this by default in software engineering too: even if it's a small issue, we create an incident for it, we have a dedicated Slack channel, we discuss it, and we often run a post-mortem process and dig into why that happened, how we can prevent it happening in the future, and what we can learn from it. That kind of incident process is great for getting people together, for learning from mistakes, all those kinds of good things. It's also great for collecting data around how often a data issue affected a key process. And if that's happening a lot, then you can get some investment to try and reduce that. And you can start suggesting more radical solutions to address those problems. Like, for example, shifting things left.
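As an illustration of the incident-based measurement Andrew describes, here is a minimal sketch of converting incident records into people-hours and dollars. The record fields, the example incidents, and the hourly rate are all hypothetical and are not taken from GoCardless.

```python
from dataclasses import dataclass


@dataclass
class DataIncident:
    # Hypothetical record shape; real incident tooling will differ.
    name: str
    people_involved: int
    hours_to_recover: float
    hit_key_process: bool


def summarise_incidents(incidents, hourly_rate):
    """Convert incident records into people-hours and an estimated cost."""
    people_hours = sum(i.people_involved * i.hours_to_recover for i in incidents)
    return {
        "incidents": len(incidents),
        "key_process_incidents": sum(1 for i in incidents if i.hit_key_process),
        "people_hours": people_hours,
        "estimated_cost": people_hours * hourly_rate,
    }


# Illustrative numbers only.
incidents = [
    DataIncident("upstream schema change broke pipeline", 3, 8.0, True),
    DataIncident("late batch load", 1, 2.0, False),
]
summary = summarise_incidents(incidents, hourly_rate=100.0)
print(summary)
```

Tracked over time, a summary like this is the kind of evidence that could support a request for investment in reducing incidents.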
00:36:07
Speaker
Assigning more ownership and responsibility upstream, to the source data and the teams generating it, and moving away from change data capture to data contracts: those kinds of things are bigger projects, but you can justify them if you've got some data showing a good return on investment from those projects. I mean, a lot of this just sounds like you're saying: communicate with the rest of the organization, and stop putting the data team off in their data silo just doing data work instead of integrating into the greater organization to drive business value via data work. Yeah. And, I'm sorry to quote you, I listened to your podcast with Joris and Matt a while ago.
00:36:52
Speaker
And this is what resonates with me: you have such a healthy attitude towards data contracts. You say, listen, this is friction. Okay, this is friction, but this friction helps us make sure that the team that produces the data takes accountability for the data that they produce for us. And this is the baseline where we want them to take accountability. This is such a simple explanation of what you guys do, where you're heading with this whole data contracts initiative, and why. My question to you is about
00:37:40
Speaker
the return on investment. May I say, I'm still not satisfied with your answer. I'm being inquisitive here, and I'm sorry about that. But you explained it the other way around, as I heard it. You explained that you measure the value of data by the number of issues you have and how you treat those issues. You actually measure whether the data is important by the number of issues and whether stakeholders want to invest
00:38:11
Speaker
to make it better. It kind of looks like the other way around. I was expecting, and it could be my fault, for you to calculate how you help, let's put it this way, the sales team to generate some revenue. And I was expecting you guys to say, okay, this part of revenue is actually associated with our data platform. This piece cost you X amount of money, and this piece is basically a percentage of what you generated. Do you do this as well? Am I thinking wrong?
00:38:58
Speaker
No, I don't think you're thinking wrong. I think that is another good way to try and measure your impact and your return on investment. I think it's difficult, and it's quite a mature way of doing it. We are looking a bit at that. One of the other hats I wear at my organization is around FinOps, and we're still relatively early days in that. We've got a way to go before we're really mature there.
00:39:31
Speaker
And then when you get really mature, you can start saying: we delivered this feature, it brought in this amount of revenue, that's made up of these different components, and it comes down to the data platform, which is contributing this much to that delivery. In theory, you can do that. It's not something we've done in practice, and it's something that we think is probably a year or two away. It's an aspirational goal. I think the reason why I talk about incidents is because that's the problem I was trying to solve with data contracts, the problem I've been trying to solve for the last few years, because those were the problems that were happening at the time. And this is very common; many of us have common problems in data. We wanted to use our data for more and more important things. We had a strategy around that, a company strategy that said we want to use our data to drive revenue
00:40:20
Speaker
through the use of ML. And this was a few years ago. These days, I'd say we'd want to drive it through AI instead. Same thing. But we had this strategy. This was the company's strategy, right? And we couldn't deliver on that because our data wasn't reliable enough. And so for me to get the investment that I needed to tackle this, which is not really just about quality, but to try and introduce a new idea to the company around data contracts and the potential of this, I had to align completely with the company goals, and that allowed me to get my investment.

Overcoming Resistance to Data Contracts

00:40:56
Speaker
And then I could prove why that was useful, and what the return on investment was going to be, by looking at the data around it, things like that. So that's kind of where we came from.
00:41:05
Speaker
You have to put it on yourself. But yeah, if you can tie it to outcomes, et cetera, that's a great way to do it. And if you can do that, then I'd love to hear how it's done. I haven't found a single company that has a framework that really works, because so much of this... So if you look at marketing as an analog, I mean, Yuli, you come from the marketing world: marketing attribution. Oh, marketing attribution is the hardest problem in the world. And that's why people just do last-click marketing attribution, what was the last click before they bought, because it is so difficult. And what is value? With that, at least you know what
00:41:45
Speaker
somebody purchased. But with what we're talking about, delivering value via data, how much value is created is very, very squishy. $100 million of incremental revenue is completely different if it's very strategic, high-margin revenue versus very non-strategic, negative-profitability revenue. And so what is valued is not always valuable, and what is valuable is not always valued. So yes, what you were saying is you got something that was concrete enough, but when we're talking about information around data work, the data around our data work is the squishiest data I have ever seen. And so, when the data contracts thing really started popping off in early 2022,
00:42:37
Speaker
I was really confused, because I was like, what do you mean people aren't already doing this? I wasn't that deep on the software engineering side with API contracts, but it was like, of course you have a contract around an API. That just makes sense. But it was so revolutionary for data people, and you were just like, of course we have to bring this in. The way we're doing things, is it working? Why is everybody just doubling down on CDC and all this stuff? And it was like, oh, wow, okay, I found a kindred spirit. But so much of what you're talking about as well is,
00:43:18
Speaker
again, how do we communicate our value to the rest of the organization so that we can get incremental investment? And data people don't want to talk about that very often, because when they're talking to other data people, they want to talk about doing fun data stuff instead of the ROI stuff. And so it's refreshing that you're doing that. But I have not come across a single company that has a solid framework for measuring the value of data work, because it is: well, how much value did the part of the business that did this put on it? And then how much attribution do we have from the data work versus the other work? And if they say, well, the data work only did like 5% of the work, then you just don't work with that part of the organization again. So you kind of have to have that conversation: hey, we've got to partner and talk about this stuff.
00:44:09
Speaker
Sorry, Yuli, you had a point you wanted to make. No, I just want to argue with this, the statement you made that you haven't come across any organization that does good work on measuring how the team contributes to other departments. The reason for that is that it is the business of the data team alone to figure that out, as marketing does, as the sales team does. And now we are waiting for somebody to come in and say, "Oh, data team, thank you so much. If you hadn't provided me this data, we wouldn't have succeeded." No, it's not happening. The reason why I'm having this conversation is that the data team, first and foremost, needs to come up with this framework
00:44:49
Speaker
to show their value, because we keep arguing that data teams have been so undervalued for years while serving all of the departments horizontally.

Demonstrating Data Team Value to Business Success

00:44:59
Speaker
The reason for that is you folks, not you in particular, but you kind of need to take ownership of that and show the value and clean up the internal mess. Don't look at me like this. I just don't... I've been a circuit here, and this is the reason: I saw that, no, no, no, I saw that data teams do not know 100% what is happening in their data platform. And I think this is a problem.
00:45:27
Speaker
It's a problem, but I don't think the data team can say that solving this problem for X part of the business was worth Y amount of money. I don't think that's possible, but I think you have to push that part of the organization to tell you how much value it was. This is what Andrew did with data contracts. Exactly. This is the same contract, where you need to come in to the sales team and say, folks, listen, it has been a while since we started generating this data.
00:45:59
Speaker
Is everything used? Okay, yes. Now, what is the most valuable? Okay, this piece is the most valuable. Do you know how much it costs us to generate your list? Just at the infrastructure level, you can do that for sure. Just attribute how much those pipelines cost you, how much the storage costs you, end to end, from the data platform down to the sales team. And you can say, okay, folks, we need to figure out the way. What would you do without this data? We don't want to have 100% of the credit for your goals, but we do want some credit for that. Let's figure that out.
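The infrastructure-level attribution suggested here could be sketched roughly as follows. This is only an illustration under stated assumptions: the data product names, cost figures, and the choice to split each product's cost evenly across its consuming teams are all hypothetical.

```python
def attribute_costs(products):
    """Sum pipeline and storage cost per consuming team.

    Assumption: each data product's cost is split evenly across its consumers;
    products with no known consumer are reported under "unattributed".
    """
    totals = {}
    for product in products:
        cost = product["pipeline_cost"] + product["storage_cost"]
        consumers = product["consumers"] or ["unattributed"]
        share = cost / len(consumers)
        for team in consumers:
            totals[team] = totals.get(team, 0.0) + share
    return totals


# Illustrative monthly figures only.
products = [
    {"name": "lead_list", "pipeline_cost": 1200.0, "storage_cost": 300.0,
     "consumers": ["sales"]},
    {"name": "revenue_dashboard", "pipeline_cost": 400.0, "storage_cost": 100.0,
     "consumers": ["sales", "finance"]},
    {"name": "legacy_export", "pipeline_cost": 50.0, "storage_cost": 25.0,
     "consumers": []},
]
costs_by_team = attribute_costs(products)
print(costs_by_team)
```

A table like this only covers the cost side; the value side still has to come from the conversation with each consuming team.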
00:46:35
Speaker
It's more qualitative. I mean, Andrew, you said you're in the early days of this. Have you run across anybody that has a quantitative approach? This is the problem that I keep having: people want quantitative information because they're like, it's about data, therefore it should be a number down to the cent. And it's a lot more squishy than that. Are you finding things when you're talking to people about this? I haven't seen anybody. The closest I've seen was Vista saying they've kind of got a framework that they're putting in front of their business partners, but the business partners have to fill out all the information to say what that value is. Is that kind of what you're finding too?
00:47:12
Speaker
Yeah, I think so. I think there are certainly a lot of people in data teams who want to be able to prove the value of what they do. And I think that's a good thing. I think that's why we talk about value quite a lot. It's probably more of a reflection of the times, where budgets aren't what they used to be and you have to justify budgets a lot more. I think that's fair enough as well. So I think there's a lot of appetite to try and find real data to support the value we're generating. But it's very difficult to find real data. Often you find you use some sort of proxy that tries to articulate this. But I haven't found one that I could just easily apply to my organisation, for example. You can try and do things like surveys, asking how important the data was. You can try and talk to the business, like you were saying, and try and get that out of them. But that's quite a lot of effort, and that's
00:48:09
Speaker
a proxy. It might be good enough for what you need. So yeah, it is difficult. I haven't seen any perfect answer yet. And a lot of times data people didn't get into data work to deal with non-data work, right? Data people just want to do the data work. But yeah, I think wherever you are in an organization, it's not just about "I'm building this pipeline" or "I'm implementing this Jira ticket", or whatever it might be. Particularly when you get more senior, you need to think about what our team is doing to drive the organisation's goals in some way, and link into that as best you can. Ideally you do that via great metrics that really clearly demonstrate the return on investment.
00:48:53
Speaker
The company is investing in you as a team, as a person, and you're returning this. If you can find a great metric for that, that's brilliant. If you can't, then there are other ways to try and link what you're doing to the strategy, and that helps articulate the value you're creating. Again, it's not doing data, but you should be doing that wherever you are, particularly as you get more senior in the organization. If you're senior, if you lead a team, if you lead a data org, you need to be thinking about what you're doing and how that adds value to your organization. And that means not doing some of the low-value stuff, like moving data from system to system, just moving data around because you feel like you have to centralise it in a data warehouse because that's what you do. That's not high-value work. But delivering some data that allowed
00:49:38
Speaker
someone on the board to make a decision about whether to invest in this product versus this product feature: it's hard to quantify exactly how much revenue that might add, but at least you've aligned it to a goal of the board, and that is demonstrable at least. It's something that someone with power cares about. Right? You get that and you go, do you want more of that? I have, hopefully, the last question for you, Andrew. Is this bad? Am I being too harsh with you? No, I'm sorry. Okay. You will debrief me off the record. Okay. So my question is,
00:50:22
Speaker
again, I read your Medium blog. From the evolution that you went through and the things that you shared here in the recording, I recognize that you were looking closely at how you could help the business goals. I don't know how to articulate this question, but do you remember when you understood the exact point where you could help the business? Or how did that happen in your case? Because as I understood it, the framework, not framework, the pipeline was: you understood how you could help, then you started to build this up for decision makers, budget stakeholders, and then you got
00:51:09
Speaker
buy-in for investing in data contracts, building this up, and you got the support, and eventually, as far as I understand, you also got new team members. So it was brick by brick. How did you initially recognize the right path for you to build this foundation, to move forward with such a mature data platform and everything? And I'm sorry for the long and confusing question. No, it made perfect sense. Yeah, it's a good question. And it was a journey. It wasn't like I woke up one day with the answer, and it wasn't all me. I had great support from my manager at the time. I had a really good PM at the time, and she helped a lot.
00:51:55
Speaker
But the journey I was on: when I first started here, I just built a platform in the same way that everyone was building it, with change data capture and things like that. I thought, well, that's what everyone's doing. That's great. It took me a few years of learning. It took me having the space. We had quite a mature team, a great team of engineers. I was technically leading the team, but I was able to step away from the day-to-day to think a bit broader. That gave me the space to figure out these things a bit wider and do a bit of research. And yeah, I was pushed into that by my manager at the time, and supported by my PM. And it was kind of gradual.
00:52:34
Speaker
Yeah, I don't know exactly what the particular moment was. I just started noticing more of what I heard from our data people. Our data scientists and our data engineers, I was in meetings with them every week, and they kept saying, "Something broke the other day and it broke our pipeline and it took eight hours to fix." I kept hearing this, and I wasn't doing much about it for a long time. I thought, well, that's a bad problem, but I just built the platform; it's not my fault. But then I thought, well, I'm also hearing from the execs about how our strategy is to do more with data. It didn't seem to match up. And I eventually took the time to think about how I could solve this problem for them. Maybe it's not my job necessarily, not my day-to-day job, but I was going to take some time to think about how to solve this. I've got this background in software engineering. It feels like there's some sort of interface, some API, missing here. Maybe a contract, and that's where it came from.
00:53:28
Speaker
And I took time away from my day-to-day job. I let things go to allow myself some time to think about the kind of area that no one else was thinking about. So it was definitely a journey, definitely helped by some key people at that point in my career, and definitely helped by the fact that I had a great team underneath me who didn't need me to review PRs all the time. I had a great team of engineers doing that, and great support from my manager and my PM, who really helped. Having a PM and her skillset really helped me not only identify the problems that we were having as a company, but also improve my communication with different parts of the business.
00:54:09
Speaker
Again, I was quite lucky in my background. My background is software engineering. I could quite easily talk to software engineers in their terms, and we could speak the same language. I could also quite easily talk to data engineers in their terms and speak their language, and to data scientists and speak their language. I was quite lucky because my background is a cross-section, so I could have these conversations with people. I started being able to talk to people like PMs and CTOs as well, as I matured and grew as a person. So again, the theme for this recording has been communication, and that's kind of where it came from. And as I communicated more with different people, these ideas became formalized.

Evolution of Data Contracts and Persistent Issues

00:54:48
Speaker
And over time, data contracts became how we built this platform. And it eventually led to where we are now, which is a much better place than ever before. You kind of talked about building a reliable platform, but you also have to be a reliable person
00:55:02
Speaker
who is spotting those trends, who is saying, we're trying to make improvements to make this better. It's not just: I see the problem, I fix the problem. It's: I see the persistent problem, so we have to go to a higher level to actually think about how we solve this persistent problem. Because the definition of insanity is to keep trying the same thing over and over and expect different results. And it's like, well, if the pipelines are always breaking, you just tell people to stop breaking the pipelines. Well, they don't know what's downstream. So that's the shift left. I've talked about this: you can't give people data ownership unless you give them the tools and the capabilities to own that data. Otherwise, you're just treating it like a hot potato, and that's not product thinking.
00:55:50
Speaker
It's just, yeah. I love having conversations with you because you are one of those people who are popping it up to a higher level rather than just trying to do, again, data shit for data people. It's about: how do we solve problems? What are our persistent problems? How do we build something where we can communicate when something is broken? Because that's being reliable, right? Being reliable, you want your car to not break down. But if it does, you want it to tell you how it broke down, or that it's going to break down, or that it's going to have some problems. You want that communication aspect as well. And so a lot of what you're saying is just, like,
00:56:34
Speaker
have some empathy and go talk to people about what their problems are. And their problems aren't specific to data. They're specific to what they're trying to do and how the data supports that. I just love kind of everything you're saying there. Oh, thanks. Yeah. You mentioned a good word, empathy: having empathy for people, making sure you're not blaming people. That's good. And it all comes down to communication again. And you mentioned as well being able to become a reliable person. I guess I built that up over time at my company. I joined as a mid-level engineer and I kind of grew there. So I built up that kind of authority, but it was just through doing good work, and having a great team helped with that. But I'm not the smartest person on my team. I'm probably not the smartest person on this podcast. There's nothing special about me. It's just,
00:57:26
Speaker
it's just taking time to talk to people, taking time to really understand problems, understanding the why, starting with why, and really digging down to why. It's not just: why is this breaking all the time? Oh, because the schema has changed. Okay, fine, move on. I'll ask why again, and why again, like the five whys idea. Well, why did that schema change? Because the engineer is making a change for a new feature. Okay, why did that impact us? Because we're running on their database directly, and when the schema changes, it breaks us. Well, why do we not expect that to change? Well, I don't know, we should. I keep asking why and why and why. Not just on my own, but with other people, with engineers, who aren't causing problems on purpose. They're not trying to cause problems for data people. They want the company to succeed too.
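The schema-change failure described in this chain of whys is the kind of thing a data contract is meant to catch before it reaches downstream consumers. As a rough sketch only: the contract format, field names, and check below are hypothetical illustrations, not the GoCardless implementation.

```python
def breaking_changes(contract, proposed_schema):
    """List the ways a proposed schema would break the agreed contract.

    Both arguments are hypothetical {field_name: type_name} mappings.
    Extra fields in the proposed schema are treated as non-breaking.
    """
    problems = []
    for field, expected_type in contract.items():
        if field not in proposed_schema:
            problems.append(f"missing field: {field}")
        elif proposed_schema[field] != expected_type:
            problems.append(
                f"type changed: {field} {expected_type} -> {proposed_schema[field]}"
            )
    return problems


# The contract is what downstream consumers rely on.
contract = {"payment_id": "string", "amount": "integer", "currency": "string"}
# A proposed upstream change for a new feature.
proposed = {"payment_id": "string", "amount": "decimal", "created_at": "timestamp"}

problems = breaking_changes(contract, proposed)
print(problems)
```

Run as part of the upstream team's own change process, a check like this would surface the impact to the engineer making the change rather than to the data team hours later.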
00:58:11
Speaker
They're quite happy to get involved in the solution. That's exactly what I was doing. It's nothing special I'm doing. I just took the first step of talking to these people. I built these relationships over time. I spoke to people. If I did it again, it wouldn't take me so long to do this. I now have more confidence to build those relationships. I mean, what a lot of people in data struggle with is talking to engineering, talking to different parts of the business, talking to sales, whatever it might be. So yeah, it all comes down to communication again, and then the tech. Once you know how to communicate better and you can build relationships, then you can solve bigger problems and have a much greater impact on your organization. And COVID didn't do great things for us; it didn't help at all. Yeah. Anyway, folks, it's a wrap.
00:59:03
Speaker
Andrew, thank you so much for joining us. Again, incredibly humble, incredibly smart and empathetic. Andrew Jones, the one and only. Please check out his book, because it's about having common sense around data contracts. And this is what I love about Andrew: having common sense and being a human being when talking about data contracts. Thank you so much, folks. Thank you very much. Go ahead. Yeah.