004 - Why is Data Modelling a "Second-Class Citizen"?

E4 · Stacked Data Podcast

Is dbt lowering the bar to entry for data modelling, and is that having a negative effect on data quality?

Data modeling is the cornerstone of data-driven decision-making. It's the art of translating a business's concepts, definitions, and activities into data structures. When done right, it empowers you to answer the crucial "why" questions by capturing the "who, when, how, and what" of a business. Moreover, it paves the way for future efficiency, reusability, and data consistency.

So, why do so many organizations still overlook the importance of a robust data modelling strategy?

This week, on The Stacked Data Podcast, I have the pleasure of hosting Rob, the Head of Data Products at Miro. Rob delves into the critical significance of data modelling, the common pitfalls to avoid, and shares invaluable insights on how to approach data modelling and effectively lead teams of data modellers.

🚀 Key Takeaways:

  1. Know Your Critical Concepts and Attributes: Define and design them upfront, ensuring alignment across your organization. Regularly revisit and expand your list of conceptual definitions.
  2. Invest in Ongoing Education: Constantly enhance the skills of your data contributors. While not everyone needs to be a data expert, analysts should grasp architectural principles, master their tools, and engage in continuous learning. Rob, for instance, dedicates 5-7% of team time to Learning and Development (L&D) activities.
  3. Maintain Your Models Like a Garden: Regularly dedicate time to review, refine, refactor, clean, upgrade, and promote your data models. This should be a shared responsibility and part of your sprint routine.
Transcript

Introduction to Stacked Podcast

00:00:02
Speaker
Hello and welcome to the Stacked podcast, brought to you by Cognify, the recruitment partner for modern data teams, hosted by me, Harry Gollop. Stacked with incredible content from the most influential and successful data teams, interviewing industry experts who share their invaluable journeys, groundbreaking projects, and most importantly, their key learnings. So get ready to join us as we uncover the dynamic world of modern data.

Meet Rob Winters, Head of Data Products at Miro

00:00:35
Speaker
Hello, everyone, and welcome to another episode of the Stacked Data Podcast. Data modeling is essential to high quality and relevant data. So why have many forgotten the fundamentals? In today's episode, I'm joined by Rob Winters, the Head of Data Products at Miro.

What is the importance of data stack observability?

00:00:52
Speaker
Rob breaks down the fundamentals of data modeling, why it's so important, the typical pitfalls, and how to succeed in data modeling.
00:01:01
Speaker
We also talk about observability on your stack and your data, the huge benefits that it can bring to you and your team. I hope you enjoy our conversation. Hi, Rob. It's a pleasure to have you on the show today. I really appreciate your time. How are you doing? I'm great. Harry, thanks for having me. I'm super excited to be here with you today.
00:01:24
Speaker
But today we're going to dive into two really interesting projects. The first is about unlocking the huge amount of data that organizations have and how to effectively build out data modeling strategies. And then I think that also ties in nicely with data observability and data quality management, which I know you've experienced quite a lot in your career. So first off for the audience, it'd be great if you just give us a nice sort of overview of your career and how you've got to where you are today.

Rob's Career Journey in Data

00:01:53
Speaker
Yeah, absolutely. So, Rob Winters, I am currently the Head of Data Products at Miro. My team here is responsible for data modeling, data ingestion of third-party data, observability, and quality management. But I've been playing in the data space for a long time, almost 20 years now. Prior to this, I was running data at Navan, formerly TripActions, when I was there. Prior to that, I was in consumer travel and gaming. I started my career as a data and financial analyst in telco at T-Mobile.
00:02:23
Speaker
So I've kind of been all over the space and I've had the chance to do it all. I've worked as a data analyst, a data scientist, a data engineer, an analytics engineer, as well as being a manager throughout most of my career. Amazing. So a really broad understanding of the whole data lifecycle. Do you think that's been a benefit in your career, having that wider understanding of how everything fits into one another?
00:02:51
Speaker
I think it helps a lot. I think I'm also incredibly fortunate that I came into data backwards. A lot of folks come in, they start from a technology, computer science type of background. I actually started on the finance side of things. So actually starting from understanding how data is used to influence the consumer or influence the business, and then building deeper and deeper technology understanding has been, I think, a huge benefit for me throughout my journey.
00:03:16
Speaker
It's something that guests often talk about: having that understanding of the business, right? Because that's so important. Without it, you can lose some of the alignment between how the business and data interact. So it's great that you've come from that background. So.
00:03:32
Speaker
To dive in, Rob, every organization, every insight, every data product is powered by data.

Why is data modeling crucial and often overlooked?

00:03:41
Speaker
But one thing that we've seen is that many organizations see data modeling as less of a priority, or it takes a bit of a back seat, and they don't have a proper strategy for their data modeling or their data warehousing. First off, could you break down what data modeling is and explain to the audience why it's so important?
00:04:01
Speaker
So I always find it funny. I talk to a lot of folks in the data space and I hear people say, oh, data warehousing, it's like a dirty word in my company, I can't say anything about it. Or, if I suggest Kimball, they look at me like I've grown a second head. It's a lot like organizations who insist that they are processless. The reality is you always have data models the second you create data assets, just like you always have process, whether you choose to write it down or not.
00:04:26
Speaker
So I think it's important to realize if you're writing SQL, you're doing data modeling, or if you're creating dashboards in Tableau, you're doing data modeling. At the end of the day, all data modeling is just reflecting what's happening in your business, the definitions, who's doing things, the activities, everything that's happening and why it's happening and how it's happening. Data modeling is just taking the
00:04:48
Speaker
technical data assets and showing what those things are. I think that's sometimes lost when people think about it from a purely technical standpoint: they get caught up in tables and methodologies and so on, and they miss the intentionality, the why, what it's supposed to achieve. That's really interesting. So it's about thinking about how that data model is going to actually be consumed, which sometimes is lost, would you say?
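To make Rob's point concrete, here is a minimal sketch; the tables, columns, and filter below are invented for illustration. Even a "quick" one-off query quietly makes modeling decisions about grain and definitions:

```sql
-- Even a one-off query encodes a data model: it decides what a customer is,
-- which orders count, and at what grain the business is described.
SELECT
    c.customer_id,
    DATE_TRUNC('month', o.ordered_at) AS order_month,   -- grain: customer x month
    SUM(o.amount)                     AS gross_revenue  -- definition: refunds in or out?
FROM orders AS o
JOIN customers AS c
  ON c.customer_id = o.customer_id
WHERE o.status = 'completed'                            -- definition: what counts as a "real" order?
GROUP BY 1, 2;
```

Every choice in that WHERE clause and GROUP BY is a modeling decision, whether or not anyone wrote it down.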
00:05:15
Speaker
I think so, and in thinking about what the implications are of the thing you create that will be consumed. It's really easy to think, I'm solving this one problem, and to focus myopically on a single point, a single problem.

Extreme Approaches in Data Modeling

00:05:30
Speaker
But when you're doing that, you're often kind of just scratching around the edges of a lot of tangentially related ideas or issues. And if you step back a little bit and think about your data modeling, then you'll often realize that there's actually a whole process you're potentially supposed to be representing, could be representing. And doing that unlocks a lot of reusability if you just bring that perspective to how you build the things you build.
00:05:56
Speaker
Yeah. Okay. Makes sense. So what are the biggest mistakes data teams and individuals make when it comes to data modeling? So there are two patterns of behavior I see, if you take them to their logical extremes. The first is: I am going to do six months of research and I'm going to design everything upfront. I'm going to have every entity architected and every data flow and every
00:06:22
Speaker
T crossed and every I dotted. And I'm talking to a friend right now whose organization is going through exactly this. They're, I think, eight months in now and still in the design phase of their data model, before they can even start building anything.
00:06:36
Speaker
The other side that you see a lot of is: well, data modeling slows me down too much, or having to think about these things slows me down too much. I just need to get this data set out for this dashboard today; I'll build everything as I need it. What you end up with, and I've worked with or advised a number of companies who have this, is spaghetti.
00:06:58
Speaker
A depends on B, B depends on C, C depends on A. You get near-circularity in your data, or you end up with seven different definitions of the same concept that don't quite match, and you end up with stakeholders frustrated because things take longer and are more and more inconsistent.
00:07:14
Speaker
Two extremes that both translate to really bad outcomes, again, for our consumers. We're not building this stuff just because it's fun. We're building it because we're supposed to create business value: insights, data products, data algorithms, AI, all that stuff. So I think the big challenge for teams is how to find the right balance. And I will say it changes not just team by team, but also problem by problem.
00:07:39
Speaker
You have to think both defensively and offensively about how you do your data modeling. Defensively, I need to think about what the core concepts are that I want to execute on, and what might come in the future that will depend on this. Offensively, I need to get this thing done as quickly as possible, so it's okay for me to cut corners as long as I remember where I cut them and I commit to servicing that debt and coming back to it.
00:08:03
Speaker
And that's one of the things I think about a lot when I'm thinking about data modeling, about the problems I'm asking my teams to solve, and about the models I'm designing myself: basically, what's going to blow up and bite me in the ass later? How do I make sure it's not a big problem? What can I do right now to achieve results? And how do I find the right balance between immediacy and longevity in the things that I design?
00:08:25
Speaker
Really interesting. So it's a trade-off around that prioritization. One of the things you mentioned in there was speed, about delivering something now. Sometimes that's driven by the stakeholders and the business: they want something and they want it now. So Rob, what's your advice to help data professionals deal with and manage external pressure on speed versus quality of work?

Balancing Business Pressure with Data Quality

00:08:47
Speaker
So I think this gets missed, I will say, especially with more junior folks.
00:08:52
Speaker
They see themselves as a service execution person, almost like a service desk. They ask me for data, I give them data. Data teams, data organizations, and data professionals, we are not service providers. We are collaborative partners with the business unit.
00:09:08
Speaker
right? Our goal and our intention is to bring our skill set to help solve the business's problems. And it's important, when we're working with PMs or sales leads or what have you, to frame it that way. I think that's actually where you create space.
00:09:24
Speaker
I very rarely have problems, once I establish relationships with leaders in other departments, creating space to do things the right way, because they know that if I say I need to take a bit longer, it's with good intention, and if I don't need to, I'm going to deliver as rapidly as possible. And I think framing it not as "we take on debt" or "we do great architecture" or "we deliver what the stakeholder wants", but as: how do I build a trusting partnership where I can also guide what gets done on what timelines? That's really important
00:09:54
Speaker
to be able to create the right sort of space to make those trade-offs. So it's all about creating an environment and building relationships with your key stakeholders, to help educate them on the trade-offs you're making and the why behind what you're doing, so that you're aligned on what the appropriate solution is. For sure, yeah, absolutely.
00:10:14
Speaker
One of the things that we're keen to touch on, Rob, is the fact that data modeling has definitely gotten a lot more attention in recent times, thanks in large part to dbt.

dbt's Impact and Challenges in Data Modeling

00:10:24
Speaker
However, one of the biggest things that I'm seeing within the industry is that dbt has lowered the bar to entry for data modeling, which leads to more mess, so to speak. How can organizations actually avoid this? And is this something that you've seen as well?
00:10:39
Speaker
So, dbt I think is probably one of the more exciting things that I've gotten exposure to in the last few years. I remember finding it in early 2019 and saying, okay, this is fantastic. This is something I had been building similar, but not nearly as well-thought-out, approaches to for years before that.
00:11:00
Speaker
And I love the fact that it enables everyone to be a data architect. You abstract away the whole central data team concept, or at least a lot of the work they were doing, and you make it democratic and participatory and you let everyone
00:11:16
Speaker
add to the definitions of the business. The velocity and the freedom it creates are incredible. It's what we were trying to do with self-service analytics five, seven years ago, but brought into actual data engineering and analytics engineering work.
00:11:32
Speaker
That said, I think a lot of organizations deal with this: it leads to enormous entropy in their data models. The DAGs get slower and no one knows why. The models get duplicative and they get into conflict. There are so many issues that emerge that we didn't have to deal with to the same degree when it was only highly trained data engineers building things. I'm not arguing to go back in time, by the way. Again, I
00:12:00
Speaker
am emphatically a fan of everyone can be a data engineer. It's like how we used to say everyone can be a data analyst. Well, yeah, we checked that box, we gave people things like Tableau; now we can really do the same thing, and anyone who can write SQL can define the business. But I think it's really important that folks who are doing this, going back to the earlier point, do have to think about their data model a little bit. There are certain steps you need to take, right? One is,
00:12:26
Speaker
there are critical definitions in every company. If you're B2B, it might be your contracts and your business partners. If you're B2C, it might be your customer and your subscription status, just to throw things out. If you're doing transactions, it's orders. If you're in SaaS, it's subscriptions. But these are really core and critical. They have to be consistent and they have to be right.
00:12:49
Speaker
I think this is where space for a central team is really important, and space for governance is really important. You need to agree together on what these things represent, and you have to constantly bring the business back onto these common definitions. They might evolve, they should evolve, but you have to centralize them. One of the worst things that can happen, and
00:13:08
Speaker
I've seen this in executive meetings, is the sales guy and the marketing guy, the CRO and the CMO, walking into the meeting with definitional differences on what constitutes an enterprise customer. The whole meeting is blown up because they're looking at two very basic concepts that don't match and they can't actually say which one is right. That's the thing we need to govern against, probably one of the biggest things you can do just to ensure the integrity of the business.
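As a sketch of what centralising a definition like that can look like in a dbt project: the model name, columns, and thresholds below are invented for illustration, but the pattern, one governed model that every downstream dashboard reads from, is the point.

```sql
-- dim_customers.sql: the one governed place where "enterprise customer"
-- is defined, so sales and marketing dashboards inherit the same answer.
SELECT
    customer_id,
    company_name,
    annual_contract_value,
    seat_count,
    CASE
        WHEN annual_contract_value >= 100000
          OR seat_count >= 500
        THEN 'enterprise'
        ELSE 'self-serve'
    END AS customer_segment   -- the agreed definition lives here, nowhere else
FROM {{ ref('stg_customers') }}
```

When the definition evolves, it changes in this one model, and the CRO and CMO are, by construction, looking at the same number.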

The Need for Continuous Education in Data Modeling

00:13:38
Speaker
The second thing is education. Again, it's great to give everyone freedom, but you don't want to do that unless people actually know how to use the tools at their disposal.
00:13:51
Speaker
We don't give people the keys to a car and just say, hey, you're 18, now go have fun, you'll figure it out. We make them go through courses and we train them. If they prove that they're not a very good driver, we limit what they can do and we send them back to remedial training. I think this is really important. We have to take the time to teach people how to use their tools, and we have to keep reiterating and rebuilding that knowledge continuously and driving the points home.
00:14:18
Speaker
One way I like to think about it, the way I've approached it in the past, is that onboarding should be a lot of gradual training to get people comfortable. When I was training analysts to use dbt, I was spending probably about 20 hours over the course of their first month just on teaching them: here's the data, here are the existing models, here's the tooling, here's how you use things, here's how you do discovery.
00:14:41
Speaker
And then on top of that, spending two hours a week, for the entire time they work in the data stack, re-emphasizing best practice and sharing and learning. And I think that's really important. A lot of the departments I speak with underinvest in training. They're too busy
00:14:59
Speaker
delivering to learn how to be efficient or effective. And I did some math once, and it basically said that if I can spend five hours training someone who spends 15 hours a week modeling, and I can get a 5% increase in efficiency, which for people who have never been trained on a tool is not very hard, it pays itself back in less than three months. So it really pays off to invest in continuous education.
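Spelling that arithmetic out with Rob's numbers (taking the 15 hours a week of modeling and the 5% gain as given):

\[
0.05 \times 15\,\text{h/week} = 0.75\,\text{h saved per week},
\qquad
\frac{5\,\text{h invested}}{0.75\,\text{h/week}} \approx 6.7\ \text{weeks},
\]

roughly a month and a half, comfortably inside the three months he quotes.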
00:15:25
Speaker
And sending someone to a workshop once is not sufficient. I think the last point, coming to what you have to think about, is that these data models are a garden. Again, a lot of people are working with a different set of constraints: the goal is to answer a question now and then be done with it.
00:15:41
Speaker
That's like planting a plant, walking away, and just waiting to see what happens. What happens is weeds grow in that garden box or in that planter. You have to continuously refactor and remove and monitor and pay attention and fix. And if it's no longer important,
00:15:57
Speaker
then you get rid of it, you archive it, you remove it. But this is a continuous process, and I think everyone who creates also has the responsibility to maintain. I have seen it with other teams where it's like, look, analytics can build, but data engineering is responsible for maintaining everything once it goes to production. That's not a sustainable exercise. Everyone has to feel accountable for the things they make. I apologize, I rambled a lot there.
00:16:24
Speaker
No, no, no. I mean, you touched on a few areas. I think the area around definitions is obviously really interesting. There's the semantic layer, which dbt and Google are really fighting over, trying to create a single source of truth for those definitions, which I think is so important, as you said, so that the business doesn't lose trust in your data team, because then hopefully everyone has agreed on what the definition for a given metric is. And then
00:16:54
Speaker
I loved what you had to say about the continuous learning. It's so important. I don't think people should be given, as you said, one course and that's all the training they get. And being able to quantify it: I don't think I've heard many people quantify training and translate it into the efficiency of people's work and what that's going to cost in computing time and warehousing.

The Role of People in Successful Data Teams

00:17:16
Speaker
I would actually connect to that, because you mentioned a few areas people are investing in: software does not make a data team successful or make a data team fail. Technology will not fix any of the issues organizations have. People are the fundamental value driver in a data program. You can have an incredibly valuable, impactful, productive data team with no CI, using open source tools, where everyone is just hacking away,
00:17:45
Speaker
or you can throw millions and millions of dollars at software and end up with 5,000 dashboards that are all in conflict and aren't updated half the time. And that's why I think education, and conceptualizing all these things as an outcome of how you make your people successful, is really what translates into a data program that creates amazing results for an organization. So
00:18:09
Speaker
just as a counterpoint: yes, the space is always evolving technically, but I don't think the fundamental drivers of what makes teams succeed have really changed, at least since I started my career. Yeah, so it's always about understanding the fundamentals. That's the thing that I think is key. The technologies can come and go, right? Yeah.
00:18:29
Speaker
So Rob, what do you know about data modeling now that you wish you knew sooner?

Empathy and Adaptability in Data Modeling

00:18:35
Speaker
Because I think that's something that many people, at whatever stage in their careers, would be really interested to hear and learn. So, I have worked in a number of different organizations, on a number of different data problems and platforms and domains, and
00:18:52
Speaker
my fundamental tendency has been to argue for a position. Look, my view is you should be dogmatic about things, you should have opinions.
00:19:02
Speaker
But you should also be constantly challenging and adapting those. I did not used to be as self-challenging as I am today. I used to think, well, you need to do it in this methodology, and it needs to be right in this way, going back to this whole rigidity versus flexibility. And I think that was a miss. It made me hard to work with at times. It made me challenging to collaborate with, because I was arguing from a position of
00:19:30
Speaker
philosophy rather than pragmatism. And I think the thing I wish I had been more adaptable about earlier on was: okay, build a better understanding of not just the problem I'm trying to solve, but
00:19:42
Speaker
the people who have tried to solve it before, and why they tried to solve it the way they did. I always talk about having empathetic business partners rather than stakeholders. I think that's something I did not do a great job of early in my career, and I still occasionally struggle with it, but that's probably the most important thing about data modeling: you have to bring a high degree of empathy into it if you're going to be successful at it.
00:20:09
Speaker
Interesting. So again, it's that relationship with the business, which can be the driver for success.
00:20:15
Speaker
OK, Rob, so look, we've already alluded to the fact that the modern data stack is growing. There's lots of tooling in this space. I know that the growing stack, and managing that stack, both from a technical standpoint with the actual technology and with the data flowing through it, can be a challenge.

Observability in Growing Data Ecosystems

00:20:34
Speaker
Is it a challenge that you see? And what are your thoughts on observability and how you can create observability within your data stack and your environment?
00:20:43
Speaker
Yeah. Again, going back to: if you democratize, you get a lot more stuff. If I look at early in my career, almost 20 years ago, there was a central data warehouse and you just worked on these core 200 or 300 tables that were maintained by a central team, updated once a day, and you knew they were always right. If they weren't right, your dashboard being wrong was probably the least of the company's concerns, because everything was wrong and the financial reporting was incorrect.
00:21:12
Speaker
So today, if I look at Miro, there are probably 10 new data products created on a daily basis. I'm rounding up a bit, but that's the pace when tons of analysts are all creating stuff all the time. You can't keep track of it in that same centralized model. It just doesn't work. You have to find ways to create abstractions for observability,
00:21:33
Speaker
discovery, and governance, so that you can ensure that the things people are looking at are reliable and correct. Because the thing that I have seen hurt data teams the most is not how fast they produce stuff or whether they are delayed. It's not about
00:21:50
Speaker
how pretty the dashboards are, or how fancy the algorithms are. That's the shiny stuff. It's: can I trust what I'm getting from the data team? And in situations where people feel they can't trust the outcomes, trust the results, you might as well not have a data team at all. So that's why thinking about how to create observability is so fundamentally important in a space of modern data products and democratization of data products.
00:22:15
Speaker
So we started with this problem at Miro. So observability was a very real problem for us like six to nine

Ensuring Trustworthy and Reliable Data Products

00:22:22
Speaker
months ago. We started looking at how we create trustworthiness and observability in data. And I think the first thing we did is we said, okay, trust comes from two things.
00:22:33
Speaker
There's the technical validity of datasets: do you have duplicative primary keys, is the data updated, really basic stuff. For a lot of data professionals and data engineers, this is where they start and this is where they finish. Guess what, my dataset is pulling across and I've got data for today, so I'm happy and it's good.
00:22:51
Speaker
But the other side of it is really, you know, how do you create trust that it represents what it's supposed to, what's the truth, right? I remember many times where I looked at a dashboard and said, okay, this doesn't look right. I knew something was off; I had an immediate gut reaction. And me as the data person doing that, you know, that sends me down a path of research. But if a stakeholder says that, holy shit, that's a bad place to be in, because they're normally right.
00:23:18
Speaker
They often have so much domain knowledge that they can smell invalid data incredibly fast. So that's why we're talking about domain testing: how do we capture business understanding in data observability? What is the expectation? In data science, we think about model drift. We should be thinking about data product drift as well, and that becomes a critical part of being able to understand not just was it right when I built it, but does it continue to remain right, or is something fundamental or underlying changing?
00:23:48
Speaker
So that's how we think about creating observability: every data product has to be technically right, you know, it has to be updated at the time people expect it, contract guarantees, volume guarantees, completeness, etc. But it also has to be domain correct, meaning if there's domain observability that's needed, we need to have that test suite in place.
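A sketch of the two kinds of check in Postgres-flavoured SQL; the table and column names are invented, and a framework like Soda would express the same ideas declaratively rather than as hand-written queries:

```sql
-- Technical validity: returns rows (i.e. fails) if the primary key is duplicated.
SELECT order_id, COUNT(*) AS copies
FROM fct_orders
GROUP BY order_id
HAVING COUNT(*) > 1;

-- Domain correctness: encode the stakeholder's "this doesn't look right" instinct.
-- Fails if yesterday's order count moved more than 30% against the average of
-- the previous four same-weekdays.
WITH daily AS (
    SELECT ordered_at::date AS d, COUNT(*) AS orders
    FROM fct_orders
    GROUP BY 1
)
SELECT t.d, t.orders, AVG(h.orders) AS expected
FROM daily t
JOIN daily h
  ON h.d IN (t.d - 7, t.d - 14, t.d - 21, t.d - 28)   -- same weekday, prior weeks
WHERE t.d = CURRENT_DATE - 1
GROUP BY t.d, t.orders
HAVING t.orders NOT BETWEEN 0.7 * AVG(h.orders) AND 1.3 * AVG(h.orders);
```

The first check is the "was it right when I built it" kind; the second is the drift kind, watching whether the data continues to behave as the business expects.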
00:24:11
Speaker
Why would a data team need or want good observability? What are you trying to unlock on top of it? I think it goes back to this whole idea of the garden. We have all these plants out there. You need to make sure all the plants are growing healthily, the boxes aren't full of weeds, they aren't getting scorched by the sun, the neighbor's dog didn't come by
00:24:35
Speaker
and pee on the plant and kill it. There are all these things that can happen, and I think it's the same in our data sets. Entropy is always growing in a system. Even if you don't do anything with it, an upstream API might change, or a field might move in the schema, or
00:24:55
Speaker
the marketing mix might shift and the sales team might not be aware of it. There are all these things that can change, and every object in your database has cost: cost in the sense that I'm maintaining it, cost in the sense that someone might query it and ask questions about it.
00:25:11
Speaker
The idea of observability is: how do you abstract away keeping track of that whole garden? Instead of you looking at 1,000 or 10,000 or 50,000 tables every day, or reading hundreds of DAGs or hundreds of data updates, how can you make it so that if something is changing or unexpected, you're alerted first and you can act and react on it? I think that's the purpose of good observability.
00:25:40
Speaker
Do you think it also helps engineers identify where breakages happen and save on engineering time, so that instead of trawling back and trying to find where there may be a problem, you're able to pinpoint it straight away?

Benefits of Observability in Data Issues

00:25:56
Speaker
For sure. I think an example here would be we had this data issue a few months ago, and this was before we set up the full end-to-end observability on the pipeline.
00:26:05
Speaker
And basically, to explain what it was doing, it was taking in events, and then it was enriching them, and then it was doing mapping on them, and then it was doing a whole bunch of business logic to produce a product. And the results started drifting on the product.
00:26:17
Speaker
And people were convinced it was a raw data issue on the events, that tracking was failing, that it was this or that, and they spent weeks digging into various different points. Then we got the data engineering team involved and basically stepped backwards through the pipeline, and we were able to say: look, it's actually this sideways-related data product, brought in at this one step, that was causing the data to drift, because a field structure had changed in the inputs but not in the mapping or the lookup table.
00:26:47
Speaker
Very small thing. You wouldn't even look for it in the impacted data product. But that was the root cause. And if we had had the observability we have today in place, we would have caught it the day the drift occurred, rather than spending weeks researching the problem. So there's an instant, tangible saving on resource and engineering time within the team as well. That's another reason why it's so important.
00:27:13
Speaker
For sure. It lowers stress. And one of the things I've found is that it's not a problem to tell people there is a problem and that you're working on it. It is a problem for people to find the issue and then come to you. And I think that's the benefit of this: the break would still have occurred, but it's much better that I tell the marketing folks, hey, this report is currently impacted and we're fixing it,
00:27:36
Speaker
just so you know, and here are the details of what we know about the impact so far, versus them coming to us and saying, what the hell is going on with this data set? That makes perfect sense and maintains that trust in you guys as a data team. Exactly.
00:27:53
Speaker
OK, so it clearly sounds like you've implemented a successful sort of observability project. The podcast is all about helping people get that hindsight perspective and understand what some of their biggest challenges were within projects like this.

Shared Responsibility in Data Quality

00:28:10
Speaker
So could you tell the audience about some of the biggest challenges you came across in this observability space? Well, I think every organization is going to have different challenges.
00:28:22
Speaker
I can say for us, one of the bigger challenges we've had is the assumption that data quality is about the data you ingest, and that it's a data engineering problem to guarantee quality. Again, if you think about democratization of data products or democratization of modeling, that means the accountability for data quality also has to democratize.
00:28:42
Speaker
It can't be that just engineers are accountable, because you can have amazing data at the beginning of the funnel. It can be perfect. It can be on time. It can be guaranteed seven ways to Sunday. And then some person will use it in a model seven steps down, and they'll have an illogical WHERE clause, and all of a sudden records will start disappearing or the calculation won't make sense.
00:29:04
Speaker
I literally had this the other day with someone I was helping solve something. They were trying to calculate a percentage score and they were including NAs in the denominator, and then they were saying the scores are broken and the data is wrong. So anyway, the point is: with democratization, everyone has to take ownership. If you create a data product,
00:29:23
Speaker
you have to feel accountable for putting quality controls in place. If you use a data product and you have expectations of it, your system or your environment should allow people to set their own expectations and then know when those expectations are changing. And I think that's one of the biggest opportunities as organizations go down this path: how do you take not just the engineers, but everyone, along with you as you build this stuff?
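A hypothetical reconstruction of the NA-in-the-denominator bug Rob describes (the table and values are invented), since it is such a common class of error:

```sql
-- Broken: 'NA' non-answers are counted in the denominator,
-- silently dragging the percentage down.
SELECT 100.0 * SUM(CASE WHEN response = 'yes' THEN 1 ELSE 0 END) / COUNT(*) AS pct_yes
FROM survey_responses;

-- Fixed: exclude non-answers before computing the rate.
SELECT 100.0 * SUM(CASE WHEN response = 'yes' THEN 1 ELSE 0 END) / COUNT(*) AS pct_yes
FROM survey_responses
WHERE response <> 'NA';   -- or IS NOT NULL, depending on how non-answers are encoded
```

No ingestion-side check would ever catch this; only the person writing the query can own that the denominator means what they think it means.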
00:29:47
Speaker
Very interesting. When you were building this stuff, did you guys at Miro build this tooling and this observability yourselves?

Miro's Approach to Observability Tools

00:29:56
Speaker
Or did you bring something in? I know I've seen there are lots of observability tools, like Sifflet and Metaplane, coming onto the scene. And I suppose on that, where do you sit on the build versus buy argument?
00:30:07
Speaker
So we're much closer to build on build versus buy. We used Soda as our framework. Shout out to the Soda guys; it's an amazing tool. I think when you think about solving these problems, you're always asking: what are my prioritization goals? Is it speed, cost, integration into the tech stack, et cetera?
00:30:26
Speaker
What resources and budget do I have to work with? And what constraints do I have in the system? For us: we had engineers who wanted to work on it, we were not looking to bring on another SaaS partner and make a significant investment for our initial focus, and we wanted to integrate this into our data product definition in our data pipelines. So given all that, we really thought that building on top of the framework was the best balance for our team. But I really want to say,
00:30:54
Speaker
there is no such thing as one size fits all, and most software vendors won't try to tell you that their thing will always work for you. But if you hear that, it's probably not true. You really have to critically evaluate what you are trying to achieve and what the best path is, given the organization and the team you're working with.
00:31:12
Speaker
Market research, market research, market research. Exactly. Give your most passionate person space to try a few things. Tell them what you need to achieve and what the broader resources would be if you went into the space. Ask them to evaluate and come back with suggestions, strengths, and weaknesses. Perfect. I think that's great advice.
00:31:35
Speaker
If someone was about to start this project, what are your best steps you can take in order to ensure you have good observability over a data platform?

Steps for a Successful Data Observability Project

00:31:46
Speaker
So this was the first time I had really gone in depth; obviously I'd done a lot of testing in the past, but trying to do a whole overarching, one-observability-platform-to-rule-them-all type of thing, that was the first time I'd really done all this. If I boil it all down and look at what we got right and what we could have done better,
00:32:08
Speaker
and what I would do if I were to do it from scratch today, I think the first thing is really to start from your current problem space. You're not putting observability in just for the fun of it. You're starting because probably someone has yelled at your boss, or yelled at you, because you've had a data problem.
00:32:25
Speaker
So start with your current problem space and figure out what the biggest pain points have been, where you've been surprised. The thing I always look for is: what happened that I didn't see coming, and what are the biggest vulnerabilities, where if this thing goes wrong, everything else falls apart? That gives you your priority list to think about implementing.
00:32:46
Speaker
The second thing is, again, I always talk about training, and I'm going to come back to it again. You have to train everyone who adds to data products on what good data quality management looks like. A dbt null test on one column is not a sufficient data quality program. You need to define what good is, you need to teach people what good is and why it should be that way, and just reiterate again and again and bring people with you. And then,
00:33:12
Speaker
you've got your priorities, you've got your skilled team, and you can start covering the easy tests. The one I always start with: every table should have a primary key. This is really small stuff, but it always makes life a lot easier. If you're expecting one record per user per day,
00:33:28
Speaker
you can use that to define a surrogate key. This is great, but you should know what this data set should look like. You should know how often it needs to be fresh. Once you get those basic technical tests in place, you'll already start catching problems and being more proactive rather than reactive. Another thing, and again this goes back to stakeholders wanting it now: your definition of what done is, of what's good enough to deliver to a stakeholder, has to change, and you have to change it through the whole organization.
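A sketch of those starter checks as plain SQL, with invented table names; dbt's built-in unique and not_null tests, or an equivalent Soda check, would express the first one declaratively:

```sql
-- 1. "One record per user per day": the surrogate key (user_id, activity_date)
--    must be unique. Any returned row means the test fails.
SELECT user_id, activity_date, COUNT(*) AS copies
FROM fct_daily_user_activity
GROUP BY user_id, activity_date
HAVING COUNT(*) > 1;

-- 2. Freshness: fail if the newest load is older than the agreed 24-hour SLA.
SELECT MAX(loaded_at) AS last_load
FROM fct_daily_user_activity
HAVING MAX(loaded_at) < NOW() - INTERVAL '24 hours';
```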
00:33:57
Speaker
Adding testing does not add a lot of time, but it has to be part of "this is now ready to be consumed". It's a place where I feel a lot of organizations pay lip service but aren't prepared to commit fully, the way software developers think about the need for test coverage.
00:34:15
Speaker
The fifth point I'd make, and this is something we did which I'm really happy about, is that data quality is part of your definition of organizational success. It's a metric. It's a measurement. It's something you show to people. We started putting in front of folks: here's our data quality downtime, here's our reliability, here are our data quality incidents.
00:34:35
Speaker
And the team not only appreciated the visibility; it also highlighted that this is something we care about, right? You should measure this just the way you do revenue, maybe not to the same degree, but you get my point: it's got to be a critical metric for your data team.
00:34:50
Speaker
And I guess the last one: like all things, you need to treat it as a journey. You're not going to get there in a month or a quarter. You need to know where you want to be, keep evolving your vision of where you want to get to, keep pushing yourselves along that path, and measure your success against:
00:35:07
Speaker
am I moving closer to what my ideal state would be? It's easy to get lost and not see progress, but that's how you can create visibility for people and that sense of achievement. Testing is not the most exciting thing to do, but it's incredibly impactful, and you have to celebrate that people are making a meaningful impact when they create tests.
00:35:30
Speaker
Amazing. I think that's a really well-thought-out, step-by-step guide, which can at least set the framework for how people should approach something like this, which is clearly so important to the success of a modern data team. So I really appreciate you telling the audience that, Rob. That brings us to the end of, I suppose, the topics we were going to talk about. Thanks for listening to my long rambling responses to your quite on-point questions.
00:35:57
Speaker
No worries. I think the audience is really going to enjoy listening to exactly what we've talked about. I think these topics are right at the cutting edge of what data teams are struggling with right now. So the final bit, Rob, is some quickfire questions. They're questions that we ask all the guests, and hopefully ones that the community can take value from to help them further in their

Criteria for Evaluating Job Opportunities

00:36:18
Speaker
careers. So the first one is, how do you assess a job opportunity and how do you know if it's the right fit for yourself?
00:36:24
Speaker
So I'm kind of old and I'm kind of grumpy and I've kind of been around the block for a while. So at this point, I'm really appreciating that work is a part of my life and it's almost part of my identity. And given that, what I do is more important than
00:36:42
Speaker
what my job title is. I've met people who jump from a team they're really happy in to a job they're less excited about, for a 5% or 7% increase; that's not going to bring me happiness in a job. So I really critically evaluate what will make me happy in work: is this opportunity something I'm going to get something meaningful from, and am I going to appreciate my time in this role? I basically ask myself three questions.
00:37:10
Speaker
If I do this thing, am I going to get better from it? Am I going to grow? Am I going to apply skills that I want to practice? Am I going to be challenged in ways I want to be challenged? The second is, and again, interviews are limited, which is why it's so important that when you're interviewing, you're asking a lot of questions back to the interviewer and understanding the culture and the people: are
00:37:32
Speaker
these folks I'm going to want to work with or spend time with? Honestly, are they going to make my life richer because I'm going to be around them and get to spend time with them? Some of my best friends in life are people I've had the opportunity and pleasure to work with and learn from and grow with. And some of the best growth I've gotten has been from people on my team. And then the last one is: can I make a meaningful impact on other people? Now,
00:37:58
Speaker
none of us, I don't want to say none of us, but very few of us are actually working on things that make the world a better place. We're not all working at non-profits. We're not working for the Red Cross or anything like that. We're working for corporations, and we're helping corporations and shareholders make a lot more money. So I'm not naive enough to think I'm making the world a better place in that way.
00:38:19
Speaker
What I look for is: can I look back and say I helped these people, I made this impact on this group of individuals I worked with, or, on this company, I changed the course for the better, and can I look back proudly on that impact? And if I can answer those three questions positively, then that sounds like a great job. And if I can't, then you can't pay me to take that job.
00:38:46
Speaker
That's so important. You spend so much of your life working. It's obviously a big part, as you said, of what we do. And I think it's so important to enjoy the people you work with, and the amount of impact that you can have; that's the real key differentiator, isn't it? That's something we always look to help people understand in an interview process. And you're so right: ask questions, many questions, so many questions that you can walk away knowing exactly what you're stepping into.
00:39:13
Speaker
That's great, Rob. The second question: what's your best piece of advice for an interview, other than asking a lot of questions? All right. Let me see, at this point I've probably interviewed well over a thousand people, I would guess, at least. So here are the things that stand out to me, the things that I think help people succeed.

How to Shine in Interviews

00:39:33
Speaker
The first one is, keep in mind, this is your opportunity to shine. Okay. And it's about you shining. So.
00:39:41
Speaker
Know how your work mattered and what impact it had, and talk about it. And this, by the way, starts before you even interview; it starts on the CV. Don't put down on your CV that you've participated in agile rituals. Tell me what impact you had and show me how incredible you are, because the thing is, for most people I interview, I know they are.
00:40:01
Speaker
That's why I'm talking to you; but they don't always bring that out in the conversation effectively. Know the STAR method of talking about what you've done. But building on that point, you really want to be the best version of yourself. You need to be yourself.
00:40:18
Speaker
I'm not hiring an empty CV and five years of SQL experience. I'm hiring a person. So I'm looking to see who you are. And again, cultural fit is often a hand-wavy way to basically hire people just like you; I don't take that approach. But
00:40:38
Speaker
if I'm looking for people who will make my life richer, I'm looking for people who make the team's life richer. So I want folks who are going to bring capability, and personality, into the conversation. And the last point, and I want to come back and reiterate about asking questions: an interview is not
00:40:58
Speaker
me buying your labor, right? It's not the interviewer selecting you. An interview is us trying to discover if we can work well together and achieve great things together. So it's not just "ask questions"; approach it as a two-way conversation. You need to find space to ask questions back. And a lot of interviewers are not skilled at this, so they hammer out questions off a form for 29 out of 30 minutes, and then they say, hey, do you have a question?
00:41:28
Speaker
You need to ask your questions as you go along, and you need to dig in. If there are things the interviewer mentions that you find interesting and would like to understand, explore them a bit and ask clarifying follow-ups. Use it as an opportunity for dialogue, and I think you'll be better able to tell whether the role is interesting. And quite honestly, you'll make a better impression in the interview as well.
00:41:51
Speaker
I think that's great advice for both interviewers and interviewees. Don't get stuck in that old-fashioned, rigid format of question, answer, question, answer, ending with "your turn to ask a few questions". The flow of conversation shows that you're going to get on well, and you can really explore each other's passions, what value you're going to be able to add, and whether that value is needed. So, great advice there.
00:42:16
Speaker
Final question, Rob. If you could recommend one resource to the audience to help them upskill, what would that be?

The Role of Soft Skills in Data Careers

00:42:24
Speaker
So the biggest limiting thing that I see when people try to make the move from junior and medior towards more senior and staff roles
00:42:33
Speaker
is that they really focus on technical skills and think that's the differentiator. How brilliant am I at working with Kafka? Can I write an ML algorithm without using a library? Have I used the newest, shiniest tool? They define that as the basis and think that's what's going to make an impact. And,
00:42:53
Speaker
sorry, it doesn't. It's not that important once you start moving up the ladder. Not that it's completely unimportant, but what matters is social intelligence. What matters is, number one,
00:43:10
Speaker
how you can apply that technical knowledge to solving problems: discovering the problems, understanding how you can help solve them. And then, probably at least as important, if not more: taking people along with what you want to achieve and why, and how you're going to do it and why, right? This applies for managers, obviously, but it applies for senior individual contributors as well. So if you want to grow, if you want to upskill,
00:43:40
Speaker
take your nose out of LeetCode problems. I do them too, they're fun. But if you really want to create career success at that staff level, or create the opportunity to be a leader, a manager, focus on how to better understand the soft skills that come into play:
00:43:59
Speaker
planning and prioritization, strategic thinking, and quite honestly, just how you build meaningful relationships with the people you work with and build harmony in the workplace. Again, it is a social environment. There are great books out there to read if you're working in a multinational or multicultural team or organization; The Culture Map is great. I think How to Win Friends and Influence People is a classic that everyone should read at least once, if not 10 times.
00:44:27
Speaker
There are great books on strategy and team relationships: Turn the Ship Around!, Good Strategy Bad Strategy, Team of Teams. There's a lot of great reading material, and even if you are an engineer, you should start reading this as early as you can and find opportunities to apply these skills in your day-to-day.
00:44:46
Speaker
I completely agree. One of our previous guests, Leon Tang, spoke about the book Never Split the Difference, which again is about relationship building and negotiation, and he said it's about bringing those softer skills in, which you have just reiterated. I think to ascend the ladder, those softer skills are so important, and the earlier you start practicing them, the better you will get, because just by reading a book you will not learn; it's the application, right? Yeah, exactly. And I think if you look at it,
00:45:16
Speaker
quite openly, we all have to manage our managers to a degree, some of us very little, some of us a lot, but that's always part of it, right? Think about your relationships: you're managing up, sideways, and, if you're a manager or a staff engineer, down,
00:45:31
Speaker
for lack of a better term, to lower levels as well. You should start practicing these skills, because it will also make your work experience much better, because you can get your manager in a better place for what you need. Yeah, I couldn't agree more. Look, Rob, it's been an absolute pleasure. Your insight, I'm sure, is going to go down an absolute treat. I know I've thoroughly enjoyed it, so I really appreciate your time, and yeah, hopefully we can speak again soon.
00:45:57
Speaker
I'm absolutely thrilled to have had this time to chat with you, Harry, and I really appreciate you inviting me onto your podcast. No worries, Rob. Well, that's it for today. Thank you everyone for tuning in and we'll see you again soon.
00:46:11
Speaker
Well, that's it for this week. Thank you so, so much for tuning in. I really hope you've learned something; I know I have. The Stacked Podcast aims to share real journeys and lessons that empower you and the entire community. Together, we aim to unlock new perspectives and overcome challenges in the ever-evolving landscape of modern data.
00:46:32
Speaker
Today's episode was brought to you by Cognify, the recruitment partner for modern data teams. If you've enjoyed today's episode, hit that follow button to stay updated with our latest releases. More importantly, if you believe this episode could benefit someone you know, please share it with them. We're always on the lookout for new guests who have inspiring stories and valuable lessons to share with our community.
00:46:54
Speaker
If you or someone you know fits that bill, please don't hesitate to reach out. I've been Harry Gollop from Cognify, your host and guide on this data-driven journey. Until next time, over and out.