Introduction to Yulia and Jason Tularoo
00:00:00
Speaker
Hi all, it's Yulia from Straight Data Talk. Today I'm lucky to have Jason Tularoo joining me from Detroit, right? Yeah. Okay. Jason, thank you so much for stopping by. Please jump in, introduce yourself, and then we'll get into why we're here today and what we're going to discuss.
Jason's Background in Data Engineering
00:00:25
Speaker
Yeah, thank you for having me. It's a real pleasure. We've kind of circled around this for a bit, so I'm glad we finally found the time. So my name is Jason Tularoo. I'm currently a manager of data engineering at Corewell Health, which is one of the largest hospital systems in Michigan.
00:00:47
Speaker
And that's a pretty cool gig, but also new to me. My traditional background is in marketing analytics, where I worked at MRM in the past. I've also done a stint at Zwift, which is a very cool cycling video game. So I've seen a lot of different experiences across my time, and I look forward to sharing that with your audience. It's fun, because it also means you have so much experience to share with us.
Challenges in Data Governance and Migration
00:01:18
Speaker
And one of the topics that we touched on previously, and had a very lively discussion about, is data governance. Today you also mentioned that you are going through a sort of migration and rebuilding lots of stuff, and you see that as a great opportunity to leverage governance.
00:01:44
Speaker
And it also feels like there's way too much going on at your organization right now. So yeah, tell me, please, how do you navigate that, and how do you even sell it to your leadership? You know, so many projects at once.
00:01:59
Speaker
Yeah, so I think it's important to note that I was blessed early in my career to go into an organization that had really, really strong governance. Really strong table naming, really strong everything, already built into their system. And that drives how I think about this today. But in the modern tech stack, you're never inheriting that perfect unicorn like I did in my very first role at MRM. You're coming into an organization that has legacy on-prem, or maybe they've built things fast because they're a startup, they've had a lot of different architects, or cooks in the kitchen, and made things work. But then maybe they've gone through new funding and
00:02:44
Speaker
they need to meet new goals based on investors, and they're looking for that data edge to get
Communicating Governance Importance to Leadership
00:02:51
Speaker
it. And I think this is really where governance comes into play, because your spaghetti code within an organization only works for so long, right? So the first thing, and the hardest, is really explaining to your leadership how broken your process currently is. And then also working with stakeholders to understand: do you trust this data?
00:03:26
Speaker
I'm pretty honest, not just with my stakeholders but with people on my team. If I don't trust the process, I will say as much. And so if I want the organization to be a data-driven organization, and I want them to reach their goals, we should feel comfortable with that.
00:03:45
Speaker
As the data engineers in this case, we should be building an architecture that will let us sleep easy at night and, ideally, not wake us up with lots of PagerDuty notifications.
00:03:57
Speaker
Well, that's the ideal world. I actually had a talk today with one of our customers. They have a pretty
Simple Governance Solutions without Heavy Investment
00:04:07
Speaker
decent stack. Everything is the modern data stack, of course. And we were talking about governing, not even full-scale governance, but covering their data products.
00:04:18
Speaker
There is a capability in Google Cloud's Analytics Hub, and I was suggesting they implement it; it is free of charge. And I kind of explained to them why they would need it, because this was also the request I was getting from the team: hey, do you know how we can cover things easily?
00:04:38
Speaker
And I saw that as a nice step without having to implement any separate solution, without much fuss, as I saw it. And in the end, the decision-maker from the technical team basically goes: do you realize that we have this pool of tasks, like every department, right?
Understaffing and Governance ROI Challenges
00:05:02
Speaker
This sounds so appealing to us, but I'm not sure we have the capacity to do that.
00:05:09
Speaker
And me being like, I appreciate it, and it means a lot to me that they've been honest about their priorities, and so on and so forth. But also, what I see is that every data team is understaffed. So how do you pull through these kinds of projects? Because governance has no clear return on investment for years.
00:05:37
Speaker
Yeah, you're spot on. I think it's all about tying it back to that trust, right? So say you're dealing with CFO metrics, with your CFO or your finance division, and there's information going to investors and boards and things like that,
Trust in Data for Decision-Making
00:05:54
Speaker
and you're the data team, you're sitting through those meetings, and this stuff doesn't resonate with you, or you don't know where it came from.
00:06:06
Speaker
That is where all the action is, right? That's where it becomes: you have to go back to your leadership, your stakeholders, and ask them, hey, are you comfortable with these manual processes? Are you comfortable with the data that's going out? And I realize that we have these tasks right now: to do this new dashboard, to drive this innovation, to build this new model. But if we do it with the data that we have today,
00:06:37
Speaker
here's my concern. Let's say we do it today and we call it a governed data set, and we try to go through all of the steps that are necessary through a governance framework.
00:06:55
Speaker
But if we aren't really intentional about going all the way back to the source and driving through the process every single step of the way, making sure it's governed all the way through, I can give you your metric. If you'd rather have it the other way, yes, it's going to take away from the tasks that we can do; maybe we'll have to pull away capacity. But isn't that worth investing in? And then, to couple it with that, what if I could also say that the next time you ask for the same request, I can get you the data even faster than last time?
00:07:37
Speaker
Well, all I'm trying to say is that it also requires the leadership to be able to comprehend this level of, how do you say, justification.
00:07:53
Speaker
Because it's not just, oh, I can give you this data next time in half an hour instead of two days. You know, for the investment you're asking for, the project can cost you two years, fifteen people, whatever, right? On its own, that's not really a good justification.
00:08:13
Speaker
I'm not saying it might not be enough, but the person on the other end, from leadership, also needs some level of data literacy to understand the importance of this project. So how do you align the business goals and the governance? How do you tie them together? I guess this is key. Yeah, I think it really comes back to asking them what they're getting today and whether it fits. In many ways, it
Convincing Leadership and Team on Governance Necessity
00:08:51
Speaker
is a coin flip, or jumping off a cliff. Essentially, if I build a process, and I will, and I document it out and tell you all the reasons that you have to do this, it's not just the leadership team that's the problem, or
00:09:06
Speaker
not the problem, but the opportunity for me in this case. I also have to sell this within my own team, because then I'm going back to my engineers or my stakeholders and saying we're doing things wrong, and that's a risk, right? So I don't think it's just one cliff or one conversation. It's a lot of detective work. You really have to understand where the gaps are, and it takes time. You have to take a whole look and say, if I'm looking at this pipeline,
00:09:45
Speaker
where are the gaps that I don't understand? How does this thing change here? Not even looking at it from the perspective of what alerts we're getting or what the failures are, because everything could look perfect with the pipeline, perfect from the current setup. But yeah.
00:10:05
Speaker
What's missing, though? All of a sudden you're just missing, I don't know, a key, like a primary key. But the pipeline functions perfectly. Yeah.
00:10:17
Speaker
Yeah, or a stakeholder emails you overnight and you get this fire drill, and you're like, oh, I didn't even know this existed. Or maybe you sent an extract externally once before and you never stored the logic for how you did it, because it was just done off the side of someone's desk. So it really becomes this whole chain of uncomfortable conversations about how data moves within our organization. What do we trust? Where are we today? Where does our infrastructure stand? What's our baseline? And then how do we move forward? How do you tell this story incrementally? My approach is typically just that:
00:11:10
Speaker
where are we today? What's happening? Audit everything. Understand root cause. So looking at it like a consultant inside your own organization, but then coming out with a plan that says, okay, here's what I understand, here's where we can go. And in many ways it really comes down to: where are the silos?
Complexities in Data Traceability and Lineage Skepticism
00:11:34
Speaker
Where are things breaking? Do I trust that I'm only bringing this data in one time?
00:11:38
Speaker
If not, how can I really say that I'm governed in that way, when I don't have a single source of truth or a single domain that I can tie this back to? And what happens if I have KPIs that are using two disparate data sources that also overlap? How do I have that traceability? How do I think about observability or transparency of the data? How do my self-service stakeholders at the end feel about that?
00:12:07
Speaker
So here's a question, and this is a genuine question for me. I'm here not only as Yulia the host, but also as Yulia from Masthead, simultaneously. But listen, lots of the questions you mentioned can be solved with data lineage. But it doesn't mean that the data is good.
00:12:32
Speaker
It doesn't, and this is the best part about data lineage, right? How can you be sure that the lineage you're looking at is right? Did you set standards for metadata? Is there stuff that's missing? Are there permission gaps, so that maybe something could not be tracked through it? Is it all there? I've been in environments where you can pull from backend databases, you can pull from on-prem databases, where our cloud infrastructure and data lake covers that stuff, but other people have access to it. And how do I know that they're not just going to the engineering team that owns that microservice and saying, I need this data, run this query real quick for me? So it all comes down to this: your lineage is only as good as your metadata.
00:13:31
Speaker
And that metadata is your naming convention, your entire structure for how you think about this. So my approach is typically: okay, let's look at every table we have. And then, who owns that data?
00:13:48
Speaker
But listen, again, Yulia from Masthead kicks in. I'm trying to hold her back, but how is that even possible when one of our clients has 50,000 tables and another has 40,000 tables? How do you go through every table? It doesn't make any sense.
00:14:08
Speaker
Do they only use that setup? Is there not a single query you can run that will give you every single table, and then you go through the manual sleuthing of, okay, what's the table name and who owns this? With 50,000 tables it will take them forever. Yep. I've never done it with 50,000, but I've done it with 10,000 before.
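(For readers who want to try the audit Jason describes, here is a minimal sketch of that "one query" table inventory, assuming a BigQuery warehouse and the google-cloud-bigquery client, since GCP comes up elsewhere in the conversation. The "owner" label is a hypothetical naming convention, not a built-in; for estates in the tens of thousands of tables, a single INFORMATION_SCHEMA query would be the faster route.)

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses the default project and credentials

inventory = []
for dataset in client.list_datasets():
    for item in client.list_tables(dataset.dataset_id):
        table = client.get_table(item.reference)  # fetch full table metadata
        inventory.append({
            "dataset": dataset.dataset_id,
            "table": table.table_id,
            "rows": table.num_rows,
            "last_modified": table.modified,
            "owner": table.labels.get("owner", "UNKNOWN"),  # hypothetical label convention
        })

# Tables with no recorded owner are where the manual sleuthing starts.
unowned = [t for t in inventory if t["owner"] == "UNKNOWN"]
print(f"{len(inventory)} tables total, {len(unowned)} with no owner recorded")
```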
00:14:37
Speaker
And then this is where you get into... well, I think an important thing to note here is that I'm assuming we're talking about a medallion architecture or some kind of lakehouse format. So then you're breaking it out into, okay, gold tables here, and then looking at it introspectively: okay, this is bronze. Does every bronze table
00:15:01
Speaker
get used downstream? Are they all built the same way? So then this is where I get into, okay, governance by each layer. How do we uniformly bring all data into bronze? Can we control that? Do we have full observability on that? Do we use the same orchestration tool, or have similar logging, for all of these things? No? Okay, start there. Then, is bronze all the same?
00:15:31
Speaker
If not, what are the gaps in the process? What are the biggest things? Is there a team that already uses the process you want, or do you have to start completely from scratch? I've been on teams where one centralized team owns everything, and so it's pretty easy to say, hey, we're going to stop doing it this way, we're going to go in this direction, and you give up a few sprints and you go that way. But what happens when
00:16:02
Speaker
you don't own everything? You have a centralized platform team, and then there's business-level data engineering work happening too. It becomes a lot more difficult. That sleuthing and understanding of the processes is more painful.
00:16:25
Speaker
This is so interesting, and I can actually agree on this pain point, because of a case we came across. You know, data mesh: everyone wants to have it for different domains.
Data Mesh Implementation and Governance Impact
00:16:40
Speaker
And it's really interesting, right? It actually makes so much sense, but the realization is so difficult, especially given today's education in the data space for teams beyond data. Even software engineers are not that into data; it's a different world. So the case is, they had been creating separate data products, fully automatically, through their data platform, each in its own separate GCP project, which means 100% isolation.
00:17:18
Speaker
So what they ended up with is 18-plus different GCP projects that they kept provisioning, supplying storage and compute for those data products. And the problem they ran into is that they realized, well, for their fairly small organization, I don't remember how many people, maybe 200,
00:17:43
Speaker
80 data products, not 800, but 80, is kind of a lot. And because they didn't have governance of the data products, in the end it was all distributed across different GCP projects, and they couldn't understand which products were actually overlapping and more or less answering the same questions. Exactly.
00:18:11
Speaker
So this is a tricky question, and maybe you can explain it to me. Does data mesh make governance harder at the end of the day, or is it just a technical way of setting things up? Does data mesh help us or not?
00:18:34
Speaker
So my goal is always to end up at data mesh, but I will never, ever say the words data mesh, because I like to back my way into data mesh by building a system that gets me there through self-service. But also my approach, and this is controversial, my approach also ignores business logic and analytics at first. From a foundational standpoint, I think you keep business as usual going until you can assure that you have everything aligned to ensure that source-of-truth architecture from, say, your sources into bronze and the other layers. I typically prefer using an intermediary layer between bronze and silver as well, then into silver, and
00:19:33
Speaker
then you tackle gold as a totally separate thing, because it's typically bringing together dbt practices, different developers' feelings about their own CTEs, and things like that. That's much more difficult to get into. And that's also where your pain comes in: which definition of the KPI matters? Do you have a KPI framework? If not, are there other ways of getting at those KPIs? Do you have self-serve? Do you have a tool, a streaming tool, that allows for GUI visualization of your data that isn't governed, but helps people answer questions early on and draw insights? And then you have to build
00:20:20
Speaker
a tooling framework based on decisions and impact in the business: what's the scope of the decision you're making with this low-quality data source? Okay, here's the level of decision you can make using this tool, but here's where you have to get the data team involved. So data mesh, yes, is the goal. Part of my governance-driven architecture approach is to make everything self-service. You have a new tool? Cool.
00:20:47
Speaker
Maybe we'll use Airbyte, maybe we'll use Fivetran, maybe we'll use RudderStack, something like that. How can I build a config-driven approach and turn that into a GUI or a front end, with a ticketing system where you tell me what the source is, we go through it, it pulls your data, the data then goes into another system that lands it into your bronze layer, and so on and so forth? But it's all governed. The metadata is all captured in one place. And up until your silver layer, your code is all done step by step, and you can build your data quality all the way through the framework. But what I just explained takes a lot of effort, a lot of executive buy-in, and a lot of hemming and hawing from your team, right?
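(A minimal sketch of the config-driven, self-service intake Jason outlines: one declarative spec per source, validated up front, so every feed lands in bronze the same way with its metadata captured once. All names here, the spec fields, the connector list, register_ingestion, are illustrative assumptions rather than any specific product's API.)

```python
from dataclasses import dataclass, field

SUPPORTED_CONNECTORS = {"airbyte", "fivetran", "rudderstack"}  # interchangeable pieces

@dataclass
class IngestionSpec:
    source_name: str                      # filled in by the requester via the ticketing GUI
    connector: str                        # which tool actually moves the data
    owner: str                            # every bronze table needs a named owner
    schedule_cron: str = "0 6 * * *"      # daily by default
    pii_fields: list[str] = field(default_factory=list)  # flagged for governance up front

    def validate(self) -> None:
        if self.connector not in SUPPORTED_CONNECTORS:
            raise ValueError(f"unsupported connector: {self.connector}")
        if not self.owner:
            raise ValueError("every bronze table needs a named owner")

def register_ingestion(spec: IngestionSpec) -> str:
    """Validate the request and return the bronze table it will land in."""
    spec.validate()
    # In a real setup the spec (owner, schedule, PII flags) would be written to one
    # catalog here, so lineage and data-quality checks attach uniformly downstream.
    return f"bronze.{spec.source_name}"

print(register_ingestion(IngestionSpec("crm_contacts", "fivetran", owner="data-platform")))
```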
00:21:39
Speaker
No, for sure. That's a lot. Can I ask you a question? Because we're starting to talk about the tools. Do you think there is a solution that can help, not to offload governance, because that's a really good question to ask, but to help you cover the data better?
Improving Governance Beyond Naming Conventions
00:22:01
Speaker
What do you think? Is it possible to govern data better than just with naming? Because what you described is basically homework to be done at the organization: think about architecture, think about naming, think about code, ideally use repositories, have all of those processes in place. So I think most organizations already have it. They just don't know that they have it, and they don't use it the way I think about this tool. Most organizations have a data catalog, and I've used a number of them, from
Effective Use of Data Catalogs for Stakeholders
00:22:45
Speaker
that end. But my goal is always to think about it from the end perspective of a business stakeholder.
00:22:52
Speaker
If I'm using your data catalog and I want to know X, Y, Z... I work in healthcare right now, so the number of patients per location is important, right? In hospitals it's called department census. So let's just use that. If I'm an executive and I'm in my data catalog, can I easily find that KPI? My approach is always to use that as the test case. Here's my tool, we already pay for this: why can't I answer these questions in this tool? And that frames your entire architecture around this central point, which is that it doesn't matter whether you're using Python or the JVM or
00:23:38
Speaker
Spark or Snowflake or Databricks or any of these things; all of that stuff is interchangeable. What really matters is the intention of how you build things through. In my approach, you could use any of those tools; you just have to be really intentional about where you're doing it and how you're going to get to that end goal. As long as you think about it from the end standpoint, where my goal is to have an executive see all their KPIs in one place, have faith in that data, and never get a nastygram email from them about why this didn't work, then
00:24:17
Speaker
how do I set that up? I'm not talking about a dashboard. I'm talking about raw access, just being able to query a table. Do a select star on an end-to-end table that just says, here's my aggregate by location, yesterday's data.
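(To make that test concrete, here is a minimal sketch of the "just query one governed table" check, again assuming BigQuery; the table and column names are hypothetical.)

```python
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT location, census_date, patient_count
    FROM `analytics.gold_department_census`          -- hypothetical governed gold table
    WHERE census_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
    ORDER BY location
"""
# If answering "yesterday's census by location" takes more than this, the work isn't done.
for row in client.query(sql).result():
    print(row.location, row.patient_count)
```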
00:24:35
Speaker
I mean, is it even right to optimize for that? I know you pulled that as an example, but this is also very much the case: we get frustration from the C-level because they cannot understand some data solutions, or they cannot understand the data. It doesn't necessarily mean something is wrong when they email you; sometimes it means they cannot understand it. And I just don't feel it's correct to optimize everything for them.
00:25:04
Speaker
I think that's a really great point, and I just want to pull on that thread. Yes, and: my point in doing a lot of this is not to have stakeholders fully drive it, but to eliminate the pain points of exploration that analysts and data science teams have, and make it so they can do greenfield work. What's a business question I'm interested in that
00:25:36
Speaker
no one asked me to go and solve? Is there an interesting EDA I can do here to maybe solve this problem, or how do I want to add to a product? So this isn't limited to data scientists and analysts either. As a data engineer, I am always thinking about upstream and downstream use cases for data. Some of my favorite collaborations from a governance standpoint have been with software engineers, where we rushed to get a product out and the analytics, or the microservice around it, wasn't built in an optimized way for how we could log or answer future
00:26:22
Speaker
problems for us, or even build future data science or personalization work around it. So then it's going back to the software engineers and saying, hey, I realize you guys were under the gun here, but can we reopen this? Can we talk about how I don't need you to aggregate this stuff for me up front? I don't need you to give me anything other than what happened, and we'll figure out the use case for it. But I need you to get more granular with me. You mentioned this earlier and I wanted to go back to it, so it's a good opportunity: my favorite conversation with software engineers is, who's responsible for building schema into data? And where do you enforce schema? Are we talking about data contracts?
00:27:08
Speaker
Yeah, data contracts. Well, that's all a part of this, right? So I really appreciate flexibility around it. I don't necessarily care about enforcing data contracts
00:27:26
Speaker
early, if I don't have to, right? Do you realize, if I cut this into the LinkedIn video, how much heat you're going to get? That's fine, I know.
00:27:44
Speaker
Yeah, so I've tried it, right? Let's say there's a feature upstream of you, a team has a microservice that drives a payload, and you have really hard schema enforcement on your source.
00:28:00
Speaker
Isn't it so much easier to just not care about it at the source layer? Isn't it so much easier to build your governance structure so that, cool, their payload changes because of this new value; let me just grab that, take what I need, load everything, and only push forward what I actually have a business use case for. Maybe I never even got requirements for this and they're just updating it, and all of my stuff still works. Why do I care?
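(A minimal sketch of the tolerant source-layer pattern Jason is arguing for: keep the full raw payload, project out only the fields with a known business use, and don't fail when upstream adds keys. The field names and sample event are illustrative assumptions.)

```python
import json
from datetime import datetime, timezone

NEEDED_FIELDS = {"event_id", "user_id", "event_type"}  # only what downstream actually uses

def to_bronze(raw_payload: str) -> dict:
    """Land the event without enforcing schema; nothing here fails on new keys."""
    record = json.loads(raw_payload)
    return {
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "raw": raw_payload,                           # full payload kept for replay and audit
        **{k: record.get(k) for k in NEEDED_FIELDS},  # missing keys become None, not errors
    }

# Upstream added "coupon_code" without telling anyone; the pipeline keeps working.
event = '{"event_id": "e1", "user_id": 42, "event_type": "checkout", "coupon_code": "X9"}'
print(to_bronze(event))
```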
00:28:32
Speaker
How about this use case? Listen, I'm into your use case, I get it, and I understand it. But how about the case where they don't just add a value, but also change the type of the field, for some very important field of yours? I don't enforce type until later in the pipeline as well.
00:29:00
Speaker
But when you enforce type downstream, that also takes time from the debugging perspective, you know? Okay, it came through; we don't know how many downstream transformations there were until something picked it up. And then you need to go back, and it can take a little bit more time than if you had that at the border, so to say.
00:29:30
Speaker
It can, yeah. I'm not saying I'm right. I'm saying that in a high-velocity, eventual-consistency scenario, you have to make these trade-offs about how much you really need the streaming value, and which services depend on it being accurate right there, versus what's driving downstream insights like dashboards or reactive decisions we're making. When you're dealing with things, and this is a great call-out, like optimization or A/B testing, then yes, you can't be as flippant as I'm being. You have to have the values that you need. But I still
00:30:14
Speaker
would say that you also have to be intentional about what happens when you change those values. Or can you build the type values and hierarchy within the upstream microservice setup, so that you have a reusable format for when things need to change, and you don't have to deal with schema change?
00:30:39
Speaker
Can you wrap the type column into that thing and then have them build hierarchies in it, where you deal with the complexity of the hierarchy when you need to, but not when you don't? I mean, for me that's just adding complexity at this point, in a way. But it's not a bad thing; if you're all aligned on that, it shouldn't be a problem. What I would suggest for this use case is to have
00:31:10
Speaker
alerts about those changes at the border. It doesn't prevent you from continuing to get the data, but it alerts you about the current state, about what happened. And this is basically what we do; that's what we alert on. Yeah, spot on.
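(A minimal sketch of the alert-at-the-border idea Yulia describes: compare each payload's shape with the last one seen, report the drift, and keep loading. Returning a list stands in for real alerting; an actual setup would page or post somewhere.)

```python
import json

_last_seen: dict[str, str] = {}  # field name -> python type name from the previous payload

def check_drift(raw_payload: str) -> list[str]:
    """Return human-readable drift alerts; never raises and never drops the record."""
    record = json.loads(raw_payload)
    current = {k: type(v).__name__ for k, v in record.items()}
    alerts = []
    for name in current.keys() - _last_seen.keys():
        alerts.append(f"new field appeared: {name} ({current[name]})")
    for name in _last_seen.keys() - current.keys():
        alerts.append(f"field disappeared: {name}")
    for name in current.keys() & _last_seen.keys():
        if current[name] != _last_seen[name]:
            alerts.append(f"type changed for {name}: {_last_seen[name]} -> {current[name]}")
    _last_seen.clear()
    _last_seen.update(current)
    return alerts

check_drift('{"event_id": "e1", "amount": 10}')  # seeds the baseline
print(check_drift('{"event_id": "e1", "amount": "10.00", "currency": "USD"}'))
# flags both the amount type change (int -> str) and the new currency field
```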
00:31:29
Speaker
So, I'm not saying you don't have a schema registry, or that you don't detect what the payloads are, or enforce governance of what the payload should be. I'm just saying that you should build pipelines so that you don't necessarily care when they change. Because you have to, from a business standpoint, raise the question:
00:31:55
Speaker
if this upstream value changed but it doesn't impact my data, do I necessarily care? I think this also gets into a nuanced conversation around
00:32:08
Speaker
whether you have data products. Then you probably have a dev and test setup. But in many environments, in a traditional business, maybe you don't have dev and test, or you don't have a good way to get artificial data when you need to test things.
Artificial Data for Testing in Business Contexts
00:32:23
Speaker
Or maybe that's a data team responsibility, building that stuff and having a good test framework to go through. So I think it's just a matter of how you're set up organizationally as well.
00:32:34
Speaker
Right. And it depends on the business. Do you really need that artificial data to test things? Some businesses can run without it; nobody dies from this data, so they can test it on the go, honestly.
00:32:48
Speaker
Yeah, I think that's the question, right? For data tolerances, especially when it comes to migrations and things like that, I really always push back on executives who say it has to be exact: does it really? What's the difference between me getting you plus or minus 1% versus the 0.001% that you say you need?
00:33:17
Speaker
This is my favorite, favorite topic to talk about, because sometimes achieving 90% data reliability is cheaper than achieving the remaining 10%.
Cost Justification of High Data Reliability
00:33:30
Speaker
Is it worth it? You have to ask that, and not just within the data team, because people on the data team are already bored to death by this:
00:33:38
Speaker
our data is as reliable as we want it to be, and there is an acceptance rate, your tolerance rate. But we need to communicate that to stakeholders as well: folks, we are not curing cancer with this data; we are trying to do our best.
00:33:58
Speaker
This will be really controversial, but I think, aside from completeness checks or non-null checks and so on, the most critical, highest-value check anyone can put into place is a freshness check. Do you want the insights or not?
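(A minimal sketch of such a freshness check: compare the newest load timestamp against how stale the business has agreed the table may be. The 26-hour tolerance is an illustrative assumption.)

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def freshness_alert(last_loaded_at: datetime, max_staleness: timedelta) -> Optional[str]:
    """Return an alert message if the table is staler than the agreed tolerance."""
    age = datetime.now(timezone.utc) - last_loaded_at
    if age > max_staleness:
        return f"STALE: last load was {age} ago, tolerance is {max_staleness}"
    return None

# e.g. a daily census table might be allowed to lag up to 26 hours behind
last_load = datetime.now(timezone.utc) - timedelta(hours=30)
print(freshness_alert(last_load, max_staleness=timedelta(hours=26)))
```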
00:34:16
Speaker
And I'm going to share an insight publicly here. So we have a Dataplex integration in Masthead, Dataplex being a data quality solution. And the reason for that is that lots of clients were asking: oh, you do this anomaly detection, which is freshness and volume, for all of our tables, but what if we want to check some min or max values, or uniqueness? So we integrated with Dataplex.
00:34:46
Speaker
Now ask me how many of our clients have any of those more complex rules running. You can guess the answer. Because when it comes to actually implementing and checking something, you need to put your mind into it; it's not something you can just blindly switch on automatically.
00:35:08
Speaker
Yeah, go ahead, sorry. It's also an iterative approach, right? I always start at... I love Occam's razor, the simplest solution is often the right one. So whenever people start to talk about anomaly detection, I default to: okay, let's start at the easiest implementation ever, plus or minus 1.5%, 1.25%. We'll min-max the daily average of rows, get our value, and just go from there. And tell me your use case for why you need anything different. I've never worked in banking, so please don't come at me if you work at a bank; that's not an environment I've been in.
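(A minimal sketch of that simplest-possible volume check: build a plus-or-minus band around the recent daily average of row counts and flag today's load if it falls outside. The 1.5% tolerance and the sample counts are illustrative.)

```python
def volume_anomaly(history: list[int], today: int, tolerance: float = 0.015) -> bool:
    """True if today's row count falls outside the tolerance band around the recent average."""
    avg = sum(history) / len(history)
    lower, upper = avg * (1 - tolerance), avg * (1 + tolerance)
    return not (lower <= today <= upper)

daily_rows = [10_120, 10_080, 10_150, 10_095, 10_110]   # row counts from the last five loads
print(volume_anomaly(daily_rows, today=9_300))    # True  -> investigate the drop
print(volume_anomaly(daily_rows, today=10_130))   # False -> within normal variation
```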
00:35:55
Speaker
And we just faced such use cases recently, and we were totally blown away. Actually, I have two. Sorry. So the first one is telemetry data. When you miss telemetry data, you can't necessarily get it back; it's hard to re-patch, so you need to make sure it's accurate at every given second as you receive it, and it arrives with duplicates. This is basically an edge case. It was crazy for us to get the algorithm to work and, ultimately,
00:36:27
Speaker
do the anomaly detection. And the second use case: we have a client in the advertising business, ad tech. If there is something wrong with a pipeline, it also means they are losing money. And yeah, when it comes directly to money, everyone gets excited. But this is not the case for all pipelines. Right. I mean, the telemetry one I can get to, and I've dealt with that, and it really depends on the reality of the cost of that value. It probably triggers something; I have friends who work in energy, so it could be, does it trigger some on-site visit we have to make, or is it a bigger indicator? Marketing is interesting, though, because
00:37:20
Speaker
the level of sophistication you just described raises... okay, they're doing MMM, they're doing MTA, they're doing some massive A/B testing
High-Reliability Systems in Marketing and Beyond
00:37:31
Speaker
on that end. It's costing them to drive people into their funnel. I can totally see that. But that's a high-reliability system anyway that you're going through and refining. And I would also say that you built your governance around understanding that. So I would never approach those problems flippantly; you just do it in a more constrained and faster way. But even then, I guess,
00:38:05
Speaker
that's a very sophisticated organization. How many organizations really have those use cases? This is a very nice call-out, because you have to understand that if we get in, we're already working with more or less sophisticated teams.
00:38:22
Speaker
Not more or less, but more sophisticated teams, and those companies already understand the value of their data. It's not just the investment in data teams, but investment in the efficiency and observability of their data warehouse and data environment. So I would say we are in tier-one companies. So you're right. And the use cases where they need us are also more directly tied to money, I guess, and more directly to value. This is a nice call-out.
00:38:52
Speaker
But I think it's great, because it is the difference between the two. If you're already data-driven, then yes, that's going to matter. But if you're on the path to data-drivenness, you might not even know.
Reflecting on Unsuccessful Data Projects
00:39:05
Speaker
I've been a part of MTA projects and MMM projects that have gone nowhere, and it's just interesting. It is. No, I just keep thinking, and you're so right, because I also thought about two other clients, which are web3 platforms.
00:39:23
Speaker
And for them, the magnitude of a mistake and of failing costs so much money; just the recalculation alone is already painful. And this is also where we help them make sure that they at least stop at the right point. So you're kind of right about the tier of the market we are working with; it's not a representative sample.
Benefits of the Modern Data Stack
00:39:49
Speaker
I'm sorry to repeat that, but listen, I love the talk we're having right now, and I want to touch base with you about the modern data stack, because you have this... I think you're one of those professionals who are pure gold, because you have exposure to on-premise systems,
00:40:07
Speaker
which, I will be honest, I have not. When people tell me, oh, we were working on these tabular databases, I'm like, are you a dinosaur? Because, you know, I'm a freshman here. I was always on the cloud; I have no idea what on-prem looks like. So how do you feel about the modern data stack? Does it make your job easier or not?
00:40:35
Speaker
Absolutely, hands down. I'm a huge believer in the lakehouse and just the concept of decoupled storage. I love the flexibility of it. I'm also a huge advocate for understanding, explicitly, the gotchas that software providers have in their managed services. So, Delta tables in Databricks versus open-source Delta tables, or managed Iceberg in Snowflake versus open-source Iceberg.
00:41:13
Speaker
Migrating from one managed service to another, say Snowflake to Databricks, or Presto or any other tool, is more difficult than just using and managing the complexity of the open source, which comes at a cost of implementation; you have to think more critically about that. But what I love about the modern data stack is that flexibility. I think of all these different pieces, orchestration, storage, even the cloud provider, the database technologies, the programming languages, as pieces that you can move around. So I try to think and build, from an organizational standpoint, around this:
00:42:03
Speaker
what happens if the vendor of this tool that I have, that I've built so ingrained into my data stack, comes to me next year and wants a 20% increase or a 50% increase, right?
00:42:19
Speaker
How do I sell that to my management, that we didn't know this was going to happen, sorry, but we have to pay this fee? I love the modern data stack, but my approach is very skeptical of how the businesses that are providers in it go about providing their services or structuring their contracts. I try to think about what my alternatives are. So if we can use the managed or PaaS service, what's a hybrid, or what does it cost for me to do it myself? And when we structure those features, or the epics around trying to do this stuff, we tend to run that cost-balance analysis: hey, here are your multiple different options.
00:43:08
Speaker
A lot of this comes from my background in marketing consulting at MRM, where I used to have to come up with multiple different options. So it just feeds into the way I think about it. Now that I work on the client side, I still come in with, okay, what am I losing if I do this this way versus that way? I would say my account reps or my sales engineers probably don't like me for thinking that way, but it also helps us continue to push and drive forward. There's also an interesting piece on staffing when going from on-prem to the cloud. If you have an organization that's only ever worked in the cloud, they
00:43:54
Speaker
are familiar with that constant vigilance over cost, because it's this external thing where you're paying the AWSes of the world a monthly charge; that server, that EKS cluster, all of this stuff has a cost, and
Cost Awareness in Cloud Transition
00:44:11
Speaker
it's tangible, right? But if you're working on-prem and you've always worked on-prem, you just have a database.
00:44:18
Speaker
You don't care if you run that thing at 100% the whole time. That might be another team's problem that you never worried about, a DBA team or an on-prem management team. But when you move into the cloud, you have to think about every query, every operation,
00:44:41
Speaker
every single movement and thing that you do. You cannot be at 100%. Your query cannot take two hours if you need to run it daily. You have to think about how you build things, and how you can take those monoliths you had on-prem and turn them into small, replaceable pieces. I know where you're heading. You need to start thinking about the cost
00:45:07
Speaker
of the solution you're using, and also be able to communicate that to your business stakeholders: do we really need something near-real-time every 15 minutes, or can we do that as a batch once a day? It depends, it depends, I know. But do you realize that you are actually a unicorn kitty cat here, saying that you care about that? The majority of data engineers don't even have permission to look at those invoices, and they have no idea. They don't care, they are not in charge, they are outside of it, because normally the invoice is managed by a finance team, and data teams have no access to that information in those organizations. So this is actually a sweet spot for cloud providers, because the people actually using these capabilities don't care about the cost.
00:46:00
Speaker
I mean, that's a perfect interview question, and also something I would hopefully tie out before I accept an offer somewhere, because that is a critical step, right? The reality is that I've come into a lot of different organizations that are in the midst of transformation and need to move to the next step, so there is a lot of oversight. I've had, and do have, P&L responsibility for departments I've been a part of. And honestly, some of my closest collaborators in a lot of my ventures have been the DevOps team or the cloud team. So these things are front and center. I've built solutions that were on-prem and moved to the cloud, where we were using VMs to move them through, and it was great, it was super scalable. But
00:46:53
Speaker
I wanted to continue to push the needle on those solutions. How could we get to a Kubernetes solution? So my approach is: I still want to do cool stuff, but I want to do it so that I can assure the organization that we're driving value and that the data team isn't just a loss leader. We want to be able to go back to our clients and the organization and say,
00:47:18
Speaker
we are driving this value, we've reduced our costs by X, and for that I need additional headcount, or I need this tool, this thing that's going to help you get this other thing in the business, right? And so for me, that's where I would tell every leader to ask these questions: how much is your cost per month, what are your most expensive queries, what operations are out there? It shouldn't stop you from doing cool things. I've run plenty of workloads simultaneously while deprecating the old stuff. You just have to be really transparent: I'm going to spend more money, and it sucks, but this is what you're going to get at the end of the road.
00:48:06
Speaker
No, this is interesting. But you know, I'd love to use the little time we have left, and I'm not sure we're going to be able to cover this, but this is also what struck me. You say, okay, I'm going to reduce this cost, but business is not really focused on reduction.
Proving Data Team Value to Organizations
00:48:20
Speaker
They are more focused on what they are gaining out of data teams. And this is something really hard to communicate.
00:48:27
Speaker
Because, again, I keep emphasizing that data teams sit horizontally across all departments in the organization. They are not selling anything, unless of course they are selling data on some marketplace; maybe then they can clearly show some return on investment. Otherwise it's super difficult for data teams and data leaders to prove
00:48:49
Speaker
incremental, tangible value to the organization. I would say proving cost reduction is easy, but it's also the wrong thing to lean on. You don't necessarily want to only reduce and optimize, because otherwise you're going to lose budget. And this is an issue in data teams too, maybe not for yours, but this is what I'm seeing
00:49:15
Speaker
out there. So how do you actually prove the value of your team? And how do you influence that?
00:49:27
Speaker
Yeah. I think my mindset is maybe unique in this case, but my goal as a data engineering leader is always to optimize myself out of a job, so that the organization moves beyond it: they drive that investment they're making foundationally today in data engineering, but then it turns into value they're getting from analysis or machine learning. And I think that really comes down to real transparency around this:
00:50:05
Speaker
when I'm working with my team, I am constantly trying to upskill them, to help them understand what our end goal is. I'm very transparent about where we're trying to go and how it will potentially cause ripple effects in our team. We're covering it from a business-value standpoint, but it's also just a matter of this: data teams often take hits when organizations go through RIFs or restructuring. My goal as a leader is always to try to make sure that my team
00:50:47
Speaker
is not just resting on their laurels. They're always trying to push forward, understand what we're doing, and have opportunities to grow through that. From a stakeholder standpoint, it goes back to our original topic of governance. If you're failing to sell that buy-in, or a data-driven culture, or data literacy, then you're not doing yourself any favors. You can build the fanciest tool in the world and no one's going to care.
00:51:16
Speaker
No, this is beautiful. I think this point you mentioned, that you push your team to be out there, actually ties back to your marketing background. It could be something where you realized that you need to fit in and find
00:51:38
Speaker
the gap where you can bring your business stakeholders the skills, data, and expertise you have in hand.
00:51:50
Speaker
Yeah, I think it really comes down to, and this is also going to be controversial, what your motivation for working in data is. I really believe in the transformative power of data to do good and bad within a society, within an organization. And if you build intentional systems around that and truly embrace data,
00:52:20
Speaker
the sky's the limit, right? But if you're part of a team and you're just around this environment that's happening, as a leader it's partly my job to figure out what drives people, whether they can rise to the occasion, and to challenge them to rise to the occasion. But also, conversely, I can't motivate anyone to do anything.
00:52:52
Speaker
I can't change their motivation for why they're here. They have to decide for themselves why they want to be here. And so in my approach to leadership in all of these practices, I'm super transparent: here's what we're trying to do, who wants to own this, here's an idea I have, or here's what I think would be cool in the future.
00:53:19
Speaker
I also try to make it inclusive. So we use a written pitch process that I came up with with my staff engineer at a previous company, Sean O'Dell. We designed it so you can lay out the problem you have, what the future will be, answer some questions about that, and then leadership can sign off on it. But as another IC on the team, you can't refuse it. There are three voting options for leadership: approve, neutral, and disapprove.
00:54:01
Speaker
And for ICs, there is only abstain and approve. So if you want to make the approach that is being pitched better, you have to comment; you have to add value to it. Otherwise, we use the Amazon maxim: you disagree and commit.
00:54:23
Speaker
That's so interesting. Well, thank you so much for sharing; I think this is very insightful. You know, we've talked for an hour, and we've talked before, and I just realized that you bring so much humanity into data, in the sense that you want humans to be humans. You want your team to think.
00:54:46
Speaker
And you want the data to serve you, and to make it reusable. What I'm trying to say is that you encourage putting as much intelligence as possible into every step.
00:54:59
Speaker
Yeah, and you challenge your team. I mean, this makes so much sense, common sense, which is not that common, unfortunately. Listen, I am very inspired. Thank you for dedicating your time to talk to me and share your wisdom. Thank you so much for stopping by, Jason. Yeah, thank you for having me. I really appreciate it. I hope we get another chance to talk again in the future. That's for sure. Thank you. Awesome. Thank you.