037 - Build vs Buy in the Modern Data Stack: Stories, Frameworks & Pitfalls

Stacked Data Podcast

On this episode of the Stacked Data Podcast, Harry Gollop sits down with Hugo Lu, co-founder of Orchestra, to tackle one of the most common debates in modern data teams: should you build your own tools or buy off-the-shelf solutions? Hugo shares his experiences on both sides of the decision, practical frameworks for evaluating cost, opportunity, and long-term value, and real-world examples of when building or buying was the right call. Whether you’re a Head of Data, an engineer, or just curious about tooling strategy, this episode provides actionable insights to help your team make smarter, strategic decisions.

Transcript

Introduction to the Stacked Podcast

00:00:02
Speaker
Hello and welcome to the Stacked Podcast, brought to you by Cognify, the recruitment partner for modern data teams. Hosted by me, Harry Gollop.
00:00:13
Speaker
Stacked with incredible content from the most influential and successful data teams, interviewing industry experts who share their invaluable journeys, groundbreaking projects, and most importantly, their key learnings.
00:00:25
Speaker
So get ready to join us as we uncover the dynamic world of modern data.
00:00:34
Speaker
Welcome back to the Stacked Data Podcast. I'm your host, Harry Gollop, and today we're diving into one of the big debates in data teams. Should you build your own tools, or buy tools from the modern data stack?
00:00:49
Speaker
Joining me is Hugo Lu, co-founder of Orchestra. Hugo's been on both sides of the decision-making process, building stuff from scratch internally. Now he helps companies implement tools like Orchestra, being the founder of the tool.
00:01:05
Speaker
So whether you're a Head of Data making strategic calls, or you're an engineer wondering how to implement the next tool or build the next product, hopefully this will be insightful for you.

Hugo Lu's Journey in Data

00:01:15
Speaker
Hugo, it's great to have you on the show. It's Friday morning, after Big Data London, so thanks for jumping on with me. How are you doing?
00:01:24
Speaker
I'm good, I'm good. Yeah, you know, pretty exhausting, explaining the old build versus buy equation to various people in data over the last couple of days, but no, really good otherwise.
00:01:35
Speaker
Well, for those of the audience that don't know who you are, Hugo, it'd be great to get an intro of yourself, your background, I suppose, where you are now. Yeah, of course. So like many people in data, I didn't intend to be here when I was growing up.
00:01:51
Speaker
I started my career working in an investment bank. And then essentially the company, Juul, the e-cigarette manufacturer, were growing absolutely massively, like fastest company ever to hit $10 billion in valuation, and they were expanding across Europe.
00:02:06
Speaker
So I joined as one of their earlier hires in the UK. We scaled from 100 people to over 1,000 in Europe in like basically a year, hit $38 billion in valuation, and then everything sort of came crashing down. But during that time, we were basically just aggregating Excels to create reports, and I thought, there has to be a better way.
00:02:28
Speaker
And I asked our data team if they would help, but they said they were too busy building infrastructure, but they would teach me. So I kind of got into data engineering that way and, you know, never looked back since. Built up the small sort of EMEA data team in Europe for Juul.

Should Data Teams Build or Buy Tools?

00:02:44
Speaker
Then I was sort of Head of Data at Codat, which is a rapidly growing Series C fintech in London. And now I'm sort of on the other side. So helping people with their orchestration and data pipelining with Orchestra.
00:02:55
Speaker
Yeah, I mean, it's the constant theme on the podcast. I think in a few years' time, more people will be having conversations about wanting to go into data from when they were young. But yeah, at the moment, everyone seems to have fallen into it, but great to hear your background.
00:03:12
Speaker
The debate today, obviously, sounds like you've got a lot of practice from Big Data London, but build versus buy. What are we talking about when we're saying build versus buy, particularly in the sort of modern data stack context?
00:03:24
Speaker
I think it's changed a lot, but I would kind of put it into two key buckets, right? There's a lot of stuff you're not going to build yourself. You're not going to build a BI tool. You're not going to build a data warehouse.
00:03:37
Speaker
You know, you're going to buy those. But there are some things that you do build. So it's pretty easy to write a Python script to move some data from A to B, especially with AI.
00:03:47
Speaker
Literally, even you could do it, Harry. But at some point that becomes unsustainable. So then you might look at an ELT tool. And then the other thing is sort of that entire system of monitoring and managing your data stack.
00:04:03
Speaker
And this often falls into, you know, the orchestrator. So people will often try to build these frameworks out themselves, just like orchestration and monitoring frameworks, basically.
00:04:14
Speaker
And yeah, they're basically the two areas where we see data engineers spending time. So you've got these two sort of areas. I mean, I couldn't agree more. There are obviously areas where you're never going to build yourself, and areas where I think there's a bit more of a gray area. So, I suppose, going into that gray area, what are some of the main factors that you think teams should consider when they're weighing up this decision of build versus buy? So many factors. So many factors.
00:04:45
Speaker
I mean, I think one that's really important is just to assess the skill set your team has. Because you work with a lot of analytics engineers and analysts, right?
00:04:56
Speaker
Yeah. And you know, that's sort of where I started my career. So doing stuff in Excel, building a report, and then it's like, oh, wow, but I can use SQL to do a lot of this heavy lifting and now I kind of feel like an engineer. That's amazing.
00:05:10
Speaker
But a sort of problem that I ran into was that, you know, people would say, oh, Hugo, this data doesn't look right. What's going on? And then I didn't really have the answers because I didn't know where it was coming from. So I had to go and speak to more technical people in the organization to work it out. They didn't know, because actually someone had changed the column name in Salesforce or whatever. This is almost becoming cliche now. I've said it so many times.
00:05:32
Speaker
So I was always looking to basically get my hands dirtier and own more of the end to end. So I think if you're looking to build stuff when you don't have access to quite strong technical resources, that can be really challenging.
00:05:49
Speaker
But yeah, there are a few others, but I think that's quite an interesting one. I think one of the other interesting things that I see, I suppose, is that speed to value.
00:06:00
Speaker
I don't know how you would weigh that up as an engineer. Oh, right. Absolutely. But I think now it's the wrong way to think about it, right? Because before you're like, oh, you know, I could maybe do this faster. Otherwise it would take me three, six, 12 months or whatever.
00:06:18
Speaker
Because the only thing you're being asked to do is to create a report. But now, with AI and some of what's out there, you can start automating stuff that's really, really valuable for the business, right? Let's say you've got a customer 360 use case, right? So you've got a list of all your customer records in your warehouse and you're trying to activate those, do things like more email marketing, targeting, retargeting, outreach.
00:06:46
Speaker
It's a very real opportunity for us now because there are a million and one things that we can actually do with AI. I was speaking to an airline earlier this week, and they have a lot of people call up, and they're like, oh, I don't understand what my points balance is.
00:07:03
Speaker
And that's just a query, right? That's going into the points and the customers tables and working out why actually, yeah, you do only have so many loyalty points.
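Editor's sketch: to make the "that's just a query" point concrete, here is a minimal illustration of a points-balance lookup. The `customers` and `points` tables, their columns, and the sample rows are all hypothetical; the airline's actual schema is not described in the episode. It uses Python's built-in `sqlite3` module only so that the example is self-contained.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database with hypothetical customers and points tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE points (customer_id INTEGER, amount INTEGER, reason TEXT);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO points VALUES
            (1, 500, 'flight'),
            (1, -200, 'redeemed'),
            (1, 50, 'promo');
        """
    )
    return conn


def points_balance(conn, customer_id):
    """Return (name, balance) for one customer, or None if the customer is unknown."""
    return conn.execute(
        """
        SELECT c.name, COALESCE(SUM(p.amount), 0)
        FROM customers AS c
        LEFT JOIN points AS p ON p.customer_id = c.id
        WHERE c.id = ?
        GROUP BY c.id, c.name
        """,
        (customer_id,),
    ).fetchone()


if __name__ == "__main__":
    conn = demo_connection()
    print(points_balance(conn, 1))
```

The `LEFT JOIN` with `COALESCE` is a design choice so that a customer with no points activity still gets a balance of zero rather than disappearing from the result.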

Hidden Costs in Building Tools

00:07:15
Speaker
And today, you know, they've got massive call centers, like armies of analysts just working around the clock, fielding requests slowly, you know, people on hold, like we've all been there.
00:07:26
Speaker
But that's the type of thing that now, you know, we can actually start to automate. And it has a real, real impact, right? Think about all the time people can save. Speed is great, but you kind of have to look at why you want to go fast.
00:07:37
Speaker
And now there's a really, really big reason to go fast, because with AI you can just do loads more cool stuff. I think one of the things that I see, speaking to both leaders and, I suppose, senior engineers, is that there tend to be differing opinions. Sometimes the engineers want to build the shiniest, sexiest, most complex thing for a problem.
00:08:03
Speaker
Whereas sometimes, obviously, the head of data is a bit more pragmatic and wants to maybe look at buying a tool which is going to solve 70% of that problem quicker. How do you, I suppose, manage that desire to build something versus buying something?
00:08:21
Speaker
How do you think about that as a leader? Yeah, that is a really interesting one. I mean, look, for me, when I was leading a team, I was sort of always a little bit more focused on getting the result, to be honest.
00:08:33
Speaker
But, you know, as engineers, we are here to build. I heard about a new way of phrasing this, which is build versus buy versus blend, where blending is just, you know, sort of picking your battles.
00:08:47
Speaker
You know, something I've observed in the last four to five years is, take dbt as an example. I reckon at least 50% of data engineers that have been doing this for the last four to five years have built a version of dbt themselves at some point.
00:09:01
Speaker
And it probably worked out well in the beginning. They were like, oh, this is great. I can't believe I've done this. And then, you know, it kind of all falls down at some point. And then you sort of feel like, oh God, now I've got to maintain all this stuff. And oh, it's not that bad. But then it's eight o'clock on a Friday and you're like, wait a minute, why am I still here?
00:09:23
Speaker
I don't know if you've ever had that. Yeah, yeah, yeah. That brings me on nicely, a lovely segue, to some of the hidden costs that I think people sometimes miss, particularly when it comes to building. It's not just the actual building of it. It's the maintenance, the onboarding, the knowledge loss when that person leaves as well.
00:09:48
Speaker
Are there any other areas that you see as sort of hidden costs? And how do you weigh them up as an engineer? Ooh, mate, so hard. I mean, look, opportunity cost, that we've mentioned, of course, right? It's a little bit like making a bad hire, Harry, right? You can obviously try to hire someone for free, but if it takes you three to six months, think about all the awesome stuff you could have done if you had that person in and you knew they were good from day one, right?
00:10:15
Speaker
It's kind of a similar pitch to what I say, right? I kind of say, well, look, you're a fast-growing company or you're a company with this incredible data set, what are all the awesome things you could be doing? And I think often people in our field are so fixated on learning how to build the infrastructure that sometimes sight of the wood for the trees can be lost.
00:10:43
Speaker
And yeah, I think it's just a really tricky inherent tension, because as a technical engineer you want to know your stuff. But equally, infrastructure for infrastructure's sake is almost like the enemy of progress.
00:10:55
Speaker
And yeah, it's certainly something that we've seen is very, very highly correlated with experience. And also, fundamentally, career progression, right? No one in the C-suite cares that you built something that was a bit like dbt or Orchestra, but just a bit worse.
00:11:11
Speaker
They care that actually in the last six months, you know, they wanted to automate all this cool stuff with data, but you didn't. And instead you've got a hundred dashboards and only three of them get used.

Ensuring Data Teams Add Value

00:11:22
Speaker
Like, that's the painful thing. Yeah, I couldn't agree more. I think, and we've seen it massively with all the clients that we're working with, particularly over the last year, there's a big push for value. What value are you adding as a data team? Data teams are expensive.
00:11:41
Speaker
Tooling is expensive. Infrastructure is expensive. And you know, we had that sort of small bubble burst a couple of years ago now, and there were the mass redundancies. And that's because you're seeing as a...
00:11:53
Speaker
But the bubble is back. The AI bubble is back. We have logic for AI POCs. We agree. So I suppose some tangible advice, maybe, for the audience and for the listeners. How can they think about what value they're actually adding to the business and to their data team?
00:12:14
Speaker
And how can they make sure that they are adding value when it comes to, I suppose, making the right decision? Yeah, I think I'll answer this in two ways, right? So I think if you're somebody looking to start a data team or build a data stack, or maybe you're thinking about changing job, doing your due diligence on the culture is really, really important, because there are so many companies out there with data teams where the C-suite just don't get it.
00:12:42
Speaker
And once you're sort of in that culture, it's really hard to get out of it. I normally find that speaking to people who are maybe in the midst of a big migration helps. There's a very big retailer in the UK right now, you know, they have a data team, but they're doing this huge overhaul. This is a very, very good time to join that company, because they've realized that they should be doing all this stuff to retarget their customers and make their marketing better. Now they have that impetus, now they have that cultural buy-in.
00:13:10
Speaker
So I think that is really important. And then, yeah, I think the other way is just, I hate to say it, but you kind of have to be almost like a consultant, right? You have to almost take a step back and just be like, what are the main things we can actually do here that will really move the needle, and then just focus relentlessly on them. And I guarantee you, nine times out of 10, the upshot of those things is never to just go away and build infrastructure for three months, or a year if you're an enterprise.
00:13:43
Speaker
But that's what people do. I think that's great advice. You have to deeply know your stakeholders, know what they're trying to achieve. And I think gone are the days now in the modern data world, especially with the tooling that is available, where you can just lock yourself away in a room and bash out code. You need to understand what the rooms are. That's what we love doing, Harry. We love locking ourselves in rooms and bashing out code.
00:14:09
Speaker
Well, that's not what the business wants. Well, I mean, that is the tension that we live with day to day. Well, I suppose the flip side. I think we've spoken about some of the benefits of, obviously, buying tooling. What are maybe some of the risks of buying tooling? Where has it gone wrong, that you've seen? That was a great question.
00:14:35
Speaker
I mean, look, the classic thing, right, is, I mean, I'm not even going to say what it is, but we all know what it is. But the other thing which I've seen, and I think it's really insidious, is this idea that if you choose open source, there are no risks. Because there are hidden costs.
00:14:53
Speaker
One of these is infrastructure. So often open source tools, they're more frameworks. You still have to provision compute to run them on, so you pay cloud cost. But the other thing is just the human cost, right? Open source tools, for the most part, are developed by, you know, companies like Orchestra. Well, what I mean is profit-making companies, right?
00:15:14
Speaker
So it means that like, there'll always be like a paid offering that has everything that you actually need, which means that rarely will you find an open source project that like can actually do everything you need it to quite well out of the box.
00:15:26
Speaker
And then the other thing which they incentivize you to do is always this golden cage effect. They play on the sunk cost fallacy, right? So, you know, you take a framework like Airflow and you're building out your pipelines, and then you realize, ah, this custom operator doesn't actually do the error handling it's supposed to. So you add that in, and then it's got no logging. So you build out some custom logging, and then you realize the logging's not customizable. So you build out a logging service to send nicely formatted Slack messages.
00:15:56
Speaker
Then you realize it doesn't have a catalog, or lineage, so then you start integrating with other tools, and then two years later you're like, oh fuck, what have I done? How do I get out of this? And what started as something open has actually become this sort of monolithic code repository you just can't get out of.
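Editor's sketch: the creep described above usually starts with small wrappers like the one below. This is a minimal, framework-free illustration, not Airflow code; `run_with_retries`, its parameters, and the `notify` callback (standing in for something like a Slack sender) are hypothetical names chosen for the example.

```python
import logging
import time

logger = logging.getLogger("pipeline")


def run_with_retries(task, retries=3, delay_seconds=0.0, notify=None):
    """Run task(); retry on any exception, log each failure, notify if every attempt fails."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return task()
        except Exception as error:
            last_error = error
            logger.warning("attempt %d/%d failed: %s", attempt, retries, error)
            if attempt < retries:
                time.sleep(delay_seconds)
    # Every attempt failed: tell a human (hypothetical callback), then re-raise.
    if notify is not None:
        notify(f"task failed after {retries} attempts: {last_error}")
    raise last_error
```

Each piece here (retries, logging, alerting) is a few lines on its own. The maintenance burden the speaker describes comes from owning many such pieces, plus the catalog and lineage integrations, across every pipeline.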
00:16:11
Speaker
I even had someone say to me the other day, you know, dbt is great, but what if I want to take it out? What do I do? We're trying to run a project with 5,000 models. We're having massive concurrency issues. I don't want to do this anymore.
00:16:24
Speaker
And you know that's really hard. The other thing, of course, with dbt is that like a lot of people who are less experienced with data modeling will just sort of create like model spaghetti.
00:16:35
Speaker
And spaghetti is hard to untangle. It's a bit like boiling an egg. You can't unboil an egg, right? Once you boil the spaghetti, you can't make it into a nice pack of flat, straight spaghetti again.
00:16:46
Speaker
You can, but it'll take a long time. You get your hands quite messy. Yeah, I mean, we see that many times with our clients. I mean, it's great for us. They typically need specialist resources to help them unpick that. I think getting in and setting frameworks with any tooling, whether it's buying something off the shelf or you're building it yourself.
00:17:08
Speaker
I think at the beginning you need clear guardrails and clear, I suppose, frameworks and protocols, which are going to govern what you're going to do and ultimately follow the practices of the industry.

Blend Approach: Building vs Buying Tools

00:17:20
Speaker
One of the gripes I have, and I'd be interested to hear your view on it, is people that don't understand the tooling that they're using. And I think this could maybe link back to the point that you made earlier of, yeah, you might want to build something really cool as an engineer, but actually, if you understand what the tool's doing under the surface, then you know how to maybe
00:17:41
Speaker
get some of that flexibility out of it when it's there. Yeah, I don't know. We've got to give ourselves some credit, right? The space has moved so quickly. There are so many tools. Architecture is complicated. It's impossible to be able to talk to stakeholders, convince a C-suite that data is good, be an expert dbt modeler, know how to build a dashboard,
00:18:05
Speaker
be a data engineering whiz and have a good view on what good architecture looks like. That's a lot of skills. So I think it is really, really hard. But yeah, you just kind of have to have the right frameworks in place to make sure things work.
00:18:24
Speaker
So I think platforms like Snowflake and Databricks have made this a lot easier, because they basically say, look, if you can get well-formatted data into an S3 bucket, and you guys know how to write SQL and do some basic data modeling, you should be all right. Your architecture is not going to be fucked up.
00:18:45
Speaker
And that's really helpful. But of course, when it comes to AI workflows and unstructured data and scaling across multiple teams, and, you know, maybe you have one use case where you actually have quite a lot of data,
00:19:00
Speaker
so you need a streaming solution, that's when stuff can get a little bit more complicated, and where having a framework that helps you visualize your entire stack, a sort of single pane of glass, helps to set the guardrails.
00:19:16
Speaker
And that's when that can be really, really important. What about the argument of, I suppose, the control and the flexibility that you get when, say, you build versus you buy? Do you think that's overrated or underrated in this debate? I think that's the "you're not Apple or Google or Facebook" argument.
00:19:40
Speaker
There really aren't that many use cases where it makes sense to just do everything in-house. I'll give you an example. We work with a few companies who kind of sell data as a product, right? So take financial regulation, for example, right?
00:19:57
Speaker
You know, this company will basically get reporting data from large financial institutions and from their customers, filter it through their proprietary engine, and then at the end it will spit out a result that says, oh look, you are or aren't compliant with all these regulations, here's what you have to do to sort that out.
00:20:14
Speaker
So all their product is, basically, is one big data pipeline. But the orchestration is kind of complicated, right? When, whatever, bank A signs up, you're going to have to spin up all of these pipelines automatically, which is quite a complicated pattern, right? Dynamically generating end-to-end data pipelines.
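Editor's sketch: one way to picture "dynamically generating end-to-end data pipelines" is stamping out the same template of steps for each client that signs up. The stage names, the `Step` structure, and the client IDs below are all hypothetical; the episode does not describe how the company in question, or any particular orchestrator, represents pipelines.

```python
from dataclasses import dataclass

# Hypothetical stages, loosely following the flow described in the episode:
# receive reporting data, check it, run the proprietary engine, publish the result.
TEMPLATE = ("ingest", "validate", "run_rules_engine", "publish_report")


@dataclass(frozen=True)
class Step:
    name: str
    depends_on: tuple = ()


def build_pipeline(client_id):
    """Create one linear pipeline for a client, each step depending on the previous one."""
    steps = []
    previous = None
    for stage in TEMPLATE:
        name = f"{client_id}.{stage}"
        steps.append(Step(name, (previous,) if previous else ()))
        previous = name
    return steps


def build_all(client_ids):
    """Generate a pipeline per client, keyed by client ID."""
    return {client_id: build_pipeline(client_id) for client_id in client_ids}


if __name__ == "__main__":
    for client, steps in build_all(["bank_a", "bank_b"]).items():
        print(client, [step.name for step in steps])
```

Generating the definitions is the easy part. Scheduling, retrying, monitoring, and onboarding dozens of these a month is where the engineering effort discussed next comes in.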
00:20:35
Speaker
But building that out in, like, Kineta Flow and maintaining it is a huge task, right? You're trying to onboard 30 clients in a month. That's a job for five or six engineers.
00:20:48
Speaker
But obviously, with some of the tooling on the market, you do just get that out of the box. So, you know, there's that. And then I think the other thing is cost, right? There's a reason that software companies don't use Snowflake as a database for their transactional data: it's too expensive.
00:21:03
Speaker
Whereas it does make sense for other analytical use cases. So, you know, I think you kind of have to blend, right? Sometimes it will make sense to use something that's bought, like a database. Other times it'll make sense for you to host it.
00:21:16
Speaker
I think with things like orchestration, often it makes sense to, you know, basically not build it yourself. Because it really is like reinventing the wheel.
00:21:27
Speaker
Orchestration is an old problem. I agree. I think Big Data London couldn't demonstrate more how many... One, I'm always baffled by how many vendors there are out there, but these are all companies. Why are you baffled?
00:21:41
Speaker
Data's hard, mate. Yeah, I know, but I mean, I suppose I'm always just surprised at, obviously, the amount of choice out there, which, one, makes it hard to decide on options, but I think it's...
00:21:56
Speaker
Ultimately, most of these vendors are just data folks, you know, a bit like yourself, Hugo, who have come across and dealt with these challenges internally, and thought, surely there's a better way to do this. They've sort of been on that path already. And I think sometimes that can get lost in the marketing, you know, the PR, in the sales of it. When you boil it down, a lot of these founders are actually just data people who are trying to solve problems that they really struggled with. I mean, I think this is a thing, right? Because the industry is so new, everyone kind of thinks that the way that they are using data is quite unique.
00:22:35
Speaker
And I think for us as a company, it's been really challenging to stay focused on one specific use case, right? So, you know, I gave that financial reg tech example.
00:22:47
Speaker
That's actually a relatively new use case for us, right? Because that company will sort of use something like Orchestra to power a product that they're selling. You know, the one that we stuck to, that is our sort of bread and butter even today, is just that analytics use case, right? It's like you're a...
00:23:03
Speaker
You know, you're not a big SaaS company, or maybe you're a fast-growing SaaS company, but you're using data for an operational use case, like helping your finance team or doing some lead gen for sales. How do you build an infrastructure that solves that problem?
00:23:19
Speaker
And those problems are the same for 99% of companies. So there I think SaaS is a really, really good fit, because you are literally reinventing the wheel if you don't do it.
00:23:32
Speaker
Obviously, there are certain other areas where it doesn't make sense. One of these would be, you know, if you're like a Netflix, right, and you need to do some machine learning on some streams, right? You've got massive volumes of data. That's really hard.

Convincing Businesses to Adopt New Tools

00:23:46
Speaker
I can't name a tool off the top of my head that would do that for you. I mean, there is plenty of tooling that's come out of engineers that have built these tools internally for Meta. Actually, I'll give you an example. I spoke to somebody at the conference this week, and they were sort of looking a bit lost. I was like, what are you looking for? They sort of looked at me, and they were looking for a real-time feature store.
00:24:09
Speaker
And I was like, oh, right, okay. And I was like, oh, what about Tecton, that one that Databricks bought? They were like, yeah, yeah, I know. I was going to use that, but then they bought it, so now I can't use it. I don't know what to do. Right? And this was a staff machine learning engineer at a big company. This is a smart person.
00:24:25
Speaker
And they're like, I don't want to build this. I still want to buy it. So even when you're doing stuff like real-time machine learning, even then, very smart people recognize that, actually, this is not going to be tenable for us to build ourselves. I would like to buy it.
00:24:39
Speaker
And data problems are gnarly. There aren't that many vendors that solve them in the right use cases. And in this case, it looks like there's a gap in the market. So yeah, if anyone listening to this has built a real-time feature store, message me. Got a hot lead. Yeah, got a hot lead for you. I scanned the badge. I'll have to go through our thousand scans. One of the things I think would be helpful, particularly for our audience, is they might have seen and worked with a tool that they love and that they would love to put into production in their companies, but they don't know how to go about...
00:25:18
Speaker
doing that and ultimately building the business case for it. So you are an engineer, and you see a new tool and you want to play around with it more. How do you go about convincing the business, to get buy-in and ultimately get to use a tool that you're really passionate about?
00:25:37
Speaker
Yeah, I mean, look, so what we do at Orchestra is sort of divide into two camps. So there are the companies that can test and those that can't. So if you're a company that can test, you can just POC something.
00:25:49
Speaker
Normally, I would try to do like a before and after. So you pick a use case and you're like, hey, look, this is what we're doing today. And then you know you test your new tools or whatever, your new stuff.
00:26:00
Speaker
And then you say, oh, wow. And look at what it's like now. And like, if there's not a tangible difference, like if you can't be like, Hey, look, remember that dashboard I produced last year, it took nine months.
00:26:12
Speaker
I just made a similar one and it took me like two weeks, but changed one thing. Like think about all the stuff I could do over the next year. If I had this permanently, I would say do that. You know, I think it becomes like a completely different process if you're in like a company where you have to like go through a process to procure, right? Cause you can't just test stuff.
00:26:29
Speaker
At that point, you just have to start having conversations. How do you have those conversations? I suppose that's the big... Mate, it's so hard. So hard. You almost have to become an internal salesperson, right? Like an internal marketer, asking the people who hold the purse strings what they want and how you can help them.
00:26:45
Speaker
And I think that is a skill which comes more naturally to people in the analyst domain, but a lot less naturally to people like myself who are more on the engineering side.
00:26:56
Speaker
Yeah, I mean, I feel like one of my big learnings from my career to date is that you kind of have to be having those conversations. I mean, you probably have more of those conversations with people than their direct reports do about these sorts of things.
00:27:11
Speaker
What do you think? Yeah, I mean, we have a lot like that. I suppose, to enable us to work with our clients, we have to rely on internal advocates. My advice is always to point to the value, understand what that opportunity cost is, what the solution is. And I suppose you can sketch out that pre and post sort of thing beforehand. You can talk about it
00:27:42
Speaker
And if you can't actually try it, I think it's about helping the stakeholders see what the differences would be. And then, yeah, obviously, once the POC starts happening, hopefully that's when the results happen. But I think, as you said, it's sort of focusing on actually understanding and telling a story: this is the problem and this is the entire situation, then what the solution is going to be and what the ultimate impact from that is going to be.
00:28:08
Speaker
Yeah, no, I think so. I think, you know, if you're listening to this, right, ask yourself: with data and AI, what are the top five things the business needs that would move the needle that I could do? I'd love to know what percentage of people actually have that answer off the top of their heads.
00:28:23
Speaker
Excellent question. Excellent question, I think, to ask you, Devon. I think if you answer that question and you're not working on the top three of them, then you need to change your priorities. So from here, we've spoken a lot about, I suppose, the overall debate between build versus buy, but I'd be keen to understand how Orchestra fits into this. You've mentioned Orchestra a few times. What is Orchestra?
00:28:49
Speaker
Yeah, so basically we've built a declarative platform for building, running, and monitoring all of your pipelines. The sort of old world looks a bit like this. So, you know, in order to build a pipeline, you learn Python, you learn a complex framework like Airflow.
00:29:07
Speaker
And then within that, you've got to write a lot of code to, you know, trigger things, poll things, do error handling, fix things when the APIs change, gather metadata so you can actually work out what's going on, and do things like quality testing and lineage. And then even after you do that within Airflow, you've got to set up quite complex CI/CD, almost cosplay being a DevOps engineer, and work out how to deploy this in multiple environments.
00:29:33
Speaker
Because obviously you have to do that, because if you're not doing CI/CD and, you know, proper software development, are you even an engineer? And then obviously as you scale, you look at things like catalogs and observability tools and lineage tools, and many years pass, and you're still not left with the thing you wanted.
00:29:54
Speaker
So the alternative to that is basically Orchestra. We've wrapped all of that functionality into a declarative framework, which means instead of having to write thousands of lines of code, you can just do it in a few.
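To make the "declarative" idea concrete, here is a minimal sketch in Python. It is not Orchestra's actual configuration format or API; the `PIPELINE` spec, the task names, and the `run_pipeline` helper are all hypothetical, made up for illustration. The point is only that the pipeline is described as data (tasks, dependencies, retry counts), while one small generic runner handles ordering, retries, and run metadata.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical declarative spec (NOT Orchestra's real format):
# each task only states what it depends on and how often to retry.
PIPELINE = {
    "ingest": {"depends_on": [], "retries": 2},
    "transform": {"depends_on": ["ingest"], "retries": 1},
    "quality_checks": {"depends_on": ["transform"], "retries": 0},
    "refresh_dashboard": {"depends_on": ["quality_checks"], "retries": 0},
}


def run_pipeline(spec, actions):
    """Run tasks in dependency order with retries; return per-task metadata.

    `actions` maps each task name to a zero-argument callable.
    Execution stops at the first task that exhausts its retries.
    """
    graph = {name: set(cfg["depends_on"]) for name, cfg in spec.items()}
    results = []
    for name in TopologicalSorter(graph).static_order():
        attempts_allowed = spec[name]["retries"] + 1
        for attempt in range(1, attempts_allowed + 1):
            try:
                actions[name]()
                results.append(
                    {"task": name, "status": "succeeded", "attempts": attempt}
                )
                break
            except Exception as exc:
                if attempt == attempts_allowed:
                    results.append(
                        {
                            "task": name,
                            "status": "failed",
                            "attempts": attempt,
                            "error": str(exc),
                        }
                    )
                    return results  # downstream tasks are skipped
    return results


if __name__ == "__main__":
    # Placeholder actions; a real pipeline would call out to actual tools.
    demo_actions = {name: (lambda: None) for name in PIPELINE}
    for row in run_pipeline(PIPELINE, demo_actions):
        print(row)
```

In this sketch, adding a step means adding one entry to the spec rather than writing new trigger, retry, and logging code, which is roughly the trade-off being described here.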
00:30:07
Speaker
It's all fully managed, so there's no infrastructure to maintain. The CI/CD is also all out of the box. So for me, a big thing is empowerment, right? It's like you can be a kick-ass data modeler,
00:30:20
Speaker
and with Orchestra, you have an entire enterprise-grade data platform at your fingertips. You don't have to go to your manager and be like, oh, sorry, manager, I'm feeling a bit out of my depth here.
00:30:33
Speaker
Like, I think we need to hire a platform engineer. And they say, oh, how much does that cost? And you're like, oh, according to Glassdoor, about 120, 150k. But you can just sort of get started and have everything ready to go. So, you know, big wins, I guess, are speed, time to live, ease of maintenance, and lowering the barrier to entry.
00:30:52
Speaker
And obviously, we're doing AI stuff as well. Who isn't? Who isn't? That's something that's great. I remember one of the speakers, an Australian industry, Manu, at the AE meetup. He spoke about what they've been doing over at Medicaid.
00:31:11
Speaker
Yeah, I mean, it really came up on his slides of his architecture that he was using Orchestra. He spoke really highly about the tool and the impact that he'd had with it at Medicaid. I suppose I'd be keen to hear one of your other biggest, most impactful stories of where it's been quite transformational for them. No, it was nice. I think Elena, one of my colleagues, said, you know, Orchestra is one of the things that helps them breathe easy on a Friday evening.
00:31:41
Speaker
Yeah, exactly. You don't have to worry about all of your pipelines failing. But yeah, I mean, look, that's one. One of our other customers is Experian.
00:31:53
Speaker
And here it's more of a marketing use case. They basically went with us because they want to get results quickly and they also want to have best-in-class architecture that's really, really, really scalable.
00:32:07
Speaker
So we're sort of implementing there, which is very, very exciting. Amazing. Yeah, I would encourage anyone that's listening that's got some of these challenges... I even think there's probably a few of Cognify's clients who are building out teams of these data platform engineers, and they're really hard people to find, with this blended experience. I probably need to point them in your direction. I mean, the way I normally pitch it, right, because, you know, this isn't for everyone, is firstly, you have to believe that there's a build versus buy equation to be had, right? Like, the other day we did a meetup at Rightmove.
00:32:45
Speaker
And, you know, they're a GCP shop, so everything has to be in GCP. So the only stuff they can get is stuff in Google Cloud. So they built everything, from orchestration in GCP to a full-on alerting and logging service, because it all has to live there.
00:33:02
Speaker
And, you know, in that case, there is no build versus buy. But you have to be on the other side. And then the next step is, OK, well, would you buy something? Would you see the value?
00:33:14
Speaker
And again, in some scenarios, people just don't, right? It's like, for whatever reason, the people that hold the budget have said, no, we're going to have this team of 50 engineers,
00:33:25
Speaker
30 of them are going to be offshore. Like, we've got resource, right? And we're just going to do this. And actually, we don't have a huge opportunity cost with AI and data here, so we're happy to go a little bit slowly. We don't want to be too reliant on software jobs. In that case, again, it's like, you know, this would probably help, but it's probably going to be a bit of a pain to get through.

Final Advice for Data Engineers

00:33:46
Speaker
But then if you're not in either of those camps, then, you know, maybe it's worth a look. Excellent. Well, I suppose we do like to end the podcast with a final piece of advice, obviously relevant to data: something that you know now that you wish you'd known sooner, that would have made you a better engineer.
00:34:10
Speaker
A better engineer? Yeah. Yeah. Or build a better data class? A better data class. I think it's probably a rule which applies to life and also in my capacity as a founder, which is just that making people like you and building trust is really, really important, especially as a data engineer, maybe in a sort of...
00:34:38
Speaker
high-growth company, right? You'll be a new function. No one knows who you are. You're the data person, right? You're Data Dave. I think it's really important that people trust Data Dave and, you know, don't get scared by having to speak to him or worried that Dave is going to go on about infrastructure and stuff they don't care about.
00:35:00
Speaker
Because once you kind of lose that, you're fighting a really difficult uphill battle. So yeah, make friends, make friends.

Closing Remarks

00:35:10
Speaker
Excellent advice. Making friends, and ultimately, in data, you have the easiest conversations when the business comes to you from the start, whether that's building a model or adjusting you towards it. When they come to you at the ninth hour and want you... Exactly. Exactly.
00:35:30
Speaker
Being seen as that trusted partner is something we cover a lot, I think, on the pod, and I suppose that's just reiterating that. So yeah, Hugo, it's been a pleasure to have you on today. Thanks. It was really fun.
00:35:41
Speaker
No, really fun. Thanks, Harry.
00:35:45
Speaker
Well, that's it for this week. Thank you so, so much for tuning in. I really hope you've learned something. I know I have. The Stacked podcast aims to share real journeys and lessons that empower you and the entire community. Together, we aim to unlock new perspectives and overcome challenges in the ever-evolving landscape of modern data.
00:36:06
Speaker
Today's episode was brought to you by Cognify, the recruitment partner for modern data teams. If you've enjoyed today's episode, hit that follow button to stay updated with our latest releases.
00:36:17
Speaker
More importantly, if you believe this episode could benefit someone you know, please share it with them. We're always on the lookout for new guests who have inspiring stories and valuable lessons to share with our community.
00:36:29
Speaker
If you or someone you know fits that bill, please don't hesitate to reach out. I've been Harry Gollop from Cognify, your host and guide on this data-driven journey. Until next time, over and out.