006 - Unlocking Exponential Growth: The Data Platform Revolution

E6 · Stacked Data Podcast
192 Plays · 1 year ago

In Richard's own words, a data platform serves as a horizontal service—a team and platform that supports various vertical pillars across a business, enabling them to operate as efficiently as possible.

Large enterprises are often notorious for lacking speed, agility, and efficiency. Richard tackled this challenge head-on at VMO2 by combining cutting-edge technology with impeccable processes to construct a data platform that fuels exponential growth.

Richard's data platform team has achieved a remarkable 10x impact across an organization, with over 200+ engineers and analysts using the platform daily to drive business value.

Tune in now to hear more, and you'll discover:

  • How to structure a team to successfully deliver such a wide-ranging project.
  • The Return on Investment (ROI) of a data platform.
  • The most significant challenges in building an enterprise data platform.
  • The essential skills required for successful execution.
  • Striking the right balance between technical expertise and soft skills as a data leader.

SEE THE EXACT QUESTIONS AT THESE TIMES


2.00 Richard's background and experience

5.50 What is a data platform

10.40 What is the end goal of a data platform, and how does it facilitate ROI

13.30 How do you structure a team to deliver a data platform at scale

19.00 What have been your biggest challenges and lessons on this journey

25.00 If you could do it all again, what would you do differently

27.22 How do you have the agility to navigate the ever-evolving landscape

24.00 How do you use your tools and platform to “level up” the whole organization

41.20 Quick fire questions

Transcript

Introduction to Stacked Podcast

00:00:02
Speaker
Hello and welcome to the Stacked podcast brought to you by Cognify, the recruitment partner for modern data teams hosted by me, Harry Golop. Stacked with incredible content from the most influential and successful data teams, interviewing industry experts who share their invaluable journeys, groundbreaking projects, and most importantly, their key learnings. So get ready to join us as we uncover the dynamic world of modern data.
00:00:34
Speaker
Hello everyone, and welcome to another episode of the Stacked Data Podcast.

Evolving Data Landscape with Richard He

00:00:39
Speaker
This week, I'm joined by Richard He. Richard is the Director of Data Platforms for Virgin Media O2.
00:00:47
Speaker
Today, we're going to talk about the ever-evolving data landscape and how Richard has implemented a self-serve data platform at Virgin Media O2 to unlock efficiency at scale. He talks about the biggest challenges, some of his lessons, and how he would approach the problems now. I hope you enjoy our conversation. Hi, Richard. Welcome to the Stacked Data podcast. It's a pleasure to have you on. How are you doing today? Yeah, I'm not bad. How about you, Harry?
00:01:16
Speaker
Yes, very good. It's a bit hot, end of summer, and the sun's out and roasting us, but all

Insights from a Google Conference

00:01:22
Speaker
good. How have you been? Last week you were at the Google conference, I remember.
00:01:27
Speaker
It was quite amazing, to be honest. It was massive. And there are so many talks scheduled at the same time that it's kind of difficult to just pick the ones you want to attend, because there are so many you want to attend. I only managed to attend some, but the quality of the conference was really high. And I also had the opportunity to speak to a lot of vendors there and network with other engineers and leaders in the space. I think it was really kind of
00:01:52
Speaker
overwhelming and super useful to actually see how other people in the data space solve their own problems, what the trends are, and especially for some of the newer vendors, how their software has improved over the years and uses different ways to solve data problems with a lot more automation, especially the Google Cloud partners. So I spent a lot of time discussing in that space and yeah, it was fascinating.
00:02:16
Speaker
Brilliant. It sounds it. And I suppose we're going to dive into some of the points you mentioned about obviously understanding, upskilling, learning and getting the most out of the technology that's available to us. So Richard, it'd be great just to get a nice overview of your background and your experience.

Richard's Career Journey

00:02:35
Speaker
So actually my background: I studied computer science and, funny enough, did mobile computing in my master's degree when I first came to the UK, but I never actually worked for a telecom until now. This is actually my first job in a telecom. But obviously my background is computer science and engineering. So the first half of my career, probably about seven and a half years, I spent mostly as a full stack software engineer. And actually, I guess it's more
00:03:03
Speaker
backend focused. And at that time, I kind of realized I'm probably a data guy. And then I came to London probably about seven years ago, when I got the opportunity to work for probably one of the largest affiliate marketing companies in Europe, and then I got the opportunity to lead a team. That's what got me into data. Actually at the time, I think it was around 2014, the Hadoop ecosystem was really taking off with
00:03:29
Speaker
a lot of what I think people would call leading-edge technology, but we actually managed to do a lot of things in the distributed computing world, massively scalable kind of business operations, using the Hadoop ecosystem. That's mostly the Kafka ecosystem, Spark Streaming, and also Elasticsearch and HBase. So we actually built a lot of very cutting-edge systems at the time. That's actually what kind of
00:03:53
Speaker
somehow accidentally got me into this data world, through distributed computing and looking after the systems. And I guess that's probably the early shape of a data platform back in the day. And then afterwards I started moving into the cloud world, because all the Hadoop stuff back in the day was actually on-prem. So then I started working with AWS, migrated Redshift into Snowflake, but basically in the last
00:04:17
Speaker
seven years or so I've been specifically focusing on transitioning to the Google Cloud ecosystem, because six years ago or so I realized BigQuery was one of these things that was going to take off massively.
00:04:32
Speaker
I started helping a number of companies in their digital transformation journeys and kind of became a data engineer, more focused on the data ecosystem, to help companies really use data to address issues. And I've mostly been on the data platform side. And that's primarily to help the business by making data available to them, basically to make people work as efficiently as possible to build their products and services in a company.
00:04:58
Speaker
And yeah, so back to me being here at Virgin Media O2, I think it's about two years and three months. I joined in May 2021. And that's when I came back to the telecom world after, you know, starting with a master's in mobile computing. It's quite interesting to finally come back around into this space and start leading the engineering team to help with this transformation journey we have.
00:05:24
Speaker
Brilliant, brilliant. So a full circle journey. And yeah, I think it's clear over the years that you've become a GCP expert.

Defining the Data Platform

00:05:33
Speaker
For those of you that don't know, Richard also is the channel owner of Practical GCP, where he shares practical experiences on how to run systems properly in production. And he's currently obviously leading the data platform team at VMO2.
00:05:49
Speaker
That's what we're going to dive into today, Richard: how to navigate the ever-evolving data landscape. A new development in the data space over recent years has been the cloud data platform. So first off, for our guests, can you define what that data platform is?
00:06:08
Speaker
Yeah, sure. I think this is a very interesting one because, based on the different places I've worked, primarily medium and large companies, data platforms have different definitions in different companies. So every team kind of has their own ways to set up a data platform. Because like you said, it's quite a new thing. And in my opinion, I can summarize it as probably just one single thing: it is a horizontal service
00:06:35
Speaker
or team to basically look after the horizontal pillar. So what I mean by that is it's not like you're building the verticals, which are specific data products or services that tackle a specific area; these two complement each other. The horizontal mostly means how do you
00:06:53
Speaker
build. So that could be ranging from data engineering, MLOps, DevOps, or analytics engineering. How do you actually focus on these keywords to make the verticals as efficient as possible? So that would be my definition. And I think
00:07:09
Speaker
It's not just limited to a support role. Some of these roles, like DevOps and analytics engineering, could be more support roles. But in terms of building the data models to own the single customer view in the horizontal layer for everyone else, they don't have to go to the data lake. They don't have to deal with compliance and privacy from the very ground up. They don't have to deal with GDPR from the very ground up. You have a framework set up to allow other people to do that much more efficiently.
00:07:34
Speaker
or in the data engineering team where you can build maybe services to integrate with our system for sending communications to customers that you can actually build something more like a SaaS system internally for the company to use, then not every single team needs to worry about how to scale that stuff or how to connect to the third-party systems or how to
00:07:55
Speaker
get the data insights they have to your customers from the ground-up technical level. So these will actually enable the business to accelerate, or especially to grow exponentially much quicker in the modern days. I think that's my definition of data platforms, but obviously if the company is much smaller, you may only have data engineering. If it is kind of medium sized, you might have a few more teams. You might have dedicated DevOps. Some companies have different kinds of reporting lines in different areas. So these are all good. I don't think, in my opinion,
00:08:25
Speaker
We need to stress too much about what the team should own, but it's more like what works for the business to accelerate the growth. Not everybody has to repeat again and again, especially if you deploy some services, if you're going to do it again.
00:08:39
Speaker
Can we make that a blueprint rather than just trying to start from scratch, which is what I see in a lot of the businesses where they're trying to build a data product. They always start from scratch and then including security compliance, all this stuff that they need to deal with. And that is really time consuming where if you have a data platform team, you can really address those issues at scale.
00:08:59
Speaker
Perfect. So a data platform team is a horizontal team which supports the efficiency of all the other verticals within the data team. Yeah, I think so. I think it's really important to emphasize that the data platform team, on our own, we can't actually
00:09:18
Speaker
you know, deliver anything that is end-to-end, in my opinion, because it is optimized in a way that the people are really good at what they do. They focus on these areas, and it's combined as a collaborative effort across different teams with different expertise, because you would have, you know, software engineers focusing on building verticals, you have data scientists building verticals, ML engineers building verticals,
00:09:39
Speaker
and analysts building the verticals for digging out the insights, these are all the counterparts of the platform team. So combined together, that's where you can actually achieve the exponential growth. Otherwise, you would not be able to do that. Brilliant. That brings us on nicely. You've already alluded to a few points, but what is the end goal of a data platform?

Enhancing Efficiency with Data Platforms

00:10:00
Speaker
How does a data platform facilitate ROI? Is it just all about efficiency?
00:10:06
Speaker
I think efficiency is definitely one of the most important things I would say, but it's also like a structure and security and compliance and privacy because it's not about going fast, right? It's also about going fast without failing in, you can fail in the small areas.
00:10:23
Speaker
But I don't think most companies would allow people to get hacked, right? Or just basically sharing your data with the rest of the world with no checks. So I think that's another pillar of the platform team: for all of these very difficult areas, to give other teams a lot more efficiency. They can work a lot faster, but without worrying too much about the foundation, because it's baked into the platform. There's only a limited amount of things people can learn or need to learn in order to accelerate without worrying too much about
00:10:53
Speaker
the issues. So I can give you one example of that. In terms of the data platform, what the platform team typically does, I don't like to use the word control, but it's kind of like the guidelines that are set in there: you know, not everyone should have, let's say in the networking space, ingress rules to allow firewalls to be opened. So the teams would not have permission in those areas.
00:11:16
Speaker
for good reasons, right? You don't want to give everyone permission to do this kind of stuff because when they don't understand it, it will cause problems. But what we do is we tune it to how most teams would work. And then we create some kind of framework to allow people to accelerate, but they don't really have any kind of restrictions. They can still do it, but it would be in a safe environment, such as if you have data scientists working in experiments, right? When the environments, let's say the Jupyter notebooks we run in the GCP world, they would not have egress rules to be able to access the internet.
00:11:46
Speaker
But we can still have an artifact registry connected to how they can download all of these Python packages securely and safely through our internal network with all of the security checks in place. Or having something like VPC-SC, that's more of where we're collaborating with other teams like the security teams. This gives us a lot more. We did actually publish articles as well in this area. On the VPC-SC, how did we actually put this together in our organization to make sure
00:12:15
Speaker
If you use any Google services, you can't easily just send stuff to other places. It needs specific rules to be added, which requires different teams approval. So that makes the overall experience so much safer, which allows an enterprise, even an enterprise level with hundreds of people working together to be able to do things a lot faster, accelerate without worrying too much about the security compliance, privacy, or all of this kind of foundation stuff.
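A minimal sketch of the guardrail idea described above, assuming a platform team wants to check that a data science notebook has no internet egress and only installs Python packages from an approved internal registry. The `NotebookConfig` type, the `guardrail_violations` function, the allow-listed host and the project and repository names are all hypothetical illustrations, not VMO2's actual tooling or any Google Cloud API.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse

# Hypothetical allow-list of internal package registry hosts (assumption).
ALLOWED_INDEX_HOSTS = {"europe-west2-python.pkg.dev"}


@dataclass(frozen=True)
class NotebookConfig:
    """Illustrative description of a notebook environment request."""
    name: str
    allow_internet_egress: bool
    pip_index_url: str


def guardrail_violations(config: NotebookConfig) -> List[str]:
    """Return human-readable guardrail violations; an empty list means compliant."""
    violations = []
    if config.allow_internet_egress:
        violations.append("internet egress must be disabled")
    host = urlparse(config.pip_index_url).hostname
    if host not in ALLOWED_INDEX_HOSTS:
        violations.append(f"pip index host {host!r} is not an approved internal registry")
    return violations


if __name__ == "__main__":
    request = NotebookConfig(
        name="churn-experiment",
        allow_internet_egress=False,
        pip_index_url="https://europe-west2-python.pkg.dev/example-project/python-internal/simple/",
    )
    print(guardrail_violations(request))
```

The point of the sketch is that teams keep the freedom to experiment while the safe defaults are checked centrally, rather than each team having to learn networking rules themselves.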
00:12:40
Speaker
Perfect. So I mean, it sounds like, particularly on an enterprise level, it has the ability to make sure that everyone is compliant whilst simultaneously increasing the velocity of their work. Yeah, exactly. Because ROI means nothing if you actually have one big breach and it sets you back like five years. That is something unacceptable from any company's perspective. Yeah. No, that makes perfect sense. So how do you structure a team to build and deliver a project like this?

Team Structure and Digital Transformation

00:13:09
Speaker
I think this is kind of a tricky one, to be honest. When I joined here two years ago, it was just me to start with in the engineering space. I've worked in other organizations helping to do similar things, but I don't think I've ever been
00:13:24
Speaker
in this kind of position, to lead the entire function and to assist, build and manage the whole function. So there was a lot of trial and error, and learning from other experts in the industry and how other people do it, and reading and exploring what the options are. But in terms of structure, to be honest, initially I only knew there's probably a day-zero team, the data platform team. That's where most companies kind of start.
00:13:52
Speaker
But as we started to grow, because of the things that we need to build and to support different parts of the business, how do we unlock these different kinds of products and services to really deliver the value or improve our customer experience? The team started forming, we have more teams, we want to scale with more teams. And the DevOps function was born because we needed a dedicated DevOps function to help
00:14:15
Speaker
other teams grow as well. It's not just we build everything in the platform team because initially we were kind of doing that, but then it's transitioning to what we call a DevOps decentralization to allow other teams to grow. So that's kind of not like something you just have from day one. It really depends on how quickly the business grows and what other skills get filled in this whole data space.
00:14:34
Speaker
And then obviously, ML is always the thing that the company wants to do for very good reasons because you can then do personalization specifically to help your customers get very targeted experiences, including if you had a bad experience with a call or you found a different offer from other places, all of these different things that we can help our customers make the best decisions using the ML models to
00:14:57
Speaker
allow the customers themselves to have the data they have to make more informed decisions. And then the MLOps function was born because we need to support a large group of data scientists who are very good at what they do. But what you don't want is for everyone to start from scratch to build their pipelines and training pipelines or experiments from scratch on Vertex AI. That's very time consuming actually.
00:15:19
Speaker
to start from that. And also all of the security and compliance side, like the thing I mentioned earlier, not having internet access. But how do you do that, right? In practice, it's very difficult. If you don't know, you're just not able to do that. So the analytics function and data modeling, that is something you always kind of, you know, need. It doesn't matter what you do, you need data models. But also how do you get a large group of people
00:15:45
Speaker
to be able to do that, especially the ones owning domain knowledge? They also know how to run SQL, which many businesses have in common: they have analysts who know how to run SQL. But if you don't have the tool, how do you scale that? Because, okay, you can get someone to write a piece of SQL and then give it to a centralized data engineering and analytics engineering team, which a lot of companies still do these days. But that's not efficient because
00:16:08
Speaker
you need to get the domain knowledge out of your head and get someone else to do it. But if they already know it, why don't you get them to do it? So that's where dbt and dbt Cloud come in. This is where we scale the analytics engineering and data modelling effort to a very large group of people. I think in one of the talks that one of my team gave recently at an event,
00:16:29
Speaker
it was said that even at the time, I think five months ago, we had 180 users already using dbt, working very closely: we have the centralized team building the centralized base models, while the other teams, who have the specific domain knowledge, build their own areas, but they're very well connected. So it's not like we're repeating the work we do. There's always the central bit, and there are also the bits where
00:16:52
Speaker
you have the freedom for your specific use cases. So that creates a very collaborative environment where we can accelerate by not repeating ourselves for all of these areas. So it's kind of a long-winded explanation, but I just tried to kind of show the experience in terms of
00:17:07
Speaker
don't worry if you don't have all of it, nor do you need all of it. So I think it's very important to assess, when you go to a company, what state exactly they are at. The company might only need data engineering, or they might very specifically focus on ML because it's all small stuff and that's the only thing they do. You don't necessarily need all of these other functions because
00:17:28
Speaker
you know, we don't have that many teams, right? So I think it's all subject to what the business needs. And that's something that, as data leaders, we really can't ignore, because otherwise it doesn't matter how good we are technically. There's no way we can help the business to transform.
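The split described earlier in this answer, with a central team owning base models and domain teams building on top of them, can be sketched as a simple dependency check. This is only an illustration under stated assumptions: the model catalogue, the owner names and the `raw.` prefix convention are hypothetical, and the code is plain Python rather than dbt's own API or manifest format.

```python
from typing import Dict, List

# Hypothetical catalogue: model name -> owning team and upstream dependencies.
MODELS: Dict[str, dict] = {
    "base_customer": {"owner": "platform", "depends_on": ["raw.crm_customers"]},
    "base_orders": {"owner": "platform", "depends_on": ["raw.orders"]},
    "marketing_churn": {"owner": "marketing", "depends_on": ["base_customer", "base_orders"]},
    "finance_revenue": {"owner": "finance", "depends_on": ["raw.orders"]},
}


def models_bypassing_base_layer(models: Dict[str, dict]) -> List[str]:
    """List domain-owned models that read raw sources directly instead of
    building on the centrally owned base models."""
    offenders = []
    for name, meta in models.items():
        if meta["owner"] == "platform":
            continue  # the central team is expected to read raw sources
        if any(dep.startswith("raw.") for dep in meta["depends_on"]):
            offenders.append(name)
    return sorted(offenders)


if __name__ == "__main__":
    print(models_bypassing_base_layer(MODELS))
```

A check like this keeps domain teams free to model their own area while making it visible when work on the shared base layer is being repeated.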
00:17:42
Speaker
Yeah, so it's about identifying the areas in which that automation from that platform perspective is really needed. And there might not be that many use cases in your business, but it's all about identifying where they are and then exploiting the capabilities of a platform automation, preventing that repetition and can really be powerful. Exactly.
00:18:06
Speaker
You've clearly delivered a very sophisticated data platform at VMO2 which touches many different verticals. It's been a long journey. What have been some of the biggest challenges that spring to mind on this journey and what would be your biggest lessons to people?

Challenges in Growing a Data Team

00:18:25
Speaker
Yeah, I think this is probably quite a painful one for me, but I would like to share. As I mentioned, when I joined in 2021, it was just me in the engineering space. And in my whole life I had only managed small teams. I grew small teams of a few people, including myself. So now in our team, we have 30 plus engineers. So that process of growing a team, I always thought, as I still do even now, that I'm not a very good manager.
00:18:52
Speaker
It's really difficult to try to understand the team's objectives, trying to align with stakeholders, trying to focus on what really matters. Because digital transformation is all about, can you focus on what really matters to work on the really big things to help the company to scale? Because the data platform is not very useful if the business cannot scale. So that is the really important part. To understand and work with the business on that side is very difficult while managing a team. And the other side is growing a team is not
00:19:21
Speaker
I think all managers will probably agree with me that growing a team from scratch is a very, very difficult thing, especially with what we're doing in this space. As you can see, we want to develop the world over the two years continuously. And that whole process of growing the team is very, very difficult, even from a
00:19:38
Speaker
I wouldn't say even, specifically from a technical perspective. To be honest, I don't think I actually would be able to lead a team if I am not technically competent. So this is kind of a very special area where you have managers that need to be very technical in order to grow a function like this. I think that's the very difficult part because I've seen a lot of the others
00:19:58
Speaker
We're probably in a similar space: we've been doing a lot of hands-on stuff, designing architectures, and are probably not very good at structuring the team, putting structures in place and knowing how to help. And also, hiring in the UK is really difficult. I think we hired a very good-sized team over a two-year period. I don't even know how we did it.
00:20:22
Speaker
Something I do myself, and the rest of my team also do, is to share knowledge with the industry to help more people adopt Google Cloud, because it is, in my opinion, the most data-centric cloud, where you can actually do things a lot faster, a lot more securely, a lot more efficiently. I've worked in other areas and I think this is probably the most efficient one.
00:20:42
Speaker
But the challenge in there of growing a team and really kind of sharing what you know and then to do this really complex stuff in analytics and analytics operation, like plugging into APIs from the data engineering space into your data portal and systems. And that is not a skill that many data engineers have or the rest of the teams, most of the teams have. So I think that specifically is very important and difficult because it just took a long time to
00:21:10
Speaker
build a team like this that has, especially in the data engineering space, both the software engineering skills and the data engineering skills. So that is something that is quite rare. I think that's one of the spaces I found particularly difficult. And obviously the structure in the team. I think if I were to say one thing to everyone in terms of what I think is important, it is
00:21:32
Speaker
maybe set more structures in place in the early days. Because that's one of the very important things. If we don't have a structure, the team can get very confused about what they need to focus on. Or, you know, like, is this like one person just does everything? Do we need a team to do this? How do we engage with stakeholders? You know, who owns this priority? So I think structure is something that is really important. That's something I found very challenging, given I've only, you know, led small teams. And this is the first time I've done this.
00:22:00
Speaker
Yeah. Oh, so it sounds like that process, structure and key definitions of where someone is, what their remit is, is so important. And I can imagine hiring on GCP. It's obviously still the smallest cloud provider. It's the area where we at Cognify are most busy because it's the hardest area to hire for, because there are just fewer people with that direct experience. But
00:22:26
Speaker
I think that's worth noting as well for listeners, that software engineering skills are really, really important to have as a data engineer, and they really help you with accelerating your career because you have a good understanding of scalability and best practices. So looking back, Richard, what would be your advice if you were to do something differently? If you were to do it all again, from a hindsight perspective, what would you do differently?
00:22:54
Speaker
I think now because I've done a lot of it, I wouldn't say we're still a long way to go, but we have done a lot. We've come a long way from initially with a very small team. I really think putting more clarity and structure in the team in the early days helps.
00:23:10
Speaker
So I know lots of people talk about agile, you know, Kanban, Scrumban, all of these different methodologies. These things do matter, but you kind of have to put them in a way that is very practical, not just use the buzzwords. And I think initially I wasn't really thinking too much, when the team was a lot smaller, about whether we need these things or not. But actually,
00:23:28
Speaker
I think if I had to do this again, one of the first things I'd be thinking about is how do we structure the teams from the early days in order to support, have focus, clear accountability and ownership, and also just the structure to work with stakeholders, to work with product managers, to have that structure set up in the early days. Because that's something, to be honest, from my past experiences, most data platform teams, the way I see them operate, don't have. That's why I didn't
00:23:55
Speaker
know about this, how important this stuff is on the data platform side. But actually over time, when things started forming, it made me realize this is actually critically important. It's no different to running, you know, just a piece of engineering: you still need structures to allow the specific focus. Because if you don't have a structure, what you're going to get is, like
00:24:14
Speaker
many companies, you just get hit by all of the tactical stuff. You don't even know what you're focusing on. Stakeholder A wants something, stakeholder B wants something, and then you're kind of struggling with priorities. I think setting some structures up to align with stakeholders, especially senior leadership, in the early days really helps in terms of how to define the long-term goal. So how do you actually grow your business to utilize what the platform team can actually create, to grow at exponential scale?
00:24:40
Speaker
Yeah, makes sense. So really communication with the business aligned on priorities and then putting that structure in place from the get-go to really help make sure everyone knows exactly where they're going and what they're doing. So in such a large organization as VMO2, how do you guys have the agility to navigate the ever-evolving data landscape?

Building Trust in Large Enterprises

00:25:05
Speaker
So I think that's probably a tricky one. How do we navigate the landscape? I think probably one key takeaway from me is working with others who are already in the business and have good knowledge of what is happening in different areas, to collaborate and build trust to work together. So what I mean by that is it's not all about Google Cloud. That is one thing,
00:25:29
Speaker
you know, I need to make perfectly clear. It's not all about cloud providers, or all about dbt or the fancy BigQuery, or whatever names we give the platform team or data product. That's not the most important thing. The most important thing is to understand, especially in a larger organization, enterprises who have been in this space for a long time, right? It's like many businesses, like telecoms or banks or education industries and even some of the older retail areas.
00:25:58
Speaker
A lot of the stakeholders or people have technical expertise in other systems. Maybe people call them legacy systems, but the systems still work, right? So they are kind of older systems. But one of the key things that we can't ignore in this transformation process is to really understand where people are coming from
00:26:16
Speaker
and why this system still exists over so many years, because there's usually a very good reason why they're still there. And it's not all about changing everything to cloud native overnight, and that is usually a disaster if people try to do that straight away. It's to understand the priorities of the business. Yeah, sure, we want to go exponential scale. We probably want to go all-in with Google Cloud. All of that stuff doesn't matter.
00:26:39
Speaker
The more important thing is to work with the existing side of the business to understand really how we do that practically, to do it one step at a time. If people don't buy into the idea of what we're trying to do, all of this is irrelevant. It's very important to build trust with others because at the end of the day, the only thing that speaks the truth is data. But that is a very difficult message to get
00:27:01
Speaker
across in a large organization. You can't just go to someone else and rub a dashboard in their face and say, you are doing it wrong. That's just crazy. We can't do that. This is where building trust is super important, to understand where everyone else is coming from. What are their priorities? Why haven't we migrated the stuff into Google Cloud yet?
00:27:18
Speaker
Are there any methods that we can use in the interim to get the same value while we plan something strategic, something tactical for them to fill the need of the business in the interim? So all of these things we need to understand, and the only way to achieve that is to work with stakeholders from the rest of the business, especially the non-technical stakeholders.
00:27:39
Speaker
Not everyone understands dbt or SQL or data engineering pipelines or ML. It's very important to explain this in layman's terms so everyone is on the same page, especially using diagrams. That's actually a very easy way to define the flow so everyone can speak the same language, rather than just keep talking. So I think that's, in my opinion, the most important thing that allowed us to navigate this very large enterprise.
00:28:06
Speaker
I think many would probably share the same view as me as well. Yeah. Brilliant. So it's about that communication, and I like how you highlight the visualization. I know from my perspective, it's always easier to be able to visualize something, particularly when it's something people are very unfamiliar with. And I know that there'd be many people in the organization that don't understand the mechanisms behind data.

Growth through Data Platforms

00:28:29
Speaker
So
00:28:29
Speaker
Another topic we're really keen to talk about with yourself Richard is when we've spoken in the past, you've mentioned about how you've used this data, your data platform to help level up the organization and level up people within the data team. So could you explain a bit and elaborate a bit on that conversation for the audience?
00:28:52
Speaker
Yes, sure. So I'll probably go back to the four pillars I mentioned before. In my opinion, in a mature data platform team, you have these four building blocks that are the fundamental functions. So there are two, in my opinion: the data engineering space, at least the engineering space, these are
00:29:12
Speaker
sorry, the data engineering space and the data modeling space. I think some companies kind of put that inside the analytics engineering team. But because it's quite new, people have different names. But let's just say the core data modeling team that has the analytics engineering skills to build the core data models, plus the data engineers that build the data ecosystem to make data or services available to the verticals.
00:29:34
Speaker
The level up, I would say, in these two areas is this. On the data engineering side, I think this topic has been talked about quite a bit by different data leaders on LinkedIn, which I'm quite active on. Quite a lot of people, including myself, believe the traditional data engineering job will be gone in five years.
00:29:52
Speaker
I know that's quite a big, bold statement, but I'll give the reason behind that. This kind of links to what the level up means. So as you mentioned as well, software engineering skill is imperative. It's something that we must know as data engineers. I'm coming from software engineering, so I feel more strongly about this. The ETL thing is going to start disappearing. Five years ago, there were
00:30:15
Speaker
very few vendors in the space doing something called CDC solutions. That was in its infancy in the early days. CDC is change data capture; if anyone doesn't know what that is, it's basically, if you have a relational database, getting that data into the cloud. In the past, you had to spend a huge amount of time on, you know, Airflow jobs and batch processes to get that data in.
00:30:34
Speaker
But that is not a thing anymore. There are many competitors such as Striim, Qlik, and Datastream. That is basically what Google bought; Alooma probably turned into that. I don't know whether Alooma still exists or not. I know you have Oracle GoldenGate as well. So there are all of these players in the space that would give you a CDC solution. There's one, I can't remember the name, but it's open source as well. So all of these things combined.
00:30:57
Speaker
There are very few companies still doing this kind of batch ingestion from their operational or relational databases. So that's one thing that data engineers used to spend a lot of time on, doing this stuff from scratch. It used to take five to 10 data engineers and bloody ages to get this done. But that has pretty much disappeared for a lot of organizations; even enterprises have stopped doing it.
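
To make the CDC idea above concrete, here is a minimal Python sketch. It does not show how any of the vendors mentioned work internally; `ChangeEvent`, `apply_changes` and the sample rows are hypothetical names chosen purely for illustration. It only shows the general principle: rather than re-extracting a whole table in a batch job, a stream of insert, update and delete events is replayed against a replica.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change from a source table (hypothetical shape)."""
    op: str                      # "insert", "update" or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row contents; None for deletes


def apply_changes(target: dict, events: list) -> dict:
    """Replay change events, in order, against an in-memory replica."""
    for event in events:
        if event.op in ("insert", "update"):
            target[event.key] = event.row
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return target


# The replica starts with one row, then receives three captured changes.
replica = {1: {"name": "Ada", "plan": "basic"}}
events = [
    ChangeEvent("update", 1, {"name": "Ada", "plan": "premium"}),
    ChangeEvent("insert", 2, {"name": "Lin", "plan": "basic"}),
    ChangeEvent("delete", 1),
]
apply_changes(replica, events)
print(replica)
```

Only the changed rows travel, which is why this pattern replaces the large scheduled extract jobs described above.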
00:31:20
Speaker
The other thing is integration with third parties. Let's say you have other systems like Ziltab, Braze, Salesforce, whatever, right? With third-party systems, in the past, the data teams just spent a lot of time going back and forth, sending the data out and then getting the data back in.
00:31:38
Speaker
So I think one of the articles I shared is around Analytics Hub. That is one of the pretty new Google Cloud services that really bridges the gap for those kinds of things, because a third party can basically give you a BigQuery table and share it with you via Analytics Hub, which eliminates the entire integration effort. So a lot more of these things happen based on a specific model called
00:31:59
Speaker
Pub/Sub, the publisher-subscriber model. That basically allows the movement of data in a more event-driven way, or sharing directly from the publisher side to the subscribers. It's just like a magazine subscription. Basically, if you see what's going on right now, all of these ETL parts are being disrupted bit by bit.
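
The publisher-subscriber model described here can be sketched in a few lines of Python. This is an in-memory toy, not the Google Cloud Pub/Sub or Analytics Hub API; the `Broker` class, the `"orders"` topic and the two inboxes are assumptions made up for the example. The point is the magazine-subscription shape: the publisher sends once and every subscriber receives a copy, without point-to-point integrations.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Toy in-memory broker: routes each published message to all subscribers."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        """Register a handler to be called for every message on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to each subscriber; return the delivery count."""
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = Broker()
marketing_inbox = []
finance_inbox = []

# Two teams subscribe to the same topic; the publisher knows neither of them.
broker.subscribe("orders", marketing_inbox.append)
broker.subscribe("orders", finance_inbox.append)

delivered = broker.publish("orders", {"order_id": 42, "amount": 19.99})
print(delivered, marketing_inbox, finance_inbox)
```

Adding a third consumer is one more `subscribe` call; nothing changes on the publisher side.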
00:32:18
Speaker
And one of the things we've been discussing and debating on LinkedIn is how many years this thing has left. From my perspective, that's why it's very important for data engineers to pick up software engineering skills. It's not that the role is redundant; it turns into data platform work. Some call it data platform engineers.
00:32:38
Speaker
So that's kind of like you build a SaaS to allow massive scaling, with reusable components or services for the rest of the organization to utilize. So that's where it's going to play out. It still requires the same kind of skills, because you need to know how to scale, how distributed computing works, how the pipeline can be optimized, how to monitor these things, how to alert on these things, but it would basically all be turned into services. That's, I think, one of the fundamental things that is changing.
00:33:04
Speaker
I spent quite a lot of time talking about this because I think this is important. I think from a software engineer perspective, there are options for people to move into data engineering as data platform engineers, or data platform engineers can even move into ML to be ML engineers if they're more interested in the machine learning type of things. And with generative AI made available, it will be a lot more accessible for data engineers or software engineers to get into ML.
00:33:28
Speaker
You know, the foundation models, for example, from Vertex, will probably give you like 95%, and you just do the last 5%, or the last mile, to optimize your machine learning models based on that, just the little bit that is specifically for you or for every single customer. You do that last bit, but you don't, you know, go train a large language model for generative AI. That's just not something everyone's going to do. There will be key players doing that, but I think the access for the rest of the world will be fundamentally changed because of this. So.
00:33:58
Speaker
I probably don't have time to talk about all of it, but I think that's one of the good examples of a level up. I want to briefly touch on the other one, which I think is also important to a lot of the analytics engineers, because the analyst community is quite big. One of the very important things is that data analysts usually have very strong domain knowledge; they know exactly how things work.
00:34:20
Speaker
And they also very often, almost nine out of 10 times, have SQL skills. Some of them are OK, some of them are very good. In the past, I've seen organizations stop analysts from getting their hands on data modeling because they are not data modelers, they're just analysts, right? And then they get really frustrated because it's like, OK, I have the skills to do it, so why can't I do it? So this is where dbt comes in, especially with dbt Cloud, which is all browser-based, right? So that basically
00:34:46
Speaker
allows people to not get blocked by their laptops, because the security is all handled in the browser, on the cloud, with everything kind of locked down. So that really enables us to scale to a large group of people doing data modeling. But you still need to structure it in a very good way, otherwise it turns into a mess. What I'm trying to say is that you can level up analysts to be analytics engineers because of that single tool.
00:35:09
Speaker
Because it has lineage, you can interactively develop everything. You can see everything. If someone asks you a question, you just show them the lineage: this is how things are coming in, this is where the SQL is, all clear. And it has a lot of data quality checks you can do out of the box with integrations. You can even plug in the lineage information, based on the Google Cloud integrations, with other tools specialized in governance that you can buy from vendors. So there are a lot of options in there. So that means analysts,
00:35:34
Speaker
if they choose to, can keep looking at building dashboards and stuff. That's probably for people who decide to be more focused on the business side. But it's also creating a very new group that actually wants to focus on building the data models for the business. So I think it's creating different layers, because the skills required are condensed to one or two things. In this case, all you need to know is SQL and Git.
00:36:02
Speaker
Version control is a bit of a barrier, but it's typically two weeks and you can start doing it. It's not really difficult to learn. Once you've got that, you've got version control and you've got SQL knowledge, and that basically allows you to run production systems if the operational side is set up. But then you need the data platform again to make sure this doesn't go out of control. Because if it goes out of control, you have 100 truths instead of a single source of truth. Then, at the end of the day, it again delivers nothing for the business. So this is where, again, the data platform fits in.
00:36:30
Speaker
But the level up is all about, in my opinion, how you identify a group of experts who really think they can do the next thing, but who aren't given the tools, so they feel really frustrated. So I think, in my experience, this seems to be
00:36:48
Speaker
the platform team's job to figure out a model, how to make that work better to accelerate. Because usually in a company, you have 100 analysts, but you have 10 analytics engineers. How do you bridge the gap?
00:37:04
Speaker
What if you convert all of the 100? We actually have 180 combined. So if you turn all of the analysts into analytics engineers, or maybe have half of them focus on the data modeling side and the other half on the more stakeholder-focused side, then you suddenly have exponential growth, because you end up upskilling everybody
00:37:24
Speaker
to the next level, to be able to do a different kind of work which, historically, was perceived as only data engineering. I think the whole world has changed because new tools are made available to businesses. Those are really the two things I feel very strongly about and want to share in terms of leveling up the team: doing the more advanced things because of the tools you made available.
00:37:45
Speaker
Yeah, well, look, that's why we're here, right, to discuss the latest and greatest tools and processes which are enabling organizations. I think my key takeaway from that was that you need to identify the best tools which are going to enable your team, whether that's analysts, whoever.
00:38:04
Speaker
But you need the data platform team or an individual who is setting them frameworks, which will really govern how then people use that tool and the processes that they follow to avoid that mess. Because I've equally spoken to organizations where they've just been let loose on something like dbt and then a lot of debt builds up. So I think that.
00:38:26
Speaker
that emphasis on someone who understands what good looks like and is able to build the frameworks that will govern the wider team. I think that's obviously important and it sounds like that's what you're getting at too. Yeah, exactly. I would like to call the data platform teams, we're all evangelists.
00:38:43
Speaker
So if we actually understand how something works, it's all about sharing that, to show the others how to be as good as us, to kind of make us redundant over time. And because that grows a much bigger community and everyone gets better, it's just a lot more interesting, to be honest, that way. Everyone feels a lot happier because they have the tools to do their job, rather than always having to throw it over the fence. But I think this whole concept kind of started, you know, in the ML world with data scientists.
00:39:08
Speaker
I'm sure you've heard about this: data scientists build some models in a Jupyter notebook, throw them over the fence, and ML engineers are there to deliver them to production. That has been going on for years, and this is where MLOps was born, exactly to solve that problem. So everyone can focus on what they're really good at without needing to learn everything and set up everything from scratch.
00:39:25
Speaker
Yeah. Yeah. I mean, it's been a game changer for data scientists, right? You know, data science was a hype quite a few years ago now, but I feel like they're now truly doing some of the jobs which they want to do, because they can focus on building their algorithms and they have the frameworks behind them, which are supported by people in them areas.
00:39:46
Speaker
Brilliant. Well, look, I think it's been a fascinating conversation, Richard. I think there's some really interesting lessons in there, particularly for enterprise, people that are working within enterprises and larger organizations to think about how to structure a team and how to get the most out of the people that you already have there, as well as the appropriate strategy for bringing in that internal expertise. So it brings us to the final section of the show.
00:40:12
Speaker
So the final section, Richard, is just a quick fire round of questions. We ask all the guests the same and it's basically aimed at helping the community with job opportunities, skills, and essentially advice for how they can maybe get their next job. So the first one is how do you assess a job opportunity in your career and how do you know it's the right move for you?

Job Opportunities and Business Value

00:40:35
Speaker
I think for me, in my whole career, I've
00:40:38
Speaker
never really focused on money. That's kind of a frank way to put it. So I've always been looking for areas where there's a challenge in the business that I can be involved in and learn from others. Also, I think there's something for me in a challenge, like, can I give it a go? But yeah, in short, I look for challenges that are reasonable. And that's probably the only criterion for me to move.
00:41:03
Speaker
Yeah, makes sense. Challenges where you can equally add value. So next question, what's your best advice for people in an interview? I would say, well, I probably have a few, but I think one of the important things is really kind of research the company to see if there's something you want to do. I think that goes both ways.
00:41:19
Speaker
in terms of the values of the company that you're going to join, and that you understand what they're doing. It's not just, I'm going to work for this job, right? So that's very important. From the other side, and I think this is what most hiring managers really care about, the one thing in an interview is to talk about value. I see a lot, especially in CVs, of candidates talking about, I've used Spark, I've used BigQuery, I've done this and that, all technical stuff, but there's very little
00:41:49
Speaker
What have you actually done for the business? What is the ROI? This is something I was not very good at in the early days of my career at all. I had very good mentors afterwards telling me all of that stuff, which fundamentally changed what I focus on. And it's a very important thing for all businesses: no matter whether we're technical people or non-technical people, the first thing to focus on is business value.
00:42:11
Speaker
And then that leads to technical solutions, not the other way around. So it's very important, in technical interviews and when working with the business, to focus on how we, as people with technical expertise in certain areas, have helped the business with the skills we've got, not the other way around.
00:42:30
Speaker
Yeah, that's a classic, and it's also a pet peeve of mine. Just listing a technology on your CV and saying that you've used it really doesn't help anyone understand whether you actually understand the mechanisms behind the technology or what you are doing with it. So great advice there. And finally, Richard, if you could recommend one resource to the audience to help them upskill, what would it be?

Learning with Practical GCP

00:42:57
Speaker
I guess you've mentioned my channel, right? Look at the platform.
00:42:59
Speaker
I'm just being cheeky. But on a more serious note, I think it is something I actually spend a lot of my time on. I don't know if everyone knows, but each video, which is probably 30 minutes, took me an average of seven to eight hours to make, because a lot of these things I don't know the details of either. But sharing those things really forced me to learn them again from scratch, just to make sure I check everything over and I'm not talking nonsense. I really want to kind of
00:43:27
Speaker
know, for others to understand this easily. How do I explain it in an easier way? So that's something I put a lot of effort into. If anyone finds the topics I've been talking about in there interesting, I do think, you know, give it a try to see if it's actually helpful or not. One other thing I feel, obviously, is that it's very important to be practical because, at the end of the day, you know,
00:43:49
Speaker
that there's a job we need to do to focus on the value we have to deliver. So it's just this application, it doesn't really work, but it is useful in the way I explained.
00:43:58
Speaker
I'd just like to reiterate to the audience: check out Practical GCP, Richard's YouTube channel. I've sent a few candidates there within the GCP space who have all spoken very highly about the content. So definitely give it a look if you're in the GCP environment or looking to move into that space. But for now, Richard, I will have to say thank you. I really appreciate you taking the time out of your day to speak with me and share your learnings and your lessons with the audience. I hope you had a good time.
00:44:27
Speaker
Amazing. Thank you very much, and thanks for the invite. No worries at all, Richard. Have a lovely time and we'll see you again soon. Thank you.
00:44:37
Speaker
Well, that's it for this week. Thank you so, so much for tuning in. I really hope you've learned something. I know I have. The Stacked podcast aims to share real journeys and lessons that empower you and the entire community. Together, we aim to unlock new perspectives and overcome challenges in the ever-evolving landscape of modern data.
00:44:58
Speaker
Today's episode was brought to you by Cognify, the recruitment partner for modern data teams. If you've enjoyed today's episode, hit that follow button to stay updated with our latest releases. More importantly, if you believe this episode could benefit someone you know, please share it with them. We're always on the lookout for new guests who have inspiring stories and valuable lessons to share with our community.
00:45:21
Speaker
If you or someone you know fits that bill, please don't hesitate to reach out. I've been Harry Gollop from Cognify, your host and guide on this data-driven journey. Until next time, over and out.