
008 - Analytics Engineering: Driving Reliability, Accuracy, and Velocity

E7 · Stacked Data Podcast
281 Plays · 1 year ago

AI and LLMs have soared into the mainstream, but did you know @Cleo has been pioneering in this space for years?

Join me on @stacked as I dive into conversation with Andrea, the Lead Analytics Engineer at Cleo. He unveils his fascinating journey from Data Science to Analytics Engineering, a transition many can resonate with. Discover how Analytics Engineering is transforming data reliability, accuracy, and velocity to drive invaluable insights for Cleo.

🎙️ Highlights Include:

1) The DS to AE Journey: Spending 70% of his time transforming data, Andrea noticed a lack of clear consistency in data quality—a common post-DS hype challenge.

2) Rise of AE and Data Modeling: In a world with more complex data, more analysts, and a growing push to be "data-driven," Analytics Engineers are recognized for their crucial role in creating high-quality, trustworthy modeled data.

3) Justifying AE Team Costs: Cleo treats Analytics Engineering like a standard Product squad, aligning OKRs with stakeholders' needs and measuring success metrics.

4) Automated PR Reviews: Cleo's team ensures efficiency and data quality by implementing an automated PR review system in dbt, scoring queries out of 100% and providing valuable recommendations.

Andrea shares even more insights in the episode! If you're curious about what Analytics Engineering truly entails, the challenges it faces, and where the sector is heading, this episode is a must-listen.

Transcript

Introduction to Stacked Podcast and Guest

00:00:02
Speaker
Hello and welcome to the Stacked podcast, brought to you by Cognify, the recruitment partner for modern data teams, hosted by me, Harry Gollop. Stacked with incredible content from the most influential and successful data teams, interviewing industry experts who share their invaluable journeys, groundbreaking projects, and most importantly, their key learnings.

Cleo's Role in AI-Driven Personal Finance

00:00:26
Speaker
So get ready to join us as we uncover the dynamic world of modern data.
00:00:34
Speaker
Today, I'm joined by Andrea, the Lead Analytics Engineer at Cleo. Cleo are the leading personal finance enabler. They use cutting-edge AI to help you manage your finances effectively.

Enhancing Data Quality with Analytics Engineering

00:00:48
Speaker
Andrea talks about how analytics engineering is driving quality, velocity, and valuable data at Cleo. He speaks about their latest projects to achieve this and the challenges they have had to overcome. He talks in depth about their use of LLMs and how they use them to enable themselves, as well as going into their best practices around data modeling. I hope you enjoy our conversation.
00:01:15
Speaker
Hi, Andrea. It's great to have you on the show today. Thank you for joining me. How are you doing? Thank you. Yeah, all good. Recently back from Italy, so enjoying the UK weather during the heat wave. You brought the heat back with you, but we're sitting in quite
00:01:31
Speaker
heavily air-conditioned Cleo offices, but they are very nice. Today we're obviously going to be speaking about your journey into analytics engineering, as well as what analytics engineering is to Cleo, how you guys are driving value for Cleo, and some of the projects you've worked on. So, really keen to dive into them, but first off, for the audience, could you just give us a nice overview of yourself and Cleo?

Andrea's Career Journey to Analytics Engineering

00:01:55
Speaker
Yeah, sure. So I'm Andrea. I lead the analytics engineering team at Cleo. I joined in January this year. As you can probably guess by the accent throughout the interview, I am Italian. I'm actually from Sicily, which is the best part of Italy, in the south.
00:02:13
Speaker
Nice lemons. Very. Lemons, oranges, nice fruit, weather, sea. So yeah, whoever is listening to this podcast, go to Sicily for sure for your next holiday next year. So at Cleo, we try to empower people to build a life beyond their next paycheck.
00:02:31
Speaker
Cleo is an AI assistant, which now of course is great with all of this LLM bubble going on. But Cleo actually started way back in 2016, so we're actually well positioned in the market from before. What we try to do is focus on personal finance and make personal finance into a fun conversation, more or less the same kind you would have with a friend, but with our very nice chatbot.
00:02:56
Speaker
And apart from that, we give people some tools to improve their personal finance knowledge and situation. Amazing. That's great. I mean, obviously with AI exploding in the last year, many organizations are obviously running to get on projects and implement AI in their organization.

Measuring Success in Analytics Engineering

00:03:15
Speaker
But it sounds like Cleo was already ahead in the space and has been doing it for years. Yeah. We were very lucky, or our CEO Barney was very smart a couple of years ago.
00:03:25
Speaker
I'm sure it's a bit of both, but really keen to dive into, I suppose, more about your role and how you're impacting the wider mission at Cleo. So how did you get into analytics engineering, and what is it about analytics engineering that attracted you to the space? It'd be great to understand your journey here so far.
00:03:45
Speaker
Yeah, actually, my master's was way more focused on data science and machine learning. So when I started my career, I started at DAZN. I was actually hired as a data scientist working in the product area. Data scientist maybe was the wrong job title, because actually it was kind of like a mix between a data scientist and a product analyst. So I was doing kind of like reporting, dashboarding, and also modeling, like clustering, some predictive analytics
00:04:13
Speaker
for the product, for product optimization. But then, effectively, 60% of my time I was cleaning data and transforming data to then use that data to generate insights, and that was actually the part I liked the most. So at some point,
00:04:29
Speaker
after, let's say, six months in that role, we realized with my manager at the time, Luke, that there was a gap in the company: consistent, clear, good-quality data was just simply not there. So basically what was happening is that
00:04:44
Speaker
different teams, AI analysts, data scientists, product analysts, even product managers sometimes, were getting raw data, applying their own transformations and business logic, and then creating dashboards. But of course, all of these transformations, all of this business logic, was not consistent; naming conventions were not a thing. SQL queries were often very badly written, not efficient. So there was a lack of consistency and a lack of organization within the transformation bit.
00:05:14
Speaker
So we saw that gap, we entered, and we actually started the AE team. We were three at the beginning, which was not many, but then in a couple of years we were 15 or 16, and at that point we were actually doing the full ELT process. So for me,
00:05:32
Speaker
the best part of analytics engineering is actually trying to be in the middle and talk to business stakeholders and technical stakeholders, and try to kind of like bridge that gap, which at DAZN was there and it was big. So that's why I got into it.
00:05:48
Speaker
Brilliant. It's something we see so often. Organisations lack a data modelling strategy and it can cause huge amounts of data debt, right, when everyone is off doing their

Data-Driven Culture and Growing Complexity

00:06:02
Speaker
own things. So you've noticed that gap and built out a standardised function which could manage and empower everyone else that wanted to model and use their own data. You gave them the framework to do that, correct?
00:06:16
Speaker
Yeah, exactly. Perfect. Well, that's great to hear. So data modeling has always been around. It's always been essential to surfacing data. Why do you think the rise of analytics engineering has happened so quickly within the industry? And why do you think data modeling has sometimes taken a back seat until now?
00:06:40
Speaker
I think that for two main reasons, I would say. So the first one is that a data-driven culture, it's now becoming a thing, really. So a couple of years ago, being data-driven, it was more, I think, more like a buzzword. Like people wanted to be data-driven, but they were not really. They didn't know what it was. They didn't know what it was, exactly. So they were trying to make a decision and then justify that decision with data after, and then maybe bending the data to fit what they decided.
00:07:09
Speaker
Whereas now, in the industry in general, there is more of a correct approach to use data before actually making a decision without that bias. So I think that there is more of a demand for data products.
00:07:25
Speaker
And in this case, basically, you have more analysts, you have more data analysts that spend more time actually trying to create insights, which means that there is a far greater need for clean data. So you don't just need the data that was there five or six years ago; you need clean data that is ready to be analyzed by someone. So that's the first thing.
00:07:47
Speaker
The second reason, I think, is that we now have way more data. So we always had data, but now data is collected literally everywhere. So almost if you breathe, you're generating data. Every time that you look at your phone, you're generating data. So now data is more and it's more complex. So before, I think that even data analysts, product analysts, data scientists, whoever could just get the data
00:08:10
Speaker
whereas now the transformation process is way more complex. So that is why you need a dedicated figure that can talk with business stakeholders but can also understand the technical side of things to actually make something that is usable.
00:08:23
Speaker
Yeah, that makes sense.

Roles in Cleo's Data Team

00:08:25
Speaker
When there's added complexity, you need that increased sort of segregation and specialism in roles. And it's funny seeing that journey, isn't it, of job titles. Job titles, I think, are sometimes a bit of a bum steer on responsibilities. You mentioned, obviously, your first role as a data scientist slash analyst slash analytics engineer. These skills tend to be piled into one, but now, as we're gaining complexity, they're segregated out.
00:08:51
Speaker
Yeah, just back in the day, like five years ago, data scientist looked very cool. So of course, people were calling themselves data scientists. Now, analytics engineering is cool, so we have more analytics engineers. Yes. Yeah. People are flooding to the space. I think you get people coming from your background as a data scientist. You get data engineers who are coming sort of closer to the business. Definitely the topic of the current sort of era. OK.
00:09:17
Speaker
Data teams, more than ever in the last year, have really had to justify their costs due to the macroeconomic situation. As a new discipline within the data space, how do you approach measuring success for both yourself
00:09:34
Speaker
and your team, and the value that you're creating? So, that is a very good question, and I don't really have very good answers. However, when we post this podcast, if someone in the comments has a good solution, that would be great; we can implement it here.
00:09:49
Speaker
What we're doing now, which I think works but isn't perfect, is we treat analytics engineering as basically a normal product squad. So we follow the OKR framework, which is objectives and key results. We work in terms at Cleo, so four months, basically. At the start of the term, we sit down with the team, we sit down with our stakeholders, and we try to understand what we want to focus on for the remainder of the term. So we basically just set objectives.
00:10:19
Speaker
After we set those objectives, and we explain the rationale of those objectives and why they're relevant for the business, we set metrics for them. We create reports to track those metrics, and we have targets that we want to achieve. Then throughout the term we monitor how those metrics are going, and at the end of the term we report whether we hit all of those metrics or not, basically.
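The objectives-to-targets loop Andrea describes can be sketched in a few lines. This is a hypothetical illustration, not Cleo's actual tooling: the objective names, target values, and the `higher_is_better` flag are all made up for the example.

```python
def report_okrs(objectives):
    """Return {objective: hit?} by comparing each actual to its target."""
    results = {}
    for name, okr in objectives.items():
        # For metrics like "pipeline finish time", lower is better,
        # so the comparison flips.
        if okr.get("higher_is_better", True):
            results[name] = okr["actual"] >= okr["target"]
        else:
            results[name] = okr["actual"] <= okr["target"]
    return results

# Illustrative end-of-term report for two invented metrics.
term_okrs = {
    "test coverage of models (%)": {"target": 90, "actual": 93},
    "daily pipeline finish time (h)": {"target": 8, "actual": 7.5,
                                       "higher_is_better": False},
}
print(report_okrs(term_okrs))
```

Both targets are hit here, so every value in the report comes back `True`; a miss would surface as `False` next to the objective that slipped.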
00:10:44
Speaker
which is not perfect because it's difficult to translate what you do in data modeling, data optimization to a revenue number, but it's the best proxy that we have for now. Yeah, I think it's a problem that many teams are facing in terms of that actual quantifying of a number. I think one of the drivers of that is because an analytics engineer is more of a support role. You're supporting
00:11:10
Speaker
downstream needs. I know that some people measure which models an analytics engineer is powering and what the return is on whatever the product is. That's one way I've seen of quantifying it. Yeah, we did try that.
00:11:25
Speaker
But we didn't necessarily like it, because we don't do data modeling only. So data modeling is one part of it, but we do much else. We do a lot of platform maintenance, for example. So data modeling is just a part of it. So we wanted to have something a bit more inclusive, I would say. But yeah, that could be a good way.
00:11:46
Speaker
Yeah, I think there's no one-size-fits-all, is there? There's a lot to encompass. So you obviously mentioned that your analytics engineering team does more than modelling. So, Cleo: can you give some context to the audience as to what your data function looks like, some of the key technologies that you guys work with at Cleo, and what you look to achieve as a data team as a whole? And then I think we'll dive into specifically more around your team.

Cleo's Data-Centric Approach in Fintech

00:12:13
Speaker
Yeah, sure. So at Cleo, the data team is actually very big. I don't know exactly the ratio between data people and employees, but it's way greater than anywhere else I've worked, which is great, because Cleo is a data-driven company at the end of the day. Of course, we're a FinTech company, but we do work with data. Also, the product itself
00:12:33
Speaker
It generates insights for our users, like the data component is there and it's like super important. It's the product. Exactly. So we do have three main data positions. We have data scientists, but the data scientists are more of a product function on their own. So they do data modeling, they work on product features effectively.
00:12:55
Speaker
We do have product analysts at Cleo; they're called AXs. And basically what they're doing is kind of an intersection between a data scientist and a data analyst. So they do modeling, they do some clustering, for example. They do a lot of A/B testing, but they also do reporting for main OKRs and features. They are embedded into every squad. So at Cleo, which is super interesting for me, is when a product squad is deployed,
00:13:23
Speaker
in this squad there is always a data analyst, or a product analyst in this case, which is great because everyone is then forced to be data-driven. And then we have analytics engineering, which is what we do. We're going to explain in a second exactly what we do, but we look after the ELT framework. And then we do have
00:13:44
Speaker
data engineers, but at Cleo, data engineers are more DevOps, data platform engineers, really, than normal data engineers. Because we, as AEs, do the ingestion, sometimes with Fivetran and Stitch, sometimes with custom ingestions where we can. And the data engineers do some ad-hoc ingestion, but it's way smaller than the full-on ingestion piece that we work on.
00:14:10
Speaker
And then, in terms of the second part, what are we going to do as a data team? So, there is a lot of movement now in the LLM space,
00:14:29
Speaker
everywhere. So we are kind of already a chatbot, which is great as a product. So I think like improvement to our own product to make it like even smarter, that is more kind of like the data science way. Also, we're trying to include the LLMs and new technologies even internally. So for example, we can talk about it later, but we are
00:14:48
Speaker
creating tools to automatically suggest SQL code or descriptions for YAML files. So we're trying to kind of like leverage this LLM technology as much as we can. And then the other thing is to be kind of like smarter and faster at generating new insights for people, so that Cleo can be even more data-driven.
00:15:07
Speaker
Perfect. Perfect. That's brilliant. And you mentioned obviously how you've got this sort of embedded model. Is that sort of a hub-and-spoke style model? Is that what you would, I suppose, name it as? So that's an interesting point. So at Cleo we have the product analysts and they're full-on embedded. So they attend the standups of the squad; they live the squad day-to-day.
00:15:31
Speaker
Whereas analytics engineering now is a central thing. So we sit as this central, horizontal pillar and we support all the squads and all the AXs. So it's basically AE and platform centralized, and then we have the product analysts embedded and the data scientists in their own squad.
00:15:51
Speaker
With a data analyst, by the way, of course, inside the squad. Yeah, that makes sense. I think it's so important, isn't it, to have at least one team of your data professionals embedded because you need to be connected with the business. And that makes that flow of communication much, much smoother. So let's dive into your team. So analytics engineering, what are the responsibilities of your team? What's the remit?
00:16:19
Speaker
leading on to what's the overriding goal of your team? Yeah, so the overriding goal is basically to have high quality data every day.
00:16:30
Speaker
without breaking Redshift. That's the goal. And spoiler alert, it did happen sometimes. Not anymore, because we fixed it. But yeah, our goal effectively at the end of the day is to try and empower decision-making with high-quality data. That is what we try to do.
00:16:49
Speaker
We recently went through this exercise, which I would actually recommend to all teams, all data teams, of trying to formulate a team mission and a team vision, almost treating the team as basically a startup. It's actually quite interesting because it puts you in the right mindset and forces you to define priorities for your team. We did write an article about it, so... We'll put a link in the comments. Exactly, go and have a read.
00:17:14
Speaker
But yeah, effectively what we want at the end of the day is to try and ensure equal access to rich, accurate, and reliable data. And we want this to drive impactful analytics and fast decision-making. So the components of that statement are that we want data that is reliable: at, let's say, 8 a.m. every day, data needs to be there. It's accurate, so no bugs.
00:17:40
Speaker
We'd like to actually understand the bugs before they happen. We don't yet, but effectively we would like to. And we want the data to be reliable, so it's there always. And why do we want that? Because the analytics that gets generated from that data needs to have an impact. It's impactful.
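The "data is there by 8 a.m. every day" reliability target can be expressed as a simple freshness check. This is a minimal sketch, not Cleo's implementation; in practice something like dbt's source freshness checks against warehouse load metadata plays this role, and the timestamps here are invented.

```python
from datetime import datetime, time

def is_fresh(loaded_at: datetime, deadline: time = time(8, 0)) -> bool:
    """True if the day's load landed at or before the daily deadline."""
    return loaded_at.time() <= deadline

# A load that landed at 06:45 meets the 8 a.m. deadline;
# one that landed at 09:10 would page the team.
print(is_fresh(datetime(2024, 1, 15, 6, 45)))   # True
print(is_fresh(datetime(2024, 1, 15, 9, 10)))   # False
```

Run against every source table each morning, a check like this turns "data needs to be there" from a hope into an alertable metric.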
00:17:56
Speaker
And that is also why we work so closely with all the other data teams, so that what we do is actually used. It's not just sitting there in a dark corner of Redshift; it's actually queried, it's useful. People are getting value out of it.
00:18:12
Speaker
And so how do we do all of that? We try to provide a single source of truth for the data. So basically, all the PRs that are raised against our dbt repo, we are aware of them, we have a look at them, we know what's there. So we tend, and we're actually quite good at it, to have very clean datasets that are used
00:18:35
Speaker
in a bunch of places and are consistent, so it's very difficult, very rare, for Cleo to have people getting different results from the same question. We try to create a great analytical experience for everyone, so we work closely with product managers to understand how they're using our BI tools, and we try to optimize them for performance. We also try to make it as easy as possible for people to contribute to dbt as well, because we realize that
00:19:04
Speaker
it's also important for other data professions to actually work with us directly in the repo. And then effectively what we do at the end of the day is, again, just take ownership of the ELT framework. So we ingest the raw source data, and then when it's ready and it's beautiful and it's perfect, we give it to whoever needs it.
00:19:26
Speaker
And then the other piece of it is more on the platform side. So, for example, in partnership with the data engineers, we make sure that Redshift works, basically, that it's not slow. So we work a bit on back-end optimization: how to use sort keys, distribution keys, how to optimize concurrency, for example.
00:19:46
Speaker
We make sure that all of our tools are connected to Redshift. They are not overloading the data warehouse. We play around with a lot of configs. We connect all the different tools together. So we do a bit of that data platform as well.
00:19:58
Speaker
Very nice, very nice. That's a good overview. I think it's clear that communication is always key to making sure what you're building is relevant and also being used, and used to drive value at that. One thing that would be great to cover as well: you mentioned obviously Redshift and dbt. What's the wider tech stack at Cleo? What tools are you utilizing as part of your data platform?

Cleo's Tech Stack and Automation Initiatives

00:20:20
Speaker
Yeah, so Redshift and dbt are, I would say, the main ones. Sometimes they can have a difficult relationship compared to, say, dbt and Snowflake. With dbt and Redshift you need to be creative sometimes to make things work, which is frustrating on one side, but very interesting on the other side. You need to be resourceful. So, for example, we created a custom materialization to solve this serializable isolation error, which was a big, big piece of work that I did when I joined.
00:20:47
Speaker
But yeah, of course, the big ones are Redshift and dbt. And then we have Count, which is basically... I'm not sure if you know it, but... Yes, yeah. I've listened to Taylor speak at the Analytics Engineering Meetup about it. It's a very interesting talk. It's very nice because it's basically a Miro board on steroids with SQL and Python embedded. They are actually doing a very good job
00:21:10
Speaker
integrating with dbt as well. I don't think that is out as we speak, but it probably will be when you hear this episode. So have a look at it. It's very nice. You can connect your dbt Core project to it and you can basically import models into Count, see them visually, and kind of like debug them. So we use Count as our, we call it, EDA tool. It is basically a tool for us, but also everyone else, to explore data before the actual modeling bit.
00:21:37
Speaker
And then I think that's it. Those are the main four in the data space. Yeah, that's great. That's a nice overview for people to understand what you're really working with. So moving on, Andrea, it'd be great to unpick some of the initiatives and the projects that you guys are working on, because it sounds like you've got a great
00:21:57
Speaker
data culture here at Cleo; your analytics platform and your team are designed around surfacing high-quality data. So talk us through some of the projects and initiatives that help you achieve this goal.
00:22:11
Speaker
Yes. One of the main things we realized, I would say around six months ago, is that there are never enough AEs, but we were really not enough in this case, because we had a lot of very good data analysts who were asking for a lot of data modeling, but not enough AEs. So we basically needed to figure out a way to kind of like
00:22:35
Speaker
basically come up with more AEs without hiring. That was interesting, that was challenging. So what we realized is that one of the more time-consuming parts of our day-to-day was reviewing PRs. So we tried to automate a way to make that quicker for us, and also for the analysts when they push things to our repo.
00:22:58
Speaker
So what we do is, basically, every time that someone raises a PR, we have two GitHub Actions, for example. They score the PR based on two things. One is dbt. We check things like: is there a YAML file? Are there descriptions in the YAML file?
00:23:15
Speaker
Are all the columns from your SQL in the YAML file? Did you add tests? Did you respect all our coding standards, all of our SQLFluff rules, for example, that

Data Quality, Governance, and Integration Challenges

00:23:28
Speaker
we have implemented? So that's the first part of the checks. And that is important because now we don't really need to look at the PR. That first pass is automatic. And then if it fails, the data analysts go back, they fix it, and then they release it again.
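The first-pass dbt checks described above (YAML present, columns documented, descriptions filled, tests declared) could be sketched roughly like this. Everything here is a hypothetical stand-in: the penalty weights, the input shapes, and the function name are invented, and a real version would parse the repo and run SQLFluff rather than take dicts.

```python
def score_model(sql_columns, yaml_doc):
    """Score a model's PR out of 100 and collect recommendations."""
    score, notes = 100, []
    if yaml_doc is None:
        return 0, ["Add a YAML file for this model."]
    documented = {c["name"]: c.get("description", "")
                  for c in yaml_doc.get("columns", [])}
    # Columns selected in SQL but missing from the YAML file.
    missing = [c for c in sql_columns if c not in documented]
    if missing:
        score -= 20
        notes.append(f"Document columns: {missing}")
    # Columns present but with empty descriptions.
    undescribed = [c for c, d in documented.items() if not d]
    if undescribed:
        score -= 10
        notes.append(f"Add descriptions for: {undescribed}")
    if not yaml_doc.get("tests"):
        score -= 20
        notes.append("Add at least one test (e.g. unique / not_null).")
    return score, notes

yaml_doc = {"columns": [{"name": "user_id", "description": "PK"},
                        {"name": "created_at", "description": ""}],
            "tests": []}
print(score_model(["user_id", "created_at", "amount"], yaml_doc))
```

An action wired up this way can comment the score and notes on the PR, so the analyst gets the fix list without an AE ever opening the diff.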
00:23:43
Speaker
Then the second bit, which is also important, is around the SQL itself. Sometimes, especially in Redshift, you need to make sure that your SQL is on point and you don't do inefficient queries. What we do is we grab the execution plan of the model that is being pushed, we understand it, and we apply penalties. For example, if you have
00:24:07
Speaker
specific operations that are very expensive, we flag it, we give you a penalty, and we explain how you can improve it. Now, of course, getting a score of 100% is almost impossible, because you will always have some minor inefficiencies, and sometimes the inefficiencies are actually unavoidable. Let's say that you want to
00:24:26
Speaker
explode the table based on a JSON array: you basically need to do a cross join. A cross join is usually a big no, and we have a penalty of, like, 40 out of 100 for it. But sometimes you just need to do that, so we leave it in, and we basically have a lookup table that, based on the penalty, tells you what you can do to fix it. So those are the first automatic checks and improvements.
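The execution-plan scoring can be illustrated as scanning `EXPLAIN` output for expensive operators and applying penalties from a lookup table. The penalty values and advice strings below are made up for the example; the operator names (`XN Nested Loop`, `DS_BCAST_INNER`, `XN Seq Scan`) do appear in Redshift query plans, but a real checker would weigh them against row estimates rather than mere presence.

```python
# Hypothetical penalty lookup: operator -> (points deducted, advice).
PENALTIES = {
    "XN Nested Loop": (40, "Cross join detected; check your join "
                           "conditions, or accept the penalty if the "
                           "JSON-array explode genuinely needs it."),
    "DS_BCAST_INNER": (20, "Large broadcast join; revisit your "
                           "distribution keys."),
    "XN Seq Scan": (5, "Full table scan; a sort-key filter may help."),
}

def score_plan(explain_output: str):
    """Score a query plan out of 100 and collect improvement tips."""
    score, advice = 100, []
    for op, (penalty, tip) in PENALTIES.items():
        if op in explain_output:
            score -= penalty
            advice.append(tip)
    return max(score, 0), advice

plan = "XN Nested Loop ... DS_BCAST_INNER ..."
print(score_plan(plan))
```

On this invented plan fragment the score lands at 40, with two tips; a plan with none of the flagged operators keeps its full 100.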
00:24:49
Speaker
And then on the other side, we also try to make it easier for people to deploy in dbt. So we have, for example, a script that, when you have a model, creates a YAML file for you. And then it actually calls an online LLM to get the descriptions as well. So basically we take the model and we give you a best
00:25:11
Speaker
hint, let's say, on descriptions, and then you can decide if you want to keep them or change them. But at least you don't need to write out all the columns and all the descriptions. Because sometimes, like created_at, it always has the same description; you don't need to type it in, we just give it to you. And what we're trying to work on now is kind of like being proactive on the SQL side as well. So when you're writing a model, we are developing a tool that kind of like checks the SQL for you and automatically changes it to something more efficient. And of course, you can choose if you want to keep it or not.
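The YAML-generation helper described above might look something like this sketch: given a model's columns, emit a `schema.yml` skeleton, pre-filling descriptions for well-known columns like `created_at` and stubbing the LLM suggestion call as a plain function parameter. All names here are hypothetical, not Cleo's actual script.

```python
# Columns whose description is always the same, so no LLM call needed.
KNOWN = {"created_at": "Timestamp when the record was created.",
         "updated_at": "Timestamp when the record was last updated."}

def draft_schema_yml(model_name, columns, describe=lambda c: "TODO"):
    """Build a schema.yml skeleton; `describe` stands in for the LLM."""
    lines = ["version: 2", "models:", f"  - name: {model_name}",
             "    columns:"]
    for col in columns:
        desc = KNOWN.get(col, describe(col))
        lines.append(f"      - name: {col}")
        lines.append(f"        description: \"{desc}\"")
    return "\n".join(lines)

print(draft_schema_yml("fct_payments", ["payment_id", "created_at"]))
```

The analyst then only reviews or replaces the suggested descriptions instead of typing every column by hand, which is exactly the time saving Andrea describes.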
00:25:41
Speaker
Amazing. That sounds like a great project for automation: building out the frameworks for everyone else to increase the velocity at which they can do work, trying to remove some of those blockers, whilst also having, I suppose, governance over your best practices. That's definitely one I think people can listen to and take some advice on implementing those sorts of tests and that structure themselves. So are there any other areas which are at the forefront of work at the moment?
00:26:10
Speaker
Yeah, in terms of analytics engineering, the other thing is actually to improve our documentation because, of course, like in the Agile type of like manifesto, the documentation is kind of lost. But now we're at a stage where it's becoming a limit. So we're actually doing our best to document as much as possible. We are running a lot of training sessions for people in the business.
00:26:33
Speaker
to improve their SQL skills, their dbt skills, their Git skills, to get them as involved in the data modeling process as possible. And then of course there is this big push on LLMs, both in our internal use cases and on the product. Nice, amazing, amazing. Well, look, all of these projects are geared towards increasing the speed, the reliability, and the quality of the data to the business.
00:27:02
Speaker
Why is all this so important to your data culture at Cleo? I think Cleo is the company that I've seen with the best data culture. Some of it, I think, is because our founder, Barney, is an ex data scientist. So of course, that is a big push if it's coming from the boss. One of our main values is "learn at speed",
00:27:27
Speaker
which is a more inclusive version of the original one, "iterate through data", which lets you understand that data is basically central at Cleo. If you're making a decision at Cleo, it's backed by data; we do a lot of A/B testing, we do a lot of analysis when we deploy features.
00:27:47
Speaker
So it's actually super important for people to want to have data. And in this context, trying to have data fast is basically the number one priority. So let's say that you want to deploy a product feature. You want kind of like data-driven decisions happening, yes, but you also want them very fast, which is also why a lot of the work we do is around optimizing time to insights. So we want to get that time as low as possible so that
00:28:15
Speaker
the product, Cleo itself, can move very fast but still be data-driven, because that's the trade-off, usually: if you want to be super data-driven, sometimes you're moving slowly. So we want to try and kind of like optimize that. Yeah, yeah. So it's that trade-off between speed and the data insight that you can give the business, and you guys trying to make that gap as small as possible. OK, very interesting. Well, the podcast is all about,
00:28:43
Speaker
I suppose hindsight perspectives as well and some of the biggest challenges that you've come across in helping the community with their journeys. So in your time at Clio so far, what's been the biggest challenge and how have you overcome it?
00:28:59
Speaker
I think we had a few challenges. I would say, like, when I started, with Redshift and dbt we did have some issues there. I think trying to be creative and kind of like solve those problems that were not kind of like common knowledge yet,
00:29:17
Speaker
that was a challenge. Cleo is, again, very fast-paced, so making sure that really, every day at 8 a.m., you have your data was a challenge at the beginning. And I think the most difficult challenge at the beginning was that there were not enough of us AEs, so we had to be creative in trying to optimize our time to basically
00:29:37
Speaker
support everyone. But yeah, till now, we are in a much better place. So the next challenge now is actually to make it fun for everyone. So we want the kind of like analytics engineering mindset to be a fun part of everyone's job, instead of: oh no, okay, I need to deploy this model, but now I need to spend hours trying to do all of this by the book. We want to make it
00:30:02
Speaker
nice, fun, enjoyable. Yeah, if people are enjoying their work, and the work that you guys need them to do, then it makes it better for them, but also better for you, because if they care about what they're doing, hopefully the quality follows as well. So that's a great summary of data at Cleo, some of the projects you're working on, and even what's next for you guys. But as an industry, and analytics engineering as a sector,
00:30:30
Speaker
What are you most excited about and what's next for us in the community?

Industry Trends and Innovations

00:30:36
Speaker
Yeah, I'll give you three bullet points, because everything is better with three bullet points. And I'm going to say it one last time and then never again, at least for this podcast.
00:30:48
Speaker
So the first thing is going to be, yeah, LLMs. They're going to be everywhere, which is nice, because I think people will spend less time solving technical problems and more time being creative and solving problems in a creative way, which for me is great. Having used ChatGPT, it really revolutionized the way I work. We are lucky enough that Cleo sponsors the ChatGPT account for the company.
00:31:17
Speaker
So the way you work is just totally different. Before, I was spending my time trying to get the first 80% done, basically, and then I didn't have much time to be creative or find new ways. Whereas now, when you have a problem you've never seen before, you get 80% from ChatGPT, which is going to be wrong and not going to work, but at least you can then focus your time on that very important 20%, perfecting that last bit.
00:31:43
Speaker
Which for me was great to see. The second bit is probably going to be the semantic layer. dbt is releasing theirs. I think analytics engineering did an amazing job at standardization, but we still need that missing piece, I would say: a single source of truth for definitions, shared across the business.
00:32:07
Speaker
I think that is still not 100% there, at Cleo or at most businesses. So I think we as a community are really looking forward to understanding whether we can use this semantic layer. We are looking at potentially implementing it, probably next year. So that's going to be a big trend. And then I think there is also going to be a trend in BI. I think there needs to be some sort of revolution in the way
00:32:33
Speaker
we actually do business intelligence. I'm not sure yet how, but I'm pretty sure that this new excitement around chatbots will spread to BI. I think Tableau is trying to do something similar, and of course ThoughtSpot does something similar as well. So I think there's going to be some interesting development there too.
00:32:55
Speaker
Yeah, I agree with that. Obviously, self-serve BI was kicked off by Looker, and what's really driving these data cultures now is putting data in the hands of the business, and business people most of the time don't understand SQL. So if they can ask a question in natural language and get
00:33:17
Speaker
data-driven results. I think that's a really powerful future. I know there are tools already that are looking to integrate that, like Delphi. So, well, Andrea.
00:33:28
Speaker
It's been great to have your insight into what analytics engineering is at Cleo, some of the projects that you're working on, and the challenges that you've overcome. I think there are some really interesting points for the community to learn from, and for help in automating some of their analytics engineering processes. So that brings us to the final section.

Career Advice and Interview Tips

00:33:48
Speaker
The final section is a quick fire round of questions. It's something we ask all our guests.
00:33:54
Speaker
It is really just to help the community and the listeners understand how to take themselves further in their careers. So the first question is: how do you assess a job opportunity in your career, and how do you know it's the right move for you? I think that's actually a very good question. I always do the same four things. I'm not going to give you three bullet points this time; I'm going to go with four. One more.
00:34:20
Speaker
First, because I work in AE, and in data generally, of course the first step is to gather as much data as possible, as much information as possible. And that's not only quantitative information: talk to people, go on LinkedIn, reach out, see if you have any common connections, see if you can get someone to give you an idea, or at least the vibe of the company you're trying to join. So that's the first bit.
00:34:49
Speaker
Then I always try to talk to the same set of people that I trust, people who you know are actually going to advise you in your best interest. Objectively. Objectively, because of course the company you're leaving will hopefully try to keep you, and the company you're trying to join will try to get you. So it's nice to have someone from the outside, and I try to talk to people I trust, who are not necessarily data people.
00:35:18
Speaker
I talk a lot with my uncle, for example, who is not a data person. I talk with my mum, I talk to my partner, and then, of course, to some of the data friends in my network that I trust. Your recruiter as well. My recruiter, yes. I talk to Harry as well, of course. Trust your recruiter 80% of the time.
00:35:43
Speaker
It's easier to trust advice when it's coming from someone external. I've had quite a few people come to me recently, nothing to do with any of my roles, just looking for advice. I think you're right that you should talk to people who aren't necessarily in data, but the key is making sure that you've got all of the information, because then you can
00:36:05
Speaker
effectively speak with that person, tell them the good points and the bad points, and then they can give you a good answer. If you don't have that information, it's really hard for someone else to give you advice. So one thing I'd add: make sure that throughout the interview process you're asking questions to get the answers that you need.
00:36:24
Speaker
And that is actually a very good point. Another thing I would do is ask relevant questions. When it's your turn to ask questions during the interview, don't just ask stuff that you can easily google; try to really optimize those five or ten minutes you have to learn something about the company. I think
00:36:45
Speaker
good questions I get from candidates are the ones where I'm not actually sure what to answer; that is a very good question. Whereas sometimes people ask, what are the values of Cleo? You can just google that. Don't waste those 10 minutes, because they are super important, I would say.
00:37:04
Speaker
Yeah, 100%. Those questions, and that knowledge, are what you're going to go off to make your decision, so if you haven't asked the right ones, how are you going to know? But equally, I think asking the right questions can impress an interviewer. If you're looking to understand their pain points, their challenges, what life is going to be like, and how they've overcome those challenges in the past, that's the type of stuff that's going to be of interest.
00:37:28
Speaker
Yeah, so actually, I'm going to say something that is a bit of a spoiler: the questions at the end of the interview, for me, are still part of the interview. If someone is asking nice questions, that's a plus for sure. Yes. Yeah. Well, that brings us on nicely. Nice segue to the best piece of advice for someone in an interview.
00:37:49
Speaker
Yeah, so: ask questions that are relevant. And try to be as real as possible. I always appreciate people who you can tell are having an honest, nice conversation. Of course, this is more difficult in the technical interviews. At Cleo, for example, we do live coding, and of course you can be tense; it's not easy to be yourself when you're coding under stress,
00:38:14
Speaker
but especially when you're meeting stakeholders, or in the first interview when we do the introductions, like the introduction to me as the hiring manager, try to be as real as possible. Don't pretend to be someone else, and don't answer in a structured, super-rehearsed way. If you want to have notes, fine, but don't read them; if you get the impression that someone is reading, it's always
00:38:41
Speaker
a bit weird, I think. And then the best piece of advice: try to enjoy the interviews. Treat them as an opportunity to learn, especially the live coding interviews, which can sometimes be very stressful. Try to put yourself in the mindset of: this is an exercise I'm trying to solve together with the interviewer, and it needs to be a nice
00:39:01
Speaker
half an hour, a nice hour of my time. And if I don't know, I don't know; we can figure it out together. So it's not me staring at you while you code and judging you; it should be: okay, we are in this together, let's try to optimize this one hour that we have. I think that's great advice. Especially in the coding interviews, people put a lot of pressure on themselves, and
00:39:21
Speaker
really it's just for the interviewer to see how you're going to work together. And data, more than many industries, is all about problem solving, so how you approach that, and how you'll do it for real once you accept the role, is a good indicator. So, final question.
00:39:38
Speaker
If you could recommend one resource to the audience to help them upskill, what would it be? I think the internet is now full of very nice resources, full of amazing courses on Udemy or whatever other online platform. My advice would be: before you actually buy something, before you buy a course,
00:40:02
Speaker
just have a look at the documentation of the thing you're trying to learn, which is free, often better than the course, and gives you a very good understanding. So: you want to learn something new? Perfect. Just read the documentation of what you're trying to learn, and then, if you really like it and you think you don't have the time to read everything, buy a course. Just try to optimize that.
00:40:24
Speaker
Perfect. Well, Andrea, it's been a pleasure to have you as a guest. I've thoroughly enjoyed our conversation and the insight into what analytics engineering is at Cleo. So yeah, thanks again for your time. Thank you. All very good questions, like an interview. Crazy. Well, hopefully it was a bit more relaxed than an interview. But yeah, it's been a pleasure to have you as a guest, and I'm sure it'd be great to have you on again in the future. Thank you.
00:40:54
Speaker
Cheers. Bye, everyone. Well, that's it for this week. Thank you so, so much for tuning in. I really hope you've learned something; I know I have. The Stacked podcast aims to share real journeys and lessons that empower you and the entire community. Together, we aim to unlock new perspectives and overcome challenges in the ever-evolving landscape of modern data.
00:41:19
Speaker
Today's episode was brought to you by Cognify, the recruitment partner for modern data teams. If you've enjoyed today's episode, hit that follow button to stay updated with our latest releases. More importantly, if you believe this episode could benefit someone you know, please share it with them. We're always on the lookout for new guests who have inspiring stories and valuable lessons to share with our community.
00:41:41
Speaker
If you or someone you know fits that bill, please don't hesitate to reach out. I've been Harry Gollop from Cognify, your host and guide on this data-driven journey. Until next time, over and out.