
013 - Demythifying ML Strategies

E13 · Stacked Data Podcast

🔍 With the rise of generative AI in the last year, AI/ML has been put on a pedestal as the future and a must-have for businesses, but could this assumption be doing more harm than good?

🎧 In this episode of The Stacked Data Podcast, we dive deep into understanding the strategies for implementing ML/DS effectively and avoiding common pitfalls that lead to a lack of ROI and missed opportunities.

👥 Join me as I chat with Irina Ioana Brudaru, the Head of Data from Element Insurance AG. Irina has had an incredible career, including a stint at Google; her passion for this subject is infectious. Tune in to uncover insights into:

โžก๏ธ What is ML & AI, and how are they linked/differentiated?

โžก๏ธ How can a business determine the right time to incorporate machine learning and data science into its operations?

โžก๏ธ Signs indicating a company is ready to bring in ML expertise.

โžก๏ธ Strategies for building a team and infrastructure ready to utilise ML/DS projects.

โžก๏ธ Common pitfalls when launching DS/ML capabilities and their consequences.

โžก๏ธ We touch on some of the latest tools that are breaking down the barriers to advanced ML like Infer

โžก๏ธ Balancing immediate gains with a long-term vision in developing ML products.

๐ŸŽ™๏ธ Tune in and Iโ€™d love you to give us a FOLLOW to hear all the great conversation we have coming up!


Transcript

Introduction to Stacked Podcast

00:00:02
Speaker
Hello and welcome to the Stacked podcast, brought to you by Cognify, the recruitment partner for modern data teams, hosted by me, Harry Gollop. Stacked with incredible content from the most influential and successful data teams, we interview industry experts who share their invaluable journeys, groundbreaking projects, and most importantly, their key learnings. So get ready to join us as we uncover the dynamic world of modern data.

Importance of AI, ML, and Data Science for Businesses

00:00:34
Speaker
With the rise of generative AI in the last year, AI, machine learning, and data science have been put on a pedestal as the future and a must-have for businesses. But could this assumption be doing more harm than good?

Implementing ML and Data Science Strategies with Irina

00:00:47
Speaker
Today I'm joined by Irina, the Head of Data at Element.
00:00:51
Speaker
We're going to be uncovering strategies for implementing machine learning and data science effectively, and also uncovering the common pitfalls many get stuck in, leading to a huge lack of ROI and missed opportunities. First off, Irina, it's great to have you on the show. How are you doing today? Thanks, Harry. Wonderful. I'm very happy to be in this conversation with you. It's a topic close to my heart. Brilliant. Well, it'd be good to get a brief overview, Irina, of yourself and your background and how you got to where you are today.

Irina's Data Background and Experience

00:01:21
Speaker
I'm a nerd. I've always liked to say I'm a Swiss Army knife of data and a full-blown nerd. I have a computer science background, so properly an engineering background with some research on top in large data. And, oh, shall I go into the long or the short version? The short version is that in the last years I've been doing consulting and roles as head of data, where I'm taking care of the data engineering, data science, ML, and data analytics. I have worked,
00:01:49
Speaker
since I graduated, in three countries, that is Germany, the Netherlands, and the US, and I've also worked for six years for Google. So I also know some marketing in case I need to moonlight as something else.
00:02:01
Speaker
Otherwise, in my free time, I'm actually a mentor and coach for people who want to switch to data.

Mentoring and Coaching in Data Fields

00:02:08
Speaker
I'm part of the Berlin Mentoring Club, but also part of a couple of other programs for coaching and mentoring, and I really like it, actually. I have a very senior head of data science that I'm coaching, which is a good exchange for me as well.
00:02:24
Speaker
On that mentor relationship: at Cognify, we run a female mentoring program. Mentees, obviously, are the ones typically seen to get the huge value from a mentor. But what we've heard and seen is that mentors also get an incredible amount of value just from seeing different perspectives and hearing about the challenges other people are facing. It's great to hear that you've had so much experience in that space.
00:02:49
Speaker
So yeah, let's

Differences Between AI, ML, and Data Science

00:02:50
Speaker
dive in. Today we're obviously going to be talking about machine learning, data science, and the strategies to really implement them effectively. I think first, for the audience, it'd be really good to understand: what is machine learning, data science, and artificial intelligence? How are they linked? But I think more importantly, how do they differ? Because sometimes these terms are used interchangeably.
00:03:13
Speaker
It's very, very true. So I'm going to start first to define what artificial intelligence is, which is the theory and development, I'm looking at an official definition, right, of computer systems able to perform tasks that normally require human intelligence. I think this is an easy definition in that sense.
00:03:34
Speaker
with examples such as visual perception, speech recognition, pattern recognition, decision-making based on pattern recognition, and translations, of course. So this is AI. To shorten the definition: computer methods that do things similar to humans.
00:03:54
Speaker
And if I go down the hierarchy, where AI is the biggest field, we can talk about machine learning. And machine learning is a field or part of the AI area that gives machines the human-like capability to learn and adapt through statistical methods and algorithms.
00:04:15
Speaker
So what differs from AI is more that the subfield of machine learning tries to learn from the data, or from the things that you feed to it, in a similar way as you would want humans to. So learning from the past. To predict the future. Exactly. That's what you want at the end of the day. And machine learning is actually also part of data science, but
00:04:42
Speaker
only part of it, you know. The algorithms train on data delivered by data science in order to learn further. So data science is more descriptive and machine learning is more predictive. Data science is the field that, if you want, takes a look at the data and studies it, tries to understand how to extract meaning from it; and basically where data science stops, machine learning starts.
00:05:08
Speaker
I hope this is a decent definition. I mean, you can find them and complexify them all over. Yeah, no, I think that's great. So it sort of starts with data science, progresses into machine learning, and then that machine learning develops into artificial intelligence. Yeah. And if you consider the type of data more complex than the kind data science typically uses, like, let's say, pictures or voices or recordings of audio or video, then it goes more towards AI.
00:05:38
Speaker
Amazing. So look, many businesses I know, you know, they look to start data science projects as soon as possible. There was obviously the hype around data science back in 2016, and I think there's still some legacy of that.

When to Adopt Machine Learning in Business

00:05:51
Speaker
So how does a business determine when it's the right time to incorporate machine learning, data science, and AI into its operations? Because I've seen a lot of horror stories where they haven't seen the return on the investment because of when they started.
00:06:06
Speaker
It depends very much on which department we look at that uses smart decision-making. So I would say that, regardless of the business model or the company, you will always have some methodologies that you can apply in the marketing team and in the CRM team, kind of regardless of what you're doing: gaming, e-commerce, insurance.
00:06:28
Speaker
At the end of the day, you can still do a lifetime value formula or customer segmentation that you can feed back to the targeting algorithm. You can bid on the difference that it takes to move a user from one bucket to another. So in that sense, what I'm trying to say is that we could see ML implementations very early if applied in the teams like marketing or CRM or customer acquisition or customer support.
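A minimal sketch of the kind of lifetime-value segmentation described here, assuming pandas; the column names and the tercile split are illustrative, not from the episode:

```python
# Illustrative sketch: a crude lifetime-value calculation and bucketing,
# assuming a pandas DataFrame of orders with hypothetical column names.
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "revenue": [20.0, 35.0, 10.0, 50.0, 45.0, 60.0],
})

# Historic value per customer as a simple LTV proxy.
ltv = orders.groupby("customer_id")["revenue"].sum()

# Bucket customers into low/mid/high segments by value tercile;
# these labels could then be fed back to a targeting algorithm.
segments = pd.qcut(ltv, q=3, labels=["low", "mid", "high"])
print(segments)
```

The bucket a user sits in, and the value gap between buckets, is what the bidding idea above would act on.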
00:06:58
Speaker
And if we're talking about the actual methodology related to the product that one is offering in the startup, it's a bit of a different beast. And I think this is where lots of people get stuck. The moment you can do an ML implementation from a business perspective is, first of all, when you need it and when the data is clean. Because if you do not have clean data, then you end up with the wrong decisions. And I've also seen that in the field.
00:07:28
Speaker
I would also say that the German ecosystem, in my personal experience, is having a hard time implementing ML at the company strategy level. So you might see some ML somewhere in engineering, or you might see it in the data, or you might see it in the marketing. But what I feel, and this is maybe why ML fails the way that you see it fail, is a lack of strategy at the top regarding this, regarding which approaches make more sense.
00:07:58
Speaker
And when you say at the top, is that from senior data leadership, or from wider leadership as well? Because that's sometimes quite a big battle that we see. You get executives who maybe have heard the latest buzzwords, right? And they then sort of look to push these projects. So yeah, how can data leaders, I suppose, deal with pressure from above to get these projects started when they're maybe not ready? And as a data leader, how do you know when's the right time to pull the trigger, so to speak, on these projects?

Benefits of Machine Learning for Business

00:08:27
Speaker
I mean, what you want to do with machine learning is either speed up decision time, or save money, or make more money, right? These are the applications of what you want to do, or spend money better. And I would say that in the C-suite, regardless of whether it's a VP of data or a chief marketing officer or a CDO, they should know the value of data.
00:08:51
Speaker
Sometimes what I see is that people at that level don't trust themselves to say, oh, I want a new segmentation. So I would say the lack of creation of a good strategy is also due to the lack of knowledge at the top.
00:09:07
Speaker
You know, maybe they're too senior in the C-level. Maybe it's not that they're skeptical; they just do not know, or don't believe that it could have any benefits in the company. They don't see it. So I think data literacy would be a necessary skill for the C-level, and a more complex one. Yeah. Okay. Perfect. So, you know, what value is really possible with machine learning and data science projects? Could you provide an example as well of a situation?
00:09:36
Speaker
Yeah, of

Challenges in Machine Learning Predictions

00:09:37
Speaker
course. So I would say that if you are a consultant and work in the ML field, in data science, it's very, very hard to promise an output. And I think this is the hardest part in any business: to have ML people
00:09:53
Speaker
really try to pitch a beautiful project and show value. You can't promise; it's not deterministic. So this makes it harder for planning. Of course, there is a gut feeling involved, but with this inability to know whether you're going to fail or not, then yeah, I would say that this is also behind the lack of buy-in from management, or
00:10:17
Speaker
even from the ICs, the individual contributors who are actually doing the implementation, who probably also might figure out later that they cannot necessarily prove value. So it's always a risk to go in that direction and have a very expensive headcount doing that kind of work.
00:10:35
Speaker
On that note, because I mentioned data literacy, I'm thinking of a program, because we have to implement one in our company as well. And I'm thinking, what could be a really good education that's very holistic about data at the higher level? The best one that I have seen is actually by a German lady
00:10:54
Speaker
who built a card game. It's like a Monopoly card game. Instead of Team Blue and Team Green, you have Team CRM and Team Marketing, exchanging different types of data, like product analytics data, marketing data, customer-center data, and then you collect them. This game shows you what kind of things you can do in every department with which data.
00:11:20
Speaker
And it's trying to be as exhaustive as possible. That's the kind of thing that I would give to the C-levels. Because then they have a list: oh, in CRM, I can do this, this, this, and this. In marketing, I can do this. For retention, I can do this. For churn, you can do this. So to be clear, that would sort of give them a list of all the different types of projects that you could do and how they would impact the business.
00:11:44
Speaker
Yeah. In order to measure how they impact the business, I mentioned a bit earlier that you can influence it in time, so the time to insights dropping, or in ROI, right? Either as a loss or as a win, hopefully a win.

Successful ML Applications: Examples

00:12:00
Speaker
Have you got any tangible examples of projects that you've implemented or your teams have implemented and how they've been able to transform decision-making or business process?
00:12:12
Speaker
So one of the nicest projects that I have seen, but also the most complex, was a matching problem, which is, for example, applied in dating, or in gaming in order to pair users with similar seniority in the game. It's a matching algorithm, Elo scoring. So that's a very beautiful one. I did not touch it, though. Where I have...
00:12:38
Speaker
been hands-on: for example, last year I did a retention analysis, looking at what impacts retention with hundreds of features. So this was very nice because, what's it called? I chose XGBoost and I started playing with it. I literally forgot to close my laptop. It was suddenly 9 PM, I was still outside in the garden, and I was writing up the details of the model. So even at this age, I can still get very excited and forget about time.
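A sketch of a retention analysis of this kind, assuming xgboost and scikit-learn are installed; the data and feature names are synthetic stand-ins, not from the episode:

```python
# Synthetic stand-in for a retention dataset with "hundreds of features";
# y = 1 means the user was retained.
from sklearn.datasets import make_classification
from xgboost import XGBClassifier
import pandas as pd

X, y = make_classification(n_samples=5000, n_features=100, random_state=0)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(100)])

# Fit a gradient-boosted tree classifier on retained vs churned users.
model = XGBClassifier(n_estimators=200, max_depth=4)
model.fit(X, y)

# Rank features by importance to see what the model thinks drives retention.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```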
00:13:08
Speaker
Another topic I really, really love is causal inference, because it allows you to measure, let's say, things, entities, that are not as easy to measure through an A/B test, or are just impossible to. So, for example, my previous company was a travel provider, and we didn't know how to evaluate the lost revenue due to strikes, right? And you have all of these
00:13:31
Speaker
contracts with the risk providers and insurance, in case, like, I don't know, a certain Bahn has a strike. No joke for the company, because we had that. Then you need to provide your estimated loss to the insurer so that you can receive the money back based on this. But how do you estimate it, right? It was a time when there were a lot of troubles in the travel industry. So
00:13:57
Speaker
I thought that the best way to measure it is actually causal inference, which is a method also used for brand lift, for marketing lift, to give you this little delta, whether you have made an improvement in your KPI or not. So this was the first application, and I also gave a little intro, a causal inference 101, as visual as possible and as no-code as possible. I don't like to make complex slides for a general audience.
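A hedged sketch of this strike-impact estimation, using the pycausalimpact port of Google's R CausalImpact package (assumed installed via `pip install pycausalimpact`); the data here is synthetic:

```python
import numpy as np
import pandas as pd
from causalimpact import CausalImpact

np.random.seed(0)
# 100 days of revenue plus one control covariate unaffected by the strike.
control = np.random.normal(100, 5, 100)
revenue = control * 1.2 + np.random.normal(0, 2, 100)
revenue[70:] -= 20  # a strike starts on day 70 and depresses revenue

data = pd.DataFrame({"revenue": revenue, "control": control})
pre_period, post_period = [0, 69], [70, 99]

# The model learns the pre-period relationship, forecasts the
# counterfactual, and the gap is the estimated loss to claim back.
ci = CausalImpact(data, pre_period, post_period)
print(ci.summary())
```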
00:14:24
Speaker
It was so nice that people started asking, like, oh, can I use it for, not for strikes, but for, I don't know, it is snowing and the entire trans-Alpine transport is not working. What is the impact on our business, so that we can do something?
00:14:45
Speaker
Or what is the engagement score in an application when you have migrated part of it to React, and whether it's different from the engagement before? So you can even apply it at the engineering level, which is very nice to measure. How does this bring money?
00:15:05
Speaker
I mean, first you have a more exact forecast of your loss in this particular case, and you can ask for the exact money and say, hi, I have stats on this one, so give me the money. On things like churn and retention drivers and whatnot, this is harder, right? This is very static insight-giving.
00:15:27
Speaker
And the implementation of those findings would be done in conjunction with the product department, because maybe some features are the way they are on purpose. But maybe you can redesign the funnel as a result of what impacts retention, in order to get more stickiness.
00:15:45
Speaker
Amazing,

Importance of Clean Data and Infrastructure

00:15:46
Speaker
amazing. So, I mean, the things that also stood out there, as you said, were marketing and CRM; those were some of the key areas where you can start to deploy machine learning from quite an early stage. But, you know, are there certain other prerequisites or signs that indicate, you know, your company is ready to start taking on machine learning and bringing in some machine learning expertise?
00:16:12
Speaker
I don't think it has to do with seniority. It has to do with the quality of your data. And as I said, in these very operational departments you can start thinking about it much earlier than you might for other departments. So, I mean, the marketing and the CRM part are crystal clear, but when you apply it to
00:16:32
Speaker
an e-commerce or a subscription business, then you need to add on top of that, for subscription, either an RFM model or survival analysis in order to do segmentation. I would say the sky is the limit, but you still have to have some data to start, say one year of data.
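An illustrative RFM (recency, frequency, monetary) scoring sketch in pandas; the column names and the rank-based scoring are assumptions for illustration:

```python
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2024-02-10",
        "2024-02-20", "2024-03-05", "2023-11-30"]),
    "revenue": [20.0, 35.0, 10.0, 15.0, 12.0, 50.0],
})
now = pd.Timestamp("2024-03-10")

# Per-customer recency (days since last order), frequency, and spend.
rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (now - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("revenue", "sum"),
)

# Rank-based scores where higher is better on every dimension
# (a recent last order means low recency, hence the reversed rank).
rfm["r_score"] = rfm["recency"].rank(ascending=False)
rfm["f_score"] = rfm["frequency"].rank()
rfm["m_score"] = rfm["monetary"].rank()
rfm["rfm_score"] = rfm[["r_score", "f_score", "m_score"]].sum(axis=1)
print(rfm.sort_values("rfm_score", ascending=False))
```

The resulting score buckets are what would feed a segmentation for CRM or marketing.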
00:16:47
Speaker
So the core then is really, and this is something that we see a lot, and it refers back to that garbage in, garbage out, having the appropriate infrastructure in place, with data in an appropriate state to be able to run this analysis. Otherwise, it's not possible.

Transition to Dynamic ML Systems

00:17:04
Speaker
And also, the way that many companies have implemented ML or data science today is actually really static. It's just a notebook: they ran the model once, and it's just observational. And it usually stops there, while proper ML
00:17:20
Speaker
should be done in a way that you know how to put it in production: you need to productionalize it, you need to track it, you need to monitor it. And these are the skills that move you from data science, this kind of static observational model, to actual ML, where the model keeps learning from its past and does a better job. Like a recommendation system.
00:17:44
Speaker
For a recommendation system, you need more data than you needed for marketing or CRM at first. But if you are a travel provider or an e-commerce, you should have a recommendation system. A constant sort of feed of new data. Yeah. The algorithm teaches itself.
00:18:00
Speaker
I'm pretty sure that if you look under the hood at Zalando, it's always learning. How does it know that I like a certain texture in my clothes, and then it recommends the same thing in other clothes or in shoes? So that's pretty brilliant. We can talk more as well about what makes or breaks ML in a company. It's also where the data scientist sits. Do they sit with the data people? Do they sit with engineering? Do they sit close to the business? Do they understand

Centralized Data Science Teams at Coolblue

00:18:29
Speaker
the business problems they're trying to solve? Where have you seen the most success then? I mean, on that, because there are so many different models that float around for structuring your team, what do you think has been the most effective? The one that I have noticed was most effective was a centralized DS team in the Netherlands at a company called Coolblue, which is a MediaMarkt-style electronics e-commerce.
00:18:56
Speaker
And these guys were like PhD guys, four or five of them, super geeky, floating around the company and helping with certain projects that were tied to a certain area. So that's where I have seen it most successful, because it was obviously backed by a strategy from the top; it was separated from the other data teams,
00:19:15
Speaker
fully independent on their own, and they would just be assigned where the juicy projects were happening. So like a task force; they really were a machine. Coolblue, at the time when I was talking to them, had a data maturity that I can't even explain to you, probably the highest I've ever seen in my life, if not the maximum.
00:19:35
Speaker
Really, really. So they had sort of like a centralized specialist data science team who would then be deployed on whatever projects, you know, marketing, gaming, suppliers, budget forecasting. Almost like an internal consultancy then, would you say?
00:19:54
Speaker
Yeah, that's exactly how they worked. And they were independent. This is also what's very nice, because, I don't know, when you have a lot of work, sometimes machine learning engineers or data scientists are put on analytics work. And this type of organization, the centralized consultancy, I would say protects you from doing just dashboards. I mean, if I hire an ML person to do dashboards, my heart breaks for them, because that's not what they're supposed to do.
00:20:21
Speaker
Yeah, okay. That makes sense. So that's, yeah, as I say, an internal consultancy; it shields them from the external pressures of other responsibilities creeping in. And that's hugely impactful for, you know, why data scientists and machine learning engineers look to leave organizations: it's because they're missold an opportunity, or when they get there, they're not doing the work they were sold.
00:20:44
Speaker
And I think that comes back to this data quality as well. Many data scientists spend much of their time in data modeling, transforming, trying to get data into a usable condition, and they're not best positioned to do that. They're maybe not as experienced at it as, say, an analytics engineer.
00:21:02
Speaker
Yeah, exactly. What I see as well is that ML or data science people prefer Python far more than

Python vs SQL: Data Scientists' Preferences

00:21:10
Speaker
SQL. And I'm an analyst at heart. That's how I started. So I like to use SQL until the last moment of aggregation and prepare my data frame directly in BigQuery.
00:21:21
Speaker
And, to give an example, even when you have an interview for a job and you get a file and you're supposed to do some analysis on it: it's just so easy. Push it into a MySQL database and do some counts and do some summaries. And then you have the result faster than if you wrote the same code for that function in Python.
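A sketch of that SQL-first workflow: load the file into a database and answer questions with counts and aggregates before any Python modelling. Here sqlite3 (standard library) stands in for MySQL or BigQuery, and the file and column names are hypothetical:

```python
import sqlite3
import pandas as pd

df = pd.read_csv("events.csv")  # hypothetical input file
con = sqlite3.connect(":memory:")
df.to_sql("events", con, index=False)

# Aggregate in SQL down to the final data frame you actually need.
summary = pd.read_sql_query(
    """
    SELECT user_id,
           COUNT(*)     AS n_events,
           SUM(revenue) AS total_revenue
    FROM events
    GROUP BY user_id
    ORDER BY total_revenue DESC
    """,
    con,
)
print(summary.head())
```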
00:21:46
Speaker
So not only that, but the road to success also depends on the methodology or the language that the data scientist chooses. With Python the road is very long; you write a lot of code until you figure out what's going on in your very large data.
00:22:04
Speaker
So I would say this is also something that management cannot necessarily assess or understand: the time it takes to do a certain project.

Simplifying ML Processes with Tools

00:22:12
Speaker
And this is why, I mean, I mentioned other people and other tools, so I might as well: I'm a big, big fan of a tool from your country, from the UK, that has this ML SQL. So you write one line of SQL which does the ML modeling, very similar to, and maybe even simpler than, the way
00:22:31
Speaker
BigQuery ML natively does it. And yeah, imagine writing causal inference, or, let's say that you have the country as a feature, right? And this software knows automatically to do the transformation into columns. I don't need to worry about that. I don't want to write anything. And this is, I would say,
00:22:49
Speaker
my gut feeling for successful ML in the future: one needs to deploy things faster, to iterate. The moment you've reached a certain point where you've proven that your approach is bringing something to the company, speed or saved cents or whatnot, that's when you should put it in production; you continue to work and iterate on it, but put it in production first.
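Infer's exact syntax isn't shown in the episode, but the same "model in one SQL statement" idea can be sketched with BigQuery ML, which is mentioned above, via the google-cloud-bigquery client; the dataset, table, and column names are hypothetical, and credentials are assumed to be configured:

```python
from google.cloud import bigquery

client = bigquery.Client()

# One SQL statement trains a boosted-tree model inside the warehouse;
# BigQuery ML one-hot encodes string features such as country automatically.
sql = """
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS (
  model_type = 'BOOSTED_TREE_CLASSIFIER',
  input_label_cols = ['churned']
) AS
SELECT country, tenure_days, total_spend, churned
FROM `my_dataset.users`
"""
client.query(sql).result()

# Scoring is another single statement.
preds = client.query(
    "SELECT * FROM ML.PREDICT(MODEL `my_dataset.churn_model`, "
    "TABLE `my_dataset.users`)"
).to_dataframe()
print(preds.head())
```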
00:23:12
Speaker
To start showing some value. I think that's one of the other things that makes data science and machine learning projects especially hard to get off the ground to start with. The time to value is a longer road, and sometimes, with business and economic pressures, projects get canned before they're even
00:23:28
Speaker
finished. I think being clear with your stakeholders as to when they can start to see value, and yeah, focusing on the smaller wins, getting something out which shows value, is really, really great advice, because otherwise, you know, you can never get to the final stages if you're on something that takes too long.
00:23:45
Speaker
I mean, imagine I'm a manager and I get a data scientist who works for a whole month on the code for something and cannot tell me anything, except that that package didn't work, or that methodology didn't yield statistical significance. That means there's nothing tangible for me when I manage someone like that. So yeah, one has to think a little bit more strategically. Like, okay, they have a very long project, and I need to also give them something small that proves value in order to have their back covered, right?
00:24:13
Speaker
I think that's a great piece of advice. And we never got the name of the tool you mentioned. It's called Infer, like causal inference, but just Infer. Really, it's one line of code, and it does XGBoost and also transforms all your features, all your countries, flattens them out. Amazing. So it gives you sort of the power of machine learning with only a few lines of code.
00:24:37
Speaker
Yes, and if you really want to be a statistician and say, oh no, I don't like abstraction, I want to look under the hood, you can put all your parameters there and knock yourself out. But it's more of an approach. It's feeding into what I was saying, that ML needs a more iterative approach to proving value, and not writing a lot of Python code, and maybe going to these tools is a better way. I'm a huge fan of that.
00:25:06
Speaker
or another tool that's more used.

Recommended Tools for ML Flows

00:25:08
Speaker
What other tools and platforms would you recommend for businesses, which you've had really good experiences with? One that I really like, and I discovered it in the Netherlands, where it's used mostly in the bio, chem, bioinformatics space, is a platform called KNIME, K-N-I-M-E. It's from Switzerland or from Germany. You can build ML flows in the same way that you build a data pipeline.
00:25:35
Speaker
So you literally see little squares: this is your input, it reads from the database, and this is the clustering, and this is the productionalization, and another table, and this is going back to your database, putting it all back together.
00:25:47
Speaker
I mean, I really like it, but it's a little bit clunky because it's not really built for IT people; it's free, though, so you can still take advantage of all of those pre-trained, pre-coded pieces. I think the audience would be keen to understand here: what are some of the common pitfalls when launching a data science project or machine learning capabilities, and what are the consequences of them? I think we've touched on a few of them, obviously, data quality,
00:26:16
Speaker
not being able to show value in a quick enough time. Are there any other common pitfalls that you've seen in the industry, and how would you tackle them? That's something actually that I encountered at Google. I was working with two packages for causal inference, because of my love for causal inference. The question that I was analyzing was a business question, which was: how does enabling smart bidding
00:26:45
Speaker
by a client influence their spending? That's a beautiful business question. To translate it into research and get your data done is not easy. I was using two packages. One was CausalImpact, by the guys in Zurich; this is a team from Google who wrote the libraries in R and Python. And another package that was used by the econometrician group at Google, Hal Varian's group.
00:27:15
Speaker
And the two, it was so obvious, were the two sides of the coin of ML. The guys in Zurich, being like PhD researchers, would be like: your hypothesis has to be super clean. Your data has to be super clean. You must be able to swear on your ancestors' souls that there is truly no other change happening, in the time interval that you are turning on this model, in the world or with all of these accounts and spending. Of course, no one can guarantee that, right? And also you have different sides of the world with maybe different
00:27:44
Speaker
periodicities of spending and buying ads. While the business people, the economic-statistician business people, would be like, okay,
00:27:54
Speaker
you know, maybe it's not the most perfect hypothesis, but it's the best we can get, and we do get a little bit of statistics. So I would say we can link that with this: it's the approach of ivory-tower, PhD-level machine learning built like a waterfall model versus, you know, knowing when you've got value, pushing it out, and then continuing to work on whatever you need to work on further, if you want to.
00:28:19
Speaker
These are the two mindsets, and it also depends very much, again: the data scientist or ML person needs to be closer to the business because of this gap. Yeah, I think that's clear, isn't it? Whether it's an analyst or a data scientist or an ML person, when you're answering a business question, and that's the end goal of what you're doing, you need to really understand what that business question is, why it's having impact, and I think why it's being asked in the first place
00:28:48
Speaker
is a really important piece to understand, because otherwise you can follow the road of building something that's not going to answer the true question. So I think that understanding of why the question is being asked is really powerful as well.
00:29:04
Speaker
Does that answer your question? Yeah, no, I mean, I was just keen for you to share any other ideas, any other areas where you see people make mistakes, whether that's leaders or the ICs who are actually building the code; just keen to try and help other people not fall into common pitfalls. I think we've covered quite a few, but yeah, if you've got any others, we'd love to hear them.
00:29:31
Speaker
Other things: maybe know when to stop improving a project, and maybe you need to jump to something else. Give yourself a certain deadline. For me, with the causal inference problem at Google, I think I gave myself three months to have results. The first month was building the data, down to customer support emails; it was a global collection of data.
00:29:55
Speaker
And then I gave myself, okay, we need to reach that. The moment that I had the data frame, I told myself: please, I have to have statistics in three weeks, or maybe I need to go to another project, because otherwise, this is my OKR, right? Another thing, and this is my recommendation for data scientists and machine learning engineers:
00:30:15
Speaker
when they join a company, ask very, very clear questions about the stack. Am I going to code in a notebook, or am I going to code in a way that is actually productionalized, and how will that be? Try to understand if the company has a certain ML strategy, as in: do they see, for the next two years, that they will apply certain methodologies? Do they have work for you? Because this is, again, the issue that I see a lot: maybe
00:30:43
Speaker
the company hires a few data scientists and has some work for them, but not long term. So I would ask about this long term, and also ask about the infrastructure, because if you have a PostgreSQL database, I would say you probably cannot put models in production as easily; it's a bit more complex.
00:31:01
Speaker
That's great advice. That's great advice to really understand what you're going to be stepping into in a company. I think that is so important. I would also ask, what do you have today in production? Where do you use ML today? Oh, you do customer segmentation for marketing. How do you do it? Because if it's in e-commerce, I want to hear RFM. I don't want to hear K-Means.
00:31:22
Speaker
because this is a continuous, buy-till-you-die setting. The type of clustering depends on the business model. I've seen this mistake as well: the wrong clustering methodology, just to have a clustering. That's a great point, because I think that's something that's often done. You have these projects being kicked off because it's the cool thing to do, without a long-term view of how it's actually going to have impact.
00:31:51
Speaker
Some of these projects don't start with a business problem to solve; they start with a methodology I'd like to try, or with we need to build a ChatGPT or a semantic interface to our querying system. Those are less important than having a really good ML strategy.
00:32:12
Speaker
Sometimes I would say that the most senior leadership in data should actually try to make a list of the business problems that are hard to solve and then see if those are solvable by ML and not try to engineer an ML problem just so that you can say you hired someone and they're working on something.
00:32:31
Speaker
really, really relevant. I think the things that come through clearly, Irina, are: to be able to start this journey of creating a machine learning strategy, first you need to have a data strategy, a data platform, and high-quality data, and somewhere you can hire these people and give them an environment in which to thrive.
00:32:53
Speaker
Marketing and CRM are probably two of your quickest wins, where you can start to implement and show value, so they should be your focus points. And really make sure that when you're starting these projects, you understand what questions they're answering and why, and show a quick time to value, at least on some of the smaller pieces, which then essentially buys you time for these bigger
00:33:20
Speaker
projects, which take longer but can be more impactful. Have you got anything else to sort of summarize before the end? I was just thinking, what was the first time I used ML at work?
00:33:35
Speaker
It was maybe two or three years after I left university, and it came from the CMO. He was like, you know, we have no idea of the value of the customers that we acquired. It was a gaming company, so they were doing customer acquisition on amphetamines, you know.

Impactful First ML Project Experience

00:33:55
Speaker
And in order to be in the plus financially, you need to acquire customers that have a higher customer lifetime value than your cost of acquisition. And this was the first time I was like, oh, this sounds really nice, I would really love to code this. But there was no tool back then, so I built it in R. Now we have software that does the same thing. What did I want to say? It's exciting. It never ends.
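The acquisition rule described here reduces to a simple comparison; a toy sketch with made-up numbers:

```python
# A channel is worth buying on only if expected customer lifetime value
# (CLV) exceeds the cost of acquisition (CAC). All figures are invented.
def profitable(clv: float, cac: float) -> bool:
    """True if a customer acquired at this cost is expected to pay off."""
    return clv > cac

# Crude CLV: average order value x orders per year x expected years.
clv = 30.0 * 4 * 2.5        # = 300.0 expected revenue per customer
cac = 120.0                 # cost to acquire one customer
print(profitable(clv, cac)) # True: acquiring on this channel is in the plus
```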
00:34:21
Speaker
The field is constantly evolving, isn't it? The last year there have been some huge steps. There's still a long way to

Future of Machine Learning

00:34:30
Speaker
go. I suppose the final comments before we jump into the quickfire round would be, what are you most excited for in this space?
00:34:37
Speaker
Ooh, like all the always-changing new libraries, easier to use, less complexity. I mean, in the past I coded everything in R, scripts and scripts and scripts. And now you have libraries that do what I used to do with raw code in the past. So what gets me excited is that I think the time to production will be faster in the next years, I hope. And the time to innovation can then be even quicker as well.
00:35:04
Speaker
Yeah, like maybe, you know, have an average model here in production, and an average one there, and an average one there. And all three models perform, or they do some learning. That's okay; your eggs are growing in these three baskets. And then, you know, when you have time, you improve whichever one proves to be the lower-hanging fruit.
00:35:22
Speaker
Brilliant. Well, Irina, it's been great to talk about machine learning and data science strategies. And yeah, I think there have been some real valuable

Job Seeking and Interview Preparation Advice

00:35:30
Speaker
lessons. We're now moving on to the quickfire round, where we ask every guest the same questions. So first up, how do you assess a job opportunity in your career, and how do you know it's the right move?
00:35:45
Speaker
In my career, or for a junior role? What's your advice to people looking for a role? I think you touched a bit on the bits around asking questions, which was really relevant. If you are an individual contributor, I would say find out what makes you happy, whether it is the tinkering or whether it is the leadership. And if leadership is not for you, don't take it for the sake of power. Focus on being excellent in the area you are in.
00:36:14
Speaker
And if you are in ML, really, before you join any company, do due diligence for your own health and peace of mind, because especially in these very tough economic times, everybody needs to prove value, and you need to be sure that you have a role that is included in the whole strategy. Yes. Amazing. And what's your best advice for people in an interview?
00:36:43
Speaker
Oh, don't be afraid to say you don't know and move on and whatnot, but prepare. Please prepare for your interview. Great advice. Yeah. I think being humble about it and saying, you know, when you don't know something; and then you can always have a punt as well: I don't know, but if I was to have an educated guess, this is what I would say. That shows intuition. Yeah. I think just in the last couple of weeks I had some candidate interviews, and, you know, we have a
00:37:13
Speaker
spectrum of questions going from, let's say this is topic X, but I'm going X minus one and X plus one. Adjacent skills. So if you don't know this one, I'm going to ask you this one, because you probably know that one. If you don't know this one, don't be afraid. Just say so; it's okay, and we're on to the next question. There's no point in feeling nervous. Perfect. And the final question: if you could recommend one resource to the audience to help them upskill, what would it be?

Curiosity-Driven Learning and Skill Development

00:37:40
Speaker
Coursera, I guess, or your own curiosity. Find a problem at work, supported or not supported. Find a manager that supports it, not necessarily your own, and practice. And don't be afraid to take projects where maybe you don't know the value of them yet. There's always a risk, but the gains are also so nice when it actually works. That can be ML models, or the data field as a career.
00:38:08
Speaker
Amazing. Hunt out the problems and try and answer them. I mean, this project I was mentioning to you about Google: I think my boss at the time did not believe in it, but I had a global director of sales who really wanted it. I was like, no, no, no, this has value. I don't care, I'm going to work past 6 PM, but this has value. And when I prove it, then I will get all the freedom I want.
00:38:29
Speaker
Amazing. So yeah, the prime example, if you think there could be some value, find someone else that agrees, get them on board and go out and do it. You might fail, you might succeed, but you've got to try. Yeah, exactly. Perfect, Irina. Well, look, it's been an absolute pleasure to have you on the show. Thanks for your time and sharing your insights into ML strategies. Thank you, Harry. I hope to come back one day and go into more detail, maybe. Yes, yeah, that'd be great. Thank you, everyone. Bye-bye.
00:39:01
Speaker
Well, that's it for this week. Thank you so, so much for tuning in. I really hope you've learned something; I know I have. The Stacked Podcast aims to share real journeys and lessons that empower you and the entire community. Together, we aim to unlock new perspectives and overcome challenges in the ever-evolving landscape of modern data.
00:39:22
Speaker
Today's episode was brought to you by Cognify, the recruitment partner for modern data teams. If you've enjoyed today's episode, hit that follow button to stay updated with our latest releases. More importantly, if you believe this episode could benefit someone you know, please share it with them. We're always on the lookout for new guests who have inspiring stories and valuable lessons to share with our community.
00:39:45
Speaker
If you or someone you know fits that bill, please don't hesitate to reach out. I've been Harry Gollop from Cognify, your host and guide on this data-driven journey. Until next time, over and out.