
Daniil Shvets – Data Science is more a business activity rather than IT

S1 E11 · Straight Data Talk
50 Plays · 30 days ago

Daniil Shvets, CEO and co-founder of ASAO DS: Data & AI Consulting Boutique, previously led Data Science and Product teams at various companies. Daniil sat down with Yuliia and Scott to share his view that Data Science is a business department with the appropriate data skills rather than an IT department. He explained why having 54 ML models in production at one of the largest retailers in the USA is the wrong approach, and shared his take on the biggest challenges in how Data Science's role is perceived. We also touched on AI and the consultancy business while Scott made all possible relationship analogies. :)

Daniil's Linkedin: https://www.linkedin.com/in/daniilshvets/

Transcript

Introduction and Welcome

00:00:00
Speaker
Hi, everyone. It's Straight Data Talk. We're glad to have Scott back, and it's me, Yuliia. And we're kicking off today with Daniil Shvets.

Daniil's Background in Data Science

00:00:12
Speaker
Daniil, thank you so much for coming. Please introduce yourself quickly.
00:00:18
Speaker
Thank you so much for inviting me. I'm Daniil. For the last several years, I was head or director of data science and product at various companies, startups and scale-ups, helping them grow. Collectively, the departments I led brought several hundred million dollars in additional revenue to those companies.
00:00:44
Speaker
And around a year ago, together with some partners, I founded a boutique agency, a consulting and implementation shop when needed, where we help various startups and scale-ups solve their data problems. Sorry, their business problems. That's a very important distinction I like to make: we are there to solve business problems.

Data Work vs. Business Needs

00:01:09
Speaker
And in most cases there is some kind of data-related solution to them, but not always. Okay, nice. Thank you so much for the intro. So Scott, did you catch that? Several hundred million dollars? Yeah, that is a bold statement, I should say. I would love to, you know, go deeper into that.
00:01:32
Speaker
But as the business person, the focus on the second part is more interesting to me. Yes, impact helps, but this is one of my big bugaboos of late: people talk so much about data work instead of the business problems, the business challenges, and how data integrates into them. Because we want to do great data work, but if we're not doing the things that matter to the business, we're doing data work for the sake of data work. And Yuliia, I'm surprised you haven't started a little timer or counter for every time I say "data work for the sake of data work." But yeah, I'm very excited to talk about it.
00:02:13
Speaker
You know, I'm already about to push a drum-roll button or something when you say this. I'm on the edge of doing that. But before we jump into our discussion of how we distinguish data work and business work, Daniil, in your case, I want to tell a story from a lunch we had.
00:02:37
Speaker
It was a lunch before Big Data London. I shared with Daniil that I had talked to a data scientist from one of the biggest retailers in the United States, and that person shared with me that they have 54 machine learning models in production,
00:02:59
Speaker
helping them convert their customers on the website. There is one model for those who already purchased, a different model for email marketing, and so on and so forth. And I was shocked by Daniil's reaction, because I believed this was the right way to do things, very sophisticated. And Daniil,
00:03:22
Speaker
proved me wrong. So can you tell the audience why having 54 machine learning models is the wrong approach? So, first of all, I will probably start with why companies end up having 54 models, because usually it's not that they establish a company with a plan, a roadmap: we build 54 models and then we finally start getting customers.
00:03:49
Speaker
Usually it goes like this: there is some model in production, or a set of models, and then a new case appears. It can be a new type of customer, it can be a new type of item if it's, for example, e-commerce, et cetera. And every

Challenges of Managing Multiple ML Models

00:04:08
Speaker
time, building a new model is actually very simple, because you're just building a new one.
00:04:16
Speaker
It goes into production alongside the rest, and you don't have to touch any of the general flow of what you already have. So you get some new type of customer and you say, ah, our current model isn't really a good fit for them, let's make another one and adjust it. And now you have a parallel model for that type of customer. Then there is another type of product, another flow, and every time, tactically, it seems like a reasonable solution to build another model, because otherwise you would have to take everything apart, do the groundwork, do a full refactoring, and change the way all the models operate. And so the models pile up one by one.
00:04:58
Speaker
And obviously there is a problem with it: the more distinct models you have in production, the more effort, time, and money it takes to redo everything and merge them into a single general one. So once a company has started down this path, the further it goes, the harder it is to turn. This is why companies end up with so many models. Then, as to why adding so many models isn't the best solution: there is the 14th-century English philosopher William of Ockham, and a principle we all know as Occam's razor,
00:05:56
Speaker
which says: do not multiply entities. Having 54 models is exactly multiplying entities. But beyond that general point, I can give very particular examples. First of all, all models are obviously trained on data.
00:06:15
Speaker
And if you have 54 models for different use cases, then, on average, every model has 54 times less data than your company has.
00:06:28
Speaker
As an example, if you have two models, one for males and one for females, and a 50-50 distribution of your customers, then out of all your events, half are generated by males and half by females, and each model only ever sees its half.
00:06:51
Speaker
And obviously it's no secret that, usually, the more data a model has, the higher the quality of the model. So having so many models is pretty strange, because there are obviously common patterns. If you have one model for Europe and another model for the US, some of the features might still act the same way and affect the target the same way in both. By splitting, you are just throwing away information you already have.
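The data-splitting arithmetic behind this argument can be sketched in a toy example; the event count and the assumption of an even split are made up for illustration:

```python
# Toy illustration of the data-splitting argument: every extra
# segment-specific model shrinks the training set each model sees.
total_events = 10_000_000  # hypothetical number of labeled events

def events_per_model(n_models: int, total: int = total_events) -> int:
    """Assume events are split evenly across segment-specific models."""
    return total // n_models

print(events_per_model(1))   # 10,000,000: one pooled model sees everything
print(events_per_model(2))   # 5,000,000: e.g. separate male/female models
print(events_per_model(54))  # 185,185: the 54-model setup
```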
00:07:29
Speaker
So this is one. Second, it's very, very hard to monitor, because models don't just sit there untouched; something is always changing in them, there are patches, and if they are different models, they may have slightly different sets of metrics,
00:07:49
Speaker
et cetera, et cetera. So in the end you have 54 different dashboards with 54 different metrics. And from the QA point of view of the models and the metrics, everything changes independently. The team responsible for one model finds a bug and changes how some piece works; that bug may also be present in the other models, but the other teams don't know about it, et cetera. So there is this kind of
00:08:30
Speaker
mis-synchronization, which means it's very hard to monitor. There is also no ground source of truth, because models use features. Say that for one model, the analyst fine-tuning it decided it's a good feature whether a person is, for example, wealthy, and wealthy for this model means an annual income above $200,000.
00:08:56
Speaker
And for another model, people fine-tuned it and decided that, no, for our model it's $250,000. And in the end you have different models with different feature definitions. Locally it even makes sense, because one definition is better for one model and another for another. So machine learning models also need a semantic layer.
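The single-source-of-truth idea behind that quip could be sketched like this; the feature names and the $200,000 threshold are only illustrative, taken from the example in the conversation:

```python
# Sketch of one shared registry of feature definitions, so two models
# can't silently disagree on what "wealthy" means.
FEATURE_DEFINITIONS = {
    # One canonical definition, reused by every model and pipeline.
    "is_wealthy": lambda customer: customer["annual_income"] >= 200_000,
    "is_repeat_buyer": lambda customer: customer["past_purchases"] > 1,
}

def build_features(customer: dict, feature_names: list) -> dict:
    """Compute the requested features from the shared registry."""
    return {name: FEATURE_DEFINITIONS[name](customer) for name in feature_names}

customer = {"annual_income": 220_000, "past_purchases": 3}
print(build_features(customer, ["is_wealthy", "is_repeat_buyer"]))
# {'is_wealthy': True, 'is_repeat_buyer': True}
```

If a threshold changes, it changes once, in the registry, for every model at the same time.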
00:09:20
Speaker
Absolutely. And that's not to mention the infrastructure cost, because I would say that training one model that is just larger isn't as costly
00:09:37
Speaker
as training many different models on different schedules. And then there is some bug in the pipeline, one of the models doesn't get retrained, and no one notices, because if there are no specific alerts for that model and that case, it doesn't move the overall metrics. So when top management or product management looks at the numbers, they see, oh, the fluctuation is within reasonable ranges. And going back real quick, when you were talking about splitting your data in different ways: you might still use a large portion of your data for each of these models, but the training cost multiplies.
00:10:20
Speaker
And you're going to have completely different data sets, with all this data propagated across them. And you start to go, okay: the cost of production, the cost of storage, the cost of access, but also making sure all of them follow best practices around data storage and security. You have all these teams, and all the different ways of working
00:10:44
Speaker
start to change as well. So instead of having a more centralized way of working, you've got so many different teams that it's hard to standardize across them. And so you start to have fewer and fewer good practices, not even best practices; you have more and more people deviating from good standard practices. Or it becomes a thing where, oh hey, this model is informing us of this, and that model is telling us that,
00:11:12
Speaker
and we can't cross-compare, because the metrics are different and they're written in different languages. All this stuff just becomes, okay, and then reporting that up: how do you report up? How is the model performing?

Aligning ML Models with Business Strategy

00:11:27
Speaker
And you get into all these situations where the person trying to consume what is actually happening, what's good about this model and what needs work, doesn't have any standard to go by,
00:11:42
Speaker
to know what the heck is going on. Either I have to understand 54 different models, or I can only understand five and this person understands five. And then they're reporting further up, and it gets worse and worse. It's that telephone game where it becomes "purple monkey dishwasher," where you're just like, I have no idea what these people are saying. It's a Simpsons quote; Yuliia was giving me the goofy look. But yeah, exactly what you're saying: it becomes a nightmare to manage, not just from the technical side, but from the company side, the business side. Sorry, Yuliia, you were going to go down a path. I have a question. Okay. I'm 100% on the same page with you, but the question is, in their case, the 54 models were serving different use cases.
00:12:31
Speaker
But how do you do that? You cannot do that with a single model, as far as I understand. How do you get one single model to serve 54 use cases? Okay, so there are lots of ways to solve this, and they should be combined. One is this rare thing in data science called features.
00:12:58
Speaker
Because if you have one model for, whatever, the US, another model for Canada, and another model for the UK, then, if you have a reasonably good distribution of data, you can have the country as a feature. And if everything else is properly fine-tuned, then this feature, if behavior really is very different across regions, will usually come out as important. And many such cases. Another thing: no one said it has to be one standalone model, where the request comes in, the model gives you the result, whatever it is, a price, a probability, and you just send it to your front end.
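A rough sketch of the country-as-a-feature idea: instead of routing a request to a per-country model, the single model receives the country as part of its input row. The countries, field names, and one-hot encoding here are assumptions for illustration, not from the episode:

```python
# Fold per-country models into one model by making country a feature.
COUNTRIES = ["US", "CA", "UK"]

def encode_row(raw: dict) -> list:
    """One-hot encode the country alongside the shared numeric features."""
    one_hot = [1.0 if raw["country"] == c else 0.0 for c in COUNTRIES]
    return one_hot + [raw["sessions"], raw["cart_value"]]

row = encode_row({"country": "UK", "sessions": 4, "cart_value": 59.0})
print(row)  # [0.0, 0.0, 1.0, 4, 59.0]
```

One model trained on such rows sees all the data at once and can still learn country-specific behavior through the encoded feature.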
00:13:48
Speaker
It's really more about an ensemble of models. For example, if on your e-commerce website some people have a one-step purchase and some a two-step purchase, you can have a second-step model if that stage is very different.
00:14:08
Speaker
So models can differ based on the stage the person is at, but not based on the type of the user or their geography and everything like that. That's one. Second, there are such things as adjusters.
00:14:25
Speaker
An adjuster is another model, with a different set of features from the general model and usually much smaller, which takes the result of the general model as an input and then adjusts it based on the values of those features. The same can be done on the input side of the general model: some of its inputs can be the outputs of smaller models that handle particular cases.
00:15:07
Speaker
And many, many other solutions. So I'm not saying it's very simple, that you just have one model, put all the data there, and it magically gets trained. Obviously it requires quite a lot of fine-tuning, quite a lot of fairly elaborate cascading and special tricks. But generally, yes.
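The adjuster pattern described above might look roughly like this; both "models" are stand-in linear functions with made-up coefficients, where a real system would load trained models:

```python
# Sketch of the adjuster pattern: a general model produces a base score,
# and a much smaller model corrects it using a few extra features.
def general_model(features: dict) -> float:
    """Base purchase-probability score from the shared, pooled model."""
    return min(1.0, 0.1 + 0.05 * features["past_purchases"])

def adjuster(base_score: float, extra: dict) -> float:
    """Small correction model that takes the general score as an input."""
    correction = 0.1 if extra["is_holiday_season"] else 0.0
    return min(1.0, base_score + correction)

base = general_model({"past_purchases": 4})          # about 0.3
final = adjuster(base, {"is_holiday_season": True})  # about 0.4
print(round(final, 2))
```

The key property is that the adjuster never has to retrain or replace the general model; it only post-processes its output for a particular case.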
00:15:34
Speaker
Generally, obviously, it's not a question of 53 models versus 54 models. This is a matter of domain boundaries. It really reminds me of subdomains: you've got those little things, and those are your adjusters or your features. But if you're a 1,000-person company and you have 55 domains, something is going wrong. You can have those as subdomains, as little micro things, but in general you don't need to split everything down into every little piece, because then your overhead becomes huge. Yes. Also, I would add another thing: ending up with many models might also happen when the company
00:16:21
Speaker
really doesn't have a strategic data view. For example, there is a product manager, there is a new product release, and they just file a ticket for the data scientists: hey, we have a new type of product, make a model for it. So this happens when the company is very driven by product management, by product releases, by metrics not based on the company's overall performance. Everything is made of these micro decisions: data scientists get tickets, make this model, make this model, make this model.
00:17:02
Speaker
Okay, this is really interesting, especially treating this as monkey work; it sounds like very mechanical monkey work when you just crank out models. But this is what I realized while you were talking: to build solid machinery and practices, put it this way, a strategy and organization for data science,
00:17:27
Speaker
you need to go into the business and its weeds to understand which features should define this model and its adjusters, which features are supposed to be the core of the bigger machinery. It's really about the specifics of the business. So you cannot be a solid, let's put it this way, a senior data science specialist if you don't understand the business in reality. I could say even more: there is now a very big misconception, I think, in lots of companies, and it's the supposed distinction that these are the business half of the company, and those people are the technical half of the company.
00:18:17
Speaker
And how it works, let's say, is that the business people have no idea what the technical people are doing, because they're, whatever, some nerds sitting at their computers making something. And the technical people have absolutely no understanding of why they are doing it, and sometimes even what they are doing, because they just have a ticket that some person from product, or from wherever, wrote the way they think data-science-ish should sound. And they say, okay, I will make what's written here, I don't care what happens next. And maybe for some technical fields it really is like this, and people do not need to understand the business; they just have to build something, I don't know.
00:19:01
Speaker
But for data science, it's not that data scientists need to understand business; it's that data science is an extremely business-centered field. Data science is a very, very, very business field, which uses technical tools and technical stuff to solve business problems.

Data Scientists as Business Value Providers

00:19:23
Speaker
Otherwise, data science is just, I don't know, what is it? People doing something for no reason, people wasting the company's money making the infrastructure run hot training models that, in the end, no one needs, because they don't solve any of the real pain points of the company.
00:19:44
Speaker
That's really interesting. Playing with pandas, you can call those people. You know, the ones that always want to see the pandas at the zoo versus the ones that always want to see pandas in the data warehouse. Nice one. Yeah, well, it's exactly what you're saying: what is the value of the work that you're doing? And a lot of data people see pure value in playing with data.
00:20:10
Speaker
And yet there's a significant cost to that, even if good things can come from it. But if you're not focused on the things that matter... I had a conversation with Aaron Wilkerson, and he was talking about: if you come to somebody and go, we discovered this amazing thing, and it's not something they care about,
00:20:29
Speaker
they now have to take a bunch of time to figure out, is this something I should care about, instead of you focusing on the thing that mattered to them. We're in an environment where revenue growth is important but costs are a massive focus. So are you focusing on going into this new market that could be really valuable? Or is it like, you go to McDonald's and tell them, Gen AI could be really valuable for you; look at what happened with OpenAI; McDonald's should open their own Gen AI service, not just leverage Gen AI. And it's like,
00:21:07
Speaker
what are you talking about? This is not the focus of our business. This is not what matters. That's a wild example, but it's a pretty common pattern, instead of, okay, what are we trying to do and why? And that also has to come from the business side. I have this problem with a lot of the data people I talk to: are they trying to transform the organization so that the rest of the organization can understand how to leverage data?
00:21:38
Speaker
So it is somewhat about leadership, but you have to have an idea of what generates value for the company and what is valued, because value is not objective. I'm trying to hammer this into data people's heads: value is subjective. What you value is different from what I value. So understanding what people do value is great for your career, it's great for the data team, and it's great for the company in general, because if somebody perceives something as valuable, they can also communicate to investors that this is valuable, and all of that. So wherever you want to take it, but I'd love to wrap in
00:22:27
Speaker
how you, as a data person, see other data people getting to what matters, because most people are in a ticket-taking organization when it comes to data. Ticket-taking? Yeah, that's what it is: if you're ticket-driven, if your job is to close Jira tickets, you're not really adding value to the company, because you don't understand what you're doing and how it actually benefits anybody. So I have a question, to kind of reinforce your point, and maybe even to come up with a question for Daniil. You mentioned that a data scientist can see the Jira ticket that a product manager wrote down and just jump into building
00:23:20
Speaker
the model. And this is actually super powerful, because, from what I'm hearing, you don't expect the product manager to do the job for you. You don't actually need the product manager. It's fine, yeah. My dog is barking a bit; she's just barking in her sleep, dreaming about doing something good, I guess. So it's okay.
00:23:48
Speaker
Okay, we got distracted, but okay. So the question is: you sound like a person who doesn't need a product manager in the chain of communication with the business, and there are so many possible data products. I'm sorry, Scott, if I hurt your feelings about the necessity of data product management. But overall, what do you think about data scientists talking to business? How do you establish common ground, how do you approach business people? How did you do that in your career, which obviously led to some career growth of yours?
00:24:34
Speaker
Okay, so I think I will start from the earliest stage of the pipeline, hiring, because there is data science and there is data science. These are two very different types of data science, and most of the problems with data and data science in companies come from people mixing those two up and taking one for the other. There is research data science,
00:25:03
Speaker
which is about building beautiful models, creating new families of algorithms, moving the boundaries of what's possible, et cetera. It's amazing, and the people who are doing this are really, really cool. They are amazing research scientists. And if there were no such people,
00:25:29
Speaker
there would be no DeepMind, no OpenAI, et cetera, because those are tech-heavy data products which really should be built, initially, by scientists, by research data scientists.
00:25:43
Speaker
But there is actually only a very small chunk of companies which need such people. Because in most companies, data science is not about research; data science is about business. And in many, many cases,
00:26:03
Speaker
for example in growth-stage startups, people do not understand this. They say, okay, data science, I learned something about it: it's about neural networks, LLMs are cool, AI is cool. And they run interviews, and when a candidate comes and says, hey, I'm going to build you the AI LLM NFT featured spinner, they say, wow, so many fancy words, this guy definitely knows what to do.
00:26:31
Speaker
Well, this is not the case in reality. In reality, a good data scientist in a business company, a company doing, whatever, surveys, something B2B2C, let's say a business, is usually, first of all,
00:26:49
Speaker
their own product manager. For example, when I hired data scientists in the companies where I managed data science, the most important thing for me was exactly this product and business approach in a person.
00:27:08
Speaker
Obviously, they need to know the technical side; obviously, they need all of that. But it was not about, hey, now tell me about this model and that model, and what is the loss function there? Because, hey, you have to know how to use it, but if you forgot the loss function, you can just Google it, and that's okay. What they must have is a business and product approach.
00:27:30
Speaker
And when data scientists are not just technically good people who come to companies because they love making new machine learning models and they love ML, but people who know ML well yet love business, love bringing value, and actually want to bring value with their knowledge of ML,
00:27:51
Speaker
it's very different. In that case, they are their own product managers. For example, we had our own data products in the companies I worked for, and they were managed by people on my team, not by some external product manager who is also responsible for, I don't know, some other A/B testing, buttons, et cetera. They were managed internally, as data products. And this worked out, and it worked out really well.
00:28:21
Speaker
Yeah, it reminds me of the data-as-a-product versus data-products conversation. Data as a product is applying product management principles to your data practices. And product management principles are: how do we deliver value? How do we sustainably deliver value? How do we continuously improve? How do we treat
00:28:43
Speaker
our function as a product, instead of the goal being to create a data product? The goal is to deliver value, and the best way to do that is via products. That's what we've learned in software: the goal isn't to deliver software, it's to deliver value and then capture that value back for your company, and the best way to do that is via product. Yuliia and I have talked about this a lot, because Yuliia is very much a product person: how you capture value is so important. But kind of what Yuliia was coming back to: where do you see that missing in people's careers? Is it simply

Transforming Data Departments

00:29:23
Speaker
that they're not curious enough about this? Is it, you know, you're talking about data science versus data science, and it's like, is it focusing on the data or focusing on the science? Same thing I see with data engineering: if you're focusing on the data instead of the engineering, you're doing it wrong.
00:29:39
Speaker
You should be focusing on the engineering and applying that to data; whereas if your focus is data science, it's more about focusing on the data and applying the science to it. But how have you seen people thrive in learning to do this appropriately? You can use yourself as an example, but you've also brought these people in. You talk about starting from hiring, but if somebody's in an organization and their organization doesn't work like this,
00:30:09
Speaker
how do they start to get the wheels headed in the right direction? Yes. So technically this is part of what we have helped several companies with. For example, one of the things we do is, when a company has a data science or data product department which is pretty much dysfunctional, we come in and start leading it through a transformation.
00:30:36
Speaker
And what transformation? First of all, sometimes the transformation requires letting people go, because obviously some people are just not a fit for the company. Lots of people went into data science out of love for models and algorithms, and that is what they want to do.
00:30:59
Speaker
Also, there is not much business content in data science university degrees and courses; they are still very science-oriented. So people come out of universities, out of courses, usually knowing much, much less about what the business actually is, but knowing much more about how to fine-tune the loss of a model you will never see in your life.
00:31:23
Speaker
So first, definitely, letting people go and hiring the right people, because it's inevitable.
00:31:36
Speaker
The second thing is ensuring proper communication, and the communication should go both ways. It should be that whenever data scientists need to do something, they understand the full business context of what they are doing.
00:31:57
Speaker
They need to understand why they are doing it, what it will affect, and how exactly it will affect it, et cetera. Data scientists should not start working on a model just because they have a ticket saying, hey, build this model. They should understand why the model is needed, how it will be used, et cetera.
00:32:18
Speaker
Also, when starting to work on models or anything else, the first thing they should do is exactly a business common-sense check, et cetera. Because it's no secret that right now most of the job, especially when building models, is feature preparation.
00:32:35
Speaker
And feature preparation is purely a business task, because you have the data warehouse, you have features which mean something, and then you have to make a story. You have to make a story, understand the elements of that story, the story which will affect your target in the end, then find it in the data and formulate it in terms of features. Also, you should have the global context generally.
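The "story told in features" idea might look something like this in practice, where each feature encodes a business hypothesis; the event fields, feature names, and hypotheses are hypothetical:

```python
# Sketch of feature preparation as storytelling: each feature encodes
# a business hypothesis about what drives the target.
from datetime import date

def prepare_features(events: list, today: date) -> dict:
    """Turn raw purchase events into features tied to business hypotheses."""
    last_purchase = max(e["date"] for e in events)
    total_spent = sum(e["amount"] for e in events)
    return {
        # Hypothesis: recency drives conversion.
        "days_since_last_purchase": (today - last_purchase).days,
        # Hypothesis: heavy spenders behave differently.
        "total_spent": total_spent,
        "avg_basket": total_spent / len(events),
    }

events = [
    {"date": date(2024, 5, 1), "amount": 40.0},
    {"date": date(2024, 6, 10), "amount": 80.0},
]
print(prepare_features(events, today=date(2024, 7, 1)))
# {'days_since_last_purchase': 21, 'total_spent': 120.0, 'avg_basket': 60.0}
```

The technical part (a date subtraction, a sum) is trivial; choosing which hypotheses are worth encoding is the business part.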
00:33:04
Speaker
Communication should also go the other way around, because data products still need to be decided at the strategic level. They should be chosen not based on the short-term needs of the product, but based on the general strategic values and vision of the company.
00:33:28
Speaker
Because, as we talked about, the 54, 55, 56 models; there were 54 models when we started our call, maybe by now it's already 56, I don't know. But the idea is that if the only thing the data departments are doing is either firefighting or fulfilling ad-hoc requests, then they are just propagating the errors, propagating the worst practices. They should have a say. For example, if they want to make some strategic project and they understand the business context, and with that understanding they still think they need to make this project,
00:34:13
Speaker
then probably they have a reason for wanting it, they can tell it to the rest of the company, and they should have a say. So data science is not some kind of service; data science is actually a peer. And in many companies it's very, very revenue-generative. Yes, okay, you can do lots of stuff and lots of products, but the proper model can boost revenue by something like 40% in already large companies. I've seen it, I've done it myself, and more than once.

Adapting to Fewer Data Models

00:34:50
Speaker
So this is why data science should actually be one of the decision makers. This is very interesting. And I guess the question is, if your company isn't in this mode,
00:35:09
Speaker
what can you do? Is it to leave and find a company that is? If the business side isn't receptive to your data side, what advice would you give? You mean to the employees, to data scientists who came to a company and are just doing tickets? Honestly, if you don't have any negotiation power over the company, if you try to propose this and they say, no, just do your tickets, then yes, of course, change companies.
00:35:45
Speaker
Because if you feel you can do much more and you're just told, do this ticket, do that ticket, just be our service when we need it, then, if you are ambitious, if you are motivated, if you want to do more, go somewhere where you can do more.
00:36:09
Speaker
Sometimes it sounds easier than it is. It's definitely very hard; I'm not saying you can just go and find a job easily. I'm just saying: think about it, rather than holding on to the idea that they will change one day, that they will come to you and say, hey, do whatever you want, you have carte blanche.
00:36:29
Speaker
I'm not sure that works very often, but I do know companies where this kind of initiative comes not from the top level but from the base, and because the initiative sounds reasonable, it gets propagated and the company changes for the better. Wow. So you've seen good use cases? They are rare, but there are good use cases, yes.
00:36:56
Speaker
Okay. Okay. So you need to initiate something. Yeah, you need to be proactive and start with yourself. Yeah, that's for sure. It's not the best idea to dislike the situation you're in, do nothing about it, and expect it to change. And that goes not only for data science, but for literally every single aspect of our lives.
00:37:21
Speaker
Yeah, relationships and everything like that. It's the definition of insanity: doing the same thing and expecting things to change for the better instead of effecting change. But it is difficult, because if people are stuck in a job where they're not being useful, you can try to chip away at it and chip away at it, but as a single data scientist, are you going to change your entire organization? It's difficult. So, you know, we were talking about how many models we should have and how to figure that out; I wanted to steer us back into that conversation, because you're consulting with a lot of these companies now, Daniil. How do you think about having
00:38:15
Speaker
a common-sense approach to how many we should have? How do you start to ask how different these models really are, and where we really shouldn't be doing that? And what have you seen if you were to go into that company that's running 54?
00:38:30
Speaker
Do you go in and blow out all those models from the beginning? Or do you find the most performant model and say, we're just going to make some adjustments? How do you think about coming back once you've already gone down the bad path, versus starting from good? It's kind of like relationship counseling, where you say, hey, I want to stay with this person, I don't want to find a completely new person, I want to fix what's been broken. Coming back to that, I know Yuliia loves it when I make relationship analogies; she just rolls her eyes at them constantly. No, no, no. Last podcast you were missing, so I made one for you, and I was like, this is Scott speaking through me. Sorry about that. But it's a good analogy, because like you said, it's the same thing: if you're treating any aspect of your life like it's just going to fix itself... When you've gone down a bad path, how do you come back?
00:39:25
Speaker
Yeah, so I also love the relationship analogies, and therefore I would say: if you've gone very deep into some kind of toxic relationship but you still want to keep it, then the worst thing is to take baby steps back.
00:39:47
Speaker
You should be very straightforward and say: yes, if you already have 54 models, the answer is not, okay, we'll try to reduce it to 53, then we'll make it 52, then 50, and then probably go further. No. If you're already at this point, you should admit that on this path it's just going to get worse and worse.
00:40:11
Speaker
We need a full reform. Yes, it requires significant investment, significant everything, but you should build the new set of models, let's say, from scratch and not try to branch them from any of the existing ones, because, well, you know:
00:40:28
Speaker
"This model is not as bad as that model, therefore we will try to use this model for everything." No. This should be a pretty large-scale effort, of course with all the fallbacks. It's not a project that gets done in one day. Like refactoring code: the further you went down the wrong road, the longer it takes, obviously. But still, yes, as in a relationship, you should say everything straightforwardly: what I don't like and what I want to change.
00:40:55
Speaker
So you need to change the name of the podcast to something about relationships. Yeah. Well, you could name it "Data-ing" and that's it. "Data-ing"? Yeah, I love that. Oh, that was your "data". Yes, she is.
00:41:10
Speaker
But there's also that tough conversation: there isn't a magic wand. As a person, yes, I want to change in a positive manner, but it's not like it happens overnight.

Impact of Gen AI on ML

00:41:25
Speaker
You know, there was one woman I was dating who would get really, really excited, and one thing I always loved to say was, "breathe." And she's like, that's like telling me to calm down. And I'm like, I'm not going to be able to just shut that off, because that's always my thing when people get excited. So it's the same thing:
00:41:46
Speaker
I wish data people more often realized you can't just flip this from a zero to a one. It doesn't work that way. That's not the way data works; that's not the way anything in business works. But actually, I would say (it sounds like I'm selling myself now) that if your relationship doesn't work and you understand you cannot fix it yourself, you go to a family therapist.
00:42:13
Speaker
Yeah, you have an outside counselor. If the data side of your company doesn't work and only gets worse and worse, and you understand that from the inside you won't have enough expertise or people power to fix it, you go to people like us. And this is pretty much what companies do, because there is someone who knows how to, let's say, fix a bad relationship with your data.
00:42:39
Speaker
There's also the fact that an external party isn't as involved. A lot of people say they bring in a consultant when they already know exactly what they have to do, but they have to bring in that consultant to manage the politics,
00:42:55
Speaker
to be able to say: hey, we already know we've got to do this, but we need somebody with some gravitas who can come in and have that conversation. And so sometimes you pay, especially if you're paying a Big Four firm or an Accenture or something like that, you pay exorbitant fees even though you already know what to do. But yes, exactly. And so I won't make any more relationship analogies, probably for the rest of this episode, Yuliia, but
00:43:25
Speaker
I do appreciate it. I think we've covered a whole lot of great ground here, but:
00:43:34
Speaker
do you have, again, those incremental steps somebody can start to take if they've realized that they themselves have been going down that science-first instead of business-first path? Is it to reach out to a couple of business people? How do you recommend, because sometimes, in these very siloed organizations,
00:43:56
Speaker
how can they start to train themselves to be what you've talked about? What are the good steps on an individual level, not just an organizational level? You're talking about the employees, the data scientists, not the management? Yeah, not the management, just the data scientists. First of all, when you're doing something, ask yourself why and for what. And if you cannot answer that, just reach out to someone who can.
00:44:25
Speaker
If you have a request from the product manager, or whoever, to build a certain model, just book a half-hour session with that product manager so they can tell you: okay, this is why we need this, this is our business context, and so on.
00:44:44
Speaker
Secondly, don't build anything just because technically it should be done this way, or because you read it somewhere. If you don't understand, from a common-sense business perspective, why you are doing something, then it's probably wrong. For example, with feature engineering, it's not that you just take lots of features, dump all of them into the model, and see which ones work out. That's the second thing. And the third very important thing that is missing in those companies is ownership. Because when a model gets released, usually, well, only in good companies are there at least a couple of alerts and metrics that someone observes from time to time.
00:45:40
Speaker
But there is no ownership of the model. The model isn't treated as a data product; it's treated as a ticket. And if you're talking about a ticket, your aim is just to close it. If you treat something as a product, your aim is to see it grow and bring value.
00:46:04
Speaker
And this is very important. When data scientists start treating themselves not as technical guys but as people with business aims and business context, who also have the technical skills that enable them to really make those business changes, then, to borrow your analogies, it's a lot like parenting. This usually shifts the understanding of what you're doing and saves a lot of time otherwise spent doing the wrong things.
00:46:41
Speaker
That's really interesting. I want to interrupt, Scott; I could even hear you talking in your head. So listen, here's the question, something I encountered as a product manager: sometimes tech people, data scientists in our case, can be quite resistant. What I'm talking about: okay, I had a Jira ticket, and the person sees it and then comes back and pokes me, literally pokes me: how is this going to help the business? And as a product manager, I'm not 100% sure this is going to solve the business problem and give us revenue. I'm sure product managers often aren't sure about that; they are still learning from their mistakes. They have no idea what is going to work; some signals are better, some signals are worse, some of the experiments fail. What I'd suggest from my perspective is to sit down and have a human conversation with the product manager, and maybe in collaboration you can together come up with better features
00:47:56
Speaker
for the model. You can come up with a better approach. You can even test something with an if-else approach rather than a real machine learning model, you know. That would be my suggestion: just have a human conversation.

Consultancy Value and Challenges

00:48:11
Speaker
And of course, I know we're about to run out of time, and I want to cover two more topics. So Scott, bear with me. One of the... Sorry, I'll just write to my next call that I'll be a bit late.
00:48:26
Speaker
Okay, yeah, take the time. I'm going to cut that out. Or maybe not; she won't remember. But I think, Yuliia, a lot of what you're saying there as well is the question of what are we trying to accomplish. Yeah. Sorry, I'm here. Right? "What are we trying to accomplish" is exactly what you're saying. So, you've got your two questions; go ahead and throw them out. Okay. So: is Gen AI,
00:48:57
Speaker
AI, LLMs, all the fancy stuff out there, going to change machine learning? You built your career in machine learning, so what, are you out of business? How do you feel? Absolutely not. First of all, LLMs and AI, AI here meaning Gen AI, obviously, because nowadays everything is called AI,
00:49:20
Speaker
are extremely powerful. It's something tremendous, a real shift in what humanity can do overall, but they are powerful in very particular, niche tasks.
00:49:37
Speaker
In lots of companies, for example, we have potential clients coming to us saying, can you do AI for us, for fraud prevention,
00:49:53
Speaker
for transaction fraud prevention? And we're like, no, that's not about Gen AI, it's not about LLMs, it's about machine learning. And for many other things, such as recommender systems and many, many others, there is machine learning. Gen AI definitely cannot replace it. You cannot replace a recommender system or some kind of scoring model with a very, very smart chatbot. You cannot.
00:50:19
Speaker
But what AI does is help data scientists build their ML models much, much quicker, which is also very important.
00:50:34
Speaker
Because otherwise, making a data product took a lot of time, and now it takes much less, because people don't have to spend so much time on routine coding. And this is exactly how AI makes data science much more about the business. Obviously, a good data scientist should understand the particular types and algorithms and be able to write them themselves, but in most cases they can just know which prompt to run to get a particular module and fill it into the code. They need to have the bigger picture. So what data scientists now need to do in the world of AI
00:51:29
Speaker
is understand exactly the business context and see the big picture of how to make a data product out of it. And then the individual parts of the data product can be quickly drafted initially by Gen AI.
00:51:44
Speaker
Okay, a very reasonable approach, and it's also much cheaper: it's still cheaper to run a machine learning model than to leverage all the fancy Gen AI things. There's a question that Scott was the author of, and I bet he forgot it.
00:52:01
Speaker
So right now you take companies of different sizes and do different work for them. It could be an end-to-end solution or even helping with hiring processes, which is really interesting. How do you attribute the value you bring as a small service provider? Because the use cases differ too. Yeah, so obviously, in use cases where we build the product itself, it's relatively straightforward. For example, if we build something, there is an A/B test of our recommender system versus the previous recommender system, and yes, we see we brought that many percent of additional revenue.
00:52:46
Speaker
So that's pretty simple. Beyond that, what usually shows we bring value is that when we help them with one problem, they come back to us with another business challenge. I would say that if we had done bad work,
00:53:05
Speaker
then they definitely would not return. And also, we understand it ourselves. If we deep-dive into the data and the business, talk to them, really see the problem, maybe see the solution, and then help them see the solution too and understand how to get to it, then usually it's a no-brainer. It's not that we're telling them, hey, you don't know what this is, just build this model, trust us, it will be good, and they say, okay, we'll try.
00:53:30
Speaker
It's because we're helping solve business problems. We talk to them in their language; we're not talking data language, we're talking business language, which helps them solve their business problems. And when, together with them, we decipher and get to the roots of their problems and then to the solution, it's pretty clear; you see that, yes, this is it.
00:53:55
Speaker
We do it together with them; it's not that we're saying, this is a black box, I suggest doing this. Now, it's interesting: so you don't meet any sabotage inside the company when you're talking to business people, but do you meet sabotage from tech? Yeah, definitely. Definitely. I would say that every large company, and by large I mean not a two-person startup but every kind of scale-up, has its own Game of Thrones.
00:54:26
Speaker
There are definitely different people with different interests, while we are the people who have no agenda; we just want to come, help, and go. We don't have any long-term plans; we don't play it so that in the end we get a bigger piece of the pie or anything like that. But some people built certain solutions and are obviously in love with them, because that's what they made. And some people just don't want something done a certain way because it would significantly contradict their plans of getting their own personal gain and benefit from the company.
00:55:12
Speaker
So yeah, obviously we do meet sabotage, but usually when people come to us, they come for value. They don't come just to listen; people come to us when they do have business problems. And when you understand that you have business problems, well, if you are in real pain and you come to a doctor, and the doctor says something is wrong here, you will not say, no, it isn't. Okay. Okay.
00:55:37
Speaker
Makes sense. I'm surprised that wasn't a relationship analogy. Scott, do you have any other questions? No, I think overall, a lot of what you're saying is that none of this is going to be perfect either. Exactly what you're talking about with sabotage: don't expect everything to go right. This has been a big problem I've had when talking with data people; they expect
00:56:02
Speaker
either everything is right or everything is wrong, and if anything goes wrong, it means it's all wrong. A lot of what you're talking about is incremental progress, but you do have to think about how it fits into the broader company. Do we have to tear everything down? Do we have to restart from the 54 models? Instead of just slowly cutting them down, do we want to jump down a significant amount? And fixing things takes pain; it's going to take effort. So I think a lot of what you've been saying is that kind of broader conversation underneath:
00:56:46
Speaker
you as a data person have to go out and have these conversations, because in most organizations those people don't come to you except with a ticket. So if you want to understand what you are actually trying to accomplish, instead of what they are asking you to do — those are two very separate questions. And that seemed to be what you were saying throughout. I don't want to put words in your mouth, but is that kind of how you think about it? Yeah. And also, what the people who give you requests actually want and what they tell you to do are two absolutely different things. This is a reason for many problems: they formulate it the way they understand it, and you do it the way you read it, and in the end everyone has done everything correctly, but the whole thing doesn't work.
00:57:36
Speaker
You know, did you see that meme that Gen AI is going to take our jobs, and then in these situations it's going to come back and ask humans to decode the Jira ticket? Yeah. That's exactly where we are, I think. So, okay, let's wrap up.
00:57:54
Speaker
Daniil, thank you so much for stopping by. It was a very engaging conversation; loved it, very hands-on. Best of luck to you and your venture. I'm sure you guys are set for success, that's for sure, with all your dedication to making business actually work with data. Yeah, thank you so much. Thank you very much for having me. It was really great talking to you guys. Okay, bye.