The Hidden Costs of Modern Data Stack

S1 E13 · Straight Data Talk

Martin Fiser, Field CTO at Keboola with 8 years at Google, joined Yuliia and Scott to challenge modern data stack complexity and costs. He advocates for all-in-one platforms over fragmented solutions, highlighting how companies waste up to 40% of time on tool integration. Martin shared insights on US-European cultural differences in data approaches and warned against "development by resume" culture where engineers prioritize trendy tools over business outcomes.

Transcript

Introduction and Setting the Stage

00:00:00
Speaker
Hi, everyone. It's Yuliia and Scott, back with Straight Data Talk. Today we have Martin from Keboola on our podcast. Before we kick off with the topic: we're going to be talking about the modern data stack, its costs, its hidden costs,
00:00:21
Speaker
and a very unique view Martin has on it. Martin, why don't you go ahead and introduce yourself briefly? Tell everyone what you do and what excites you about Keboola. Awesome.
00:00:38
Speaker
Hi, everyone. My name is Martin. I have been at Keboola for eight years in a variety of roles: mostly consulting, head of professional services for, let's say, five years, and for the most recent two years my role has been Field CTO for America.
00:00:57
Speaker
and um
00:01:01
Speaker
I'm not technical by profession, not a technical person. I'm an economist by background. And that's why I always have this angle of trying to focus on the business and outcomes rather than the technical, you know, play, which I think matters for leaders today, especially in this realm.

Understanding Keboola and the Data Stack

00:01:23
Speaker
Anyway, Keboola: I will try to stay away from pitching Keboola, but I will say that Keboola is a representation of an all-in-one stack, a full-fledged platform, basically supporting all the data operations and automations within the company. So you could say it's a competitor, or a competing way of doing stuff with data, to the modern data stack.
00:01:54
Speaker
Okay, absolutely defensive already. Should I be? But anyway, listen, this is a nice approach, because what you're saying basically is that an all-in-one platform is actually better than acquiring a separate solution for each of your specific problems.
00:02:18
Speaker
But I feel like this has always been a problem, because once you tie yourself to a single vendor, they start to abuse their power, which has actually happened with all cloud providers. And this is what gets clients into trouble. How do you feel about that? How do you tackle this issue?
00:02:47
Speaker
Speaking about lock-in, there's always some form of lock-in.

Approaches to Data Stack: Pros and Cons

00:02:51
Speaker
People lock-in is a thing as well, and in the same way skill lock-in and tribal knowledge, which are a thing in the modern data stack as well. So I would argue there is always some form of, let's say, lock-in, or a hurdle to switching and changing shoes.
00:03:08
Speaker
But in a nutshell, I see that companies can approach the data stack in three ways. The cloud way means they just embrace cloud technologies, all the way from S3 buckets and virtual servers to the SaaS versions of some technologies on AWS, GCP, and such. You can basically do everything, if you want, in those clouds.
00:03:35
Speaker
It has its own problems or specifics, such as that you have to be a large company, right? You have to have a substantial number of technical people, and the money to pay for them. The other approach would be the modern data stack, which is a prevalent thing in the States, I would say. I wouldn't say it's that big in Europe. At least this is what we have been seeing. And the third way is the all-in-one stack. The all-in-one stack is built on, or sits on, the shoulders of giants, right? So it's a layer on top of those cloud primitives, but it's surfacing them in a way that is hopefully easily digestible or easily used,
00:04:24
Speaker
rather than having that technical complexity. So it abstracts the complexity for companies. Okay. I'm a little bit confused, because cloud vendors such as Google Cloud, AWS, and Microsoft have all the tools, and, for instance, Google BigQuery from Google Cloud is considered a part of the modern data stack,
00:05:02
Speaker
because when you're using Google BigQuery, you can be using any other solution in the modern data stack. So this is the reason why I was literally Googling it, and Google BigQuery, Snowflake, and Databricks are considered to be part of the modern data stack. So aren't you mixing it all together here? Good question. I will tell you that another way to define the modern data stack could be that it's an analytical-data-warehouse-centric stack, where you have an analytical data warehouse in the center and then you have those technologies that operate on it. The
00:05:48
Speaker
embedded problem with that is that every single tool using the warehouse incurs costs on the data warehouse. And that's why it's fairly beneficial for BigQuery, and Snowflake especially, to support this modern data stack thing: the consumption is driven by all of the tools within this group.
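One way to picture the point about every tool driving warehouse consumption is to attribute warehouse compute back to the tool that triggered it. The following is only a minimal sketch under stated assumptions: the tool names, the dollar amounts, and the helper names `warehouse_cost_by_tool` and `true_tool_cost` are all hypothetical, and real warehouses expose this information differently (for example through query tags or labels).

```python
from collections import defaultdict

# Hypothetical query log: (tool that issued the query, warehouse compute cost in USD).
# Tool names and amounts are made up for illustration.
QUERY_LOG = [
    ("ingestion", 12.0),
    ("transformation", 40.0),
    ("bi_dashboard", 25.0),
    ("reverse_etl", 8.0),
    ("transformation", 15.0),
]


def warehouse_cost_by_tool(query_log):
    """Sum the warehouse compute cost triggered by each tool."""
    totals = defaultdict(float)
    for tool, cost in query_log:
        totals[tool] += cost
    return dict(totals)


def true_tool_cost(subscription, tool, query_log):
    """Subscription fee plus the warehouse compute the tool caused."""
    return subscription + warehouse_cost_by_tool(query_log).get(tool, 0.0)


if __name__ == "__main__":
    for tool, cost in sorted(warehouse_cost_by_tool(QUERY_LOG).items()):
        print(f"{tool}: {cost:.2f} USD of warehouse compute")
```

The design choice here is simply to treat a tool's subscription fee and the warehouse compute it causes as one number, which is the view of cost the speakers argue teams tend to miss.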
00:06:13
Speaker
Yeah. The way I think about it is there's the approach of just buying from your cloud vendor, and it's not really an all-in-one platform, but you just buy all or most of your stuff from them. Or the modern data stack is an idea of mix and match, and it's about getting the data. It's about ETL, even though a lot of the time it's ELT, and, you know, it's about moving the data in,
00:06:39
Speaker
transforming it, consuming it, and then pushing it back out to something. And that's where, you know, I hate reverse ETL, because it shouldn't really exist, but because the stuff that

Business vs. Technical Considerations

00:06:49
Speaker
should be bringing in the data, you know, you shouldn't need another tool just to say: we have cleaned all the data, and now we have to do a bunch of things to put it back into the system. They should have better APIs and stuff. But anyway, it's about this cycle of moving stuff through, and it's about that mix-and-match approach versus: I'm going to just leverage all these separate things from the cloud vendor.
00:07:14
Speaker
I'm not going to worry about going out to a bunch of different smaller vendors, these VC-backed startup-type vendors. Or I'm going to do the all-in-one, and I'm going to buy something that isn't from the cloud vendor itself, but that integrates these capabilities, these needs, into one offering. So I don't have to mix and match and figure out which of these I am going to choose, how I am going to integrate them, how I am going to pay for them, and all of that. So the modern data stack, from my perspective, is about buying from five vendors, versus buying from your cloud vendor, or leveraging an all-in-one platform on your cloud vendor. Those are the three different options. That is the way I see it. Yeah. Even.
00:08:08
Speaker
It gets a little bit more complicated because, for instance, you can actually buy our platform through the marketplace and utilize your commitment. So enterprises who have a commitment on GCP or elsewhere just jump on us and use the consumption driven by us as part of their commitment. So it gets muddier, right? But I get what you're saying.
00:08:37
Speaker
But that's a purchasing factor. That's literally the thing that data teams typically don't care about. So I want to kick it to you, Yuliia, but literally, like,
00:08:52
Speaker
that's a business thing versus a data thing, and where it gets really muddy is where you start to ask: how does the business think about this versus how does the data team think about it? Shouldn't the business consideration be the most important one? To me, it's the most important part.
00:09:11
Speaker
Well, the business consideration is the most important part if they understand the technical aspect. If they do not, their consideration might not work well, right? Partially, yes, but a prevalent thing that I have been seeing across the board, especially in the States, and now I will upset some people, is that data teams decide on technologies and such because they want to play. Everyone wants to play, right? We all have to tinker, of course. But then there are all the considerations of: okay, how does it scale? How does it scale with the team? Do we have to scale the team alongside it? And now we are approaching the cost-benefit, the risk for the future, the risk of not having people on the market or overpaying people on the market, and stuff like that. So,
00:10:01
Speaker
you know, having purely technical people decide on the technologies and on the general approach towards the data stack, that's a risky piece. I still believe there needs to be a strong leadership decision, you know, from a C-level, CIO or CTO type of person who is hopefully zoomed out and a little bit outside of this tinkering, where they can decide what is best for the company rather than what is best for a few people on the team who like to, I don't know, spend their time in YAML.
00:10:47
Speaker
I have a question. Oops, "spend the time in YAML." Look at you. Okay, so the question is: I've really enjoyed the part where you emphasized that today the modern data stack would be considered some data warehouse, treated as a data lake, in the center, and then we add layers of different tooling around it to make it look like a modern data stack, right? And in essence, even Google Cloud, in the case of Google BigQuery in the center, is interested in its customers adding these layers, because they consume Google BigQuery compute. Yeah, this is how it works. And, you know, I think
00:11:40
Speaker
I cannot emphasize enough to all teams that when you purchase a solution, the total cost of ownership comes with the Google BigQuery compute as well. And, well, I don't entirely agree.
00:11:56
Speaker
Or maybe I agree. I'm not sure yet,

Cloud Provider Strategies and Cost Implications

00:11:59
Speaker
but what I'm trying to say is that what I tend to see from Google Cloud, AWS, and Azure is that they are not pushing back on these vendors. They are inviting them to the marketplace. But what they also do in parallel is build internal solutions similar to what those vendors offer. So in essence, how would that look? Okay, do you want to buy it from us as a single platform? No, you don't feel comfortable? Okay, go and burn your commit on Google Cloud Marketplace, and those tools still bring a portion of your money back to us. They will get the money either way, right? A hundred percent. Yeah. This is called co-opetition: it's cooperation and competition. So you cooperate with these vendors and you compete with them. And so essentially what the cloud vendors want to do is, you know,
00:13:03
Speaker
create a center of gravity for your data, because once the data is there, you're not pulling the data out. They make way more money than people think on data transfer.
00:13:18
Speaker
It's not even data storage. It's just the data movement costs. And so they're going to make it very, very expensive for you to pull your data out versus inexpensive to get your data in. But yeah, it's exactly what you're talking about: they do care and they don't care, right? Although I did see this with AWS, where AWS was doing some very, very shady things: you would be working with a vendor through their cloud marketplace, or even just running a vendor's software on your EC2 instances, and they'd detect
00:13:54
Speaker
what was running on there, which they shouldn't have been looking at, and then their salesperson would try to sell you their solution. So vendor beware, because the companies don't have that many scruples, and a lot of salespeople have zero scruples. And so if they can get that information, they'll do that. But it is this thing of, at the end of the day,
00:14:22
Speaker
if you're paying them, they don't really care if you're paying them 80% of what you would pay for their own service, because you're paying a little bit of that money to the vendor and you're still spending all the money on compute and storage in their ecosystem. Except what they care about is the stickiness. What they care about is the traction and the stickiness of the solution. And I believe that's becoming more and more of a factor. And that's why you see, how do I call it, some dynamics out there on the market, where suddenly companies are starting to cooperate more with particular vendors and such. And you can see the shift that is happening as well. Snowflake does that as well, right? I feel like stickiness for them might be a factor, but hey, like,
00:15:23
Speaker
they will get that money anyway. And I think, if I read you correctly, you're looking at it from the angle of, okay, you store the data. I would argue that yes, data storage is important, of course, but I would argue that compute is the thing that everyone scrapes the most money from, right? Oh yeah. Oh yeah. Sorry to interrupt you: storage is just a commodity, while the premium comes from the compute. Except nowadays with the lakes. It's the data movement. Google kind of collapses that into that, but the actual transfer costs, if you look at AWS and where they make their margin,
00:16:10
Speaker
their prices for transfer have been essentially the same for the last 10 years. And so the actual cost to them has come down an order of magnitude, but that's where they make their money and that's where they really charge you. It's not even the compute that they charge you this huge margin on. More of your bill is in compute, but where they make their money,
00:16:37
Speaker
it's kind of like Best Buy, which, for folks that don't know, is a retailer in the US. I used to work there, and they would sell TVs essentially at cost, but all of the accessories, the chargers, the cables, and all of that stuff were at 85% or 90% gross margin. Gold-plated.
00:17:00
Speaker
Yeah, they'd buy these things for $2 and sell them for $20 or $30. And that's where your cloud vendors are. And that's why, the more you dig into it, you're like, oh, they really try to have this gravity around this. But anyway, Yuliia, if you've got a different question, I want to go with yours. But my question here would be,
00:17:23
Speaker
Where do I go? How do I go with it? Well, after I ask, you can go, no, let's go in this other direction. But we kind of talked about this: there is the data team making the decision versus the business side making the decision. And I think you wouldn't go technical-first if you're still focused on the business side, but there's the business focus on what we are actually trying to accomplish with this data.
00:17:53
Speaker
When you're talking to companies, I tend to think of the modern data stack as being technically focused, a little bit like what you said. But when you go and talk to people, are you competing against the modern data stack or not,

Competing in the Modern Data Stack Market

00:18:08
Speaker
right? Like, is it that you have to be there at the very beginning, because otherwise they've already picked three pieces and then they come upon Keboola and go, oh, should we throw out these three pieces? How do you think about that conversation? Because there is that process of
00:18:25
Speaker
we already have these contracts. We've already sunk money into this. I think it's easier to rip out the existing cloud infrastructure stuff that's just a Google or an AWS or an Azure offering. How do you think about those conversations? Not necessarily how do you bring somebody over to your side of things, but how do you get people to start to think about the costs, and how do they communicate those costs internally? Because the finance people look at the bill instead of the total cost, including what your people cost, where you might
00:19:02
Speaker
you know, have that bill be 10% higher, but your people cost is 30% lower because they're not wasting all their time building custom connectors. How do you think about having those conversations, so that your
00:19:17
Speaker
people that want to go away from the modern data stack and move towards that all-in-one perspective, if somebody's listening and they want to do that, how do they think about having that conversation about total cost of ownership? Yuliia talked about the total cost of ownership. In data, the costs are already kind of squishy, right? You're like, ah, how much was directly allocated to the data work versus not, and all of that. And especially the returns are very squishy.
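The "bill is 10% higher, people cost is 30% lower" scenario from the question above can be worked through as plain arithmetic. This is a hedged sketch only: the baseline figures (a 200,000 USD annual tool bill and 600,000 USD of people cost) are invented for illustration and do not come from the episode, and `StackCost` and `adjust` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackCost:
    tool_bill: int    # annual vendor/cloud bill in USD (what finance sees)
    people_cost: int  # annual cost of people time spent on the stack, in USD

    @property
    def total(self) -> int:
        """Total cost of ownership: the bill plus the people cost."""
        return self.tool_bill + self.people_cost


def adjust(cost: StackCost, bill_pct: int, people_pct: int) -> StackCost:
    """Return a new StackCost with each part changed by a whole percentage."""
    return StackCost(
        tool_bill=cost.tool_bill * (100 + bill_pct) // 100,
        people_cost=cost.people_cost * (100 + people_pct) // 100,
    )


# Hypothetical baseline, not figures from the episode.
baseline = StackCost(tool_bill=200_000, people_cost=600_000)
alternative = adjust(baseline, bill_pct=10, people_pct=-30)

if __name__ == "__main__":
    print("baseline total:   ", baseline.total)
    print("alternative total:", alternative.total)
    print("bill went up by:  ", alternative.tool_bill - baseline.tool_bill)
    print("total went down by:", baseline.total - alternative.total)
```

With these percentages the total only drops when the people cost is more than a third of the tool bill, since a 10% increase in the bill has to be outweighed by a 30% decrease in people cost. A finance team that looks at the bill alone would see only the increase.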
00:19:48
Speaker
I'll try to remember all of those small things. Oh, do not try to. Go and answer whatever you remember. That's fine. Okay, good. So look, I actually replied to one person on LinkedIn regarding the modern data stack, today or yesterday, and it made me realize that the modern data stack as a concept is not about those technologies working all together. The technologies, and I would say the frameworks or the utility out of them, are one thing. It's about companies working like that. It's about the business model that works all together. That's what I believe is representative of the modern data stack,
00:20:33
Speaker
but the technologies are there and will always be there. Like the concept of, let's say, dbt as a very influential framework for transformations, or concepts of data contracts, or concepts of data quality testing and stuff like that. This is there, okay, and it will be there. But what I would argue is that we are actually utilizing similar frameworks in the same way, but within an encapsulated all-in-one stack that has sealed all the
00:21:08
Speaker
let's say, issues that you see with discrete technologies being somehow put together. So, for instance, you have the option to use dbt as the transformation backend in our all-in-one platform. In the same way, you can use Python, R, OpenRefine, our native SQL transformations, you name it. So we support multiple backends, and we have done quite a bit of opening up of the all-in-one stack so that we are actually able to use those technologies. So when I have this discussion with clients or prospects, we just discuss: okay, if you are not happy with how dbt is operated, and there are multiple options for how you can operate dbt nowadays, right, do you want to transfer this? Do you want to use this as the opportunity to maybe
00:22:04
Speaker
you know, disassemble it into pieces and optimize it? Or let's just do a lift and shift: let's just operate it, let's just orchestrate it with our platform, but have all of those bits orchestrated at once. So all the ingest, and cutting the cost of ingest, where we actually bring quite substantial value, and then having what we call writers, essentially activation-type components where you push data out, you know, reverse ETL, a very fashionable term nowadays, right? But having all of those bits orchestrated under one umbrella and having all of that
00:22:43
Speaker
in one observability layer. This is something that we offer, and this is one of the values of the platform. So when we have this discussion, and we are getting more and more people coming to us basically saying, hey, we have this modern data stack in place, we are using those five tools, but we can't scale, or we have lost half of our team, or we can't hire more people, or we can't change a bit because it's super complicated and very hard for us to operate, then we identify how we can streamline this to help them operate it better. And sometimes we lift and shift and we say, okay, let's keep dbt as it is and we will just operate and orchestrate it our way, but we will maybe
00:23:29
Speaker
allow you or enable you to jump into a different data acquisition strategy with our extractors, let's say, that will save you costs and operations. So I think this is one of the bits where, for SMB-type customers, those are the typical discussions, especially in North America.
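The idea described above, ingest, transformation, and "writer" steps run by one orchestrator and reported in one observability layer, can be sketched in a few lines. This is an illustrative toy, not Keboola's API: `run_pipeline`, the step functions, and the sample rows are all hypothetical, and the "observability layer" is just a shared list of log lines.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[list], list]]


def run_pipeline(steps: List[Step], data: list) -> Tuple[list, List[str]]:
    """Run steps in order, passing data along and recording one shared log."""
    log: List[str] = []
    for name, func in steps:
        try:
            data = func(data)
            log.append(f"{name}: ok ({len(data)} rows)")
        except Exception as exc:  # stop at the first failing step
            log.append(f"{name}: failed ({exc})")
            break
    return data, log


def extract(_rows: list) -> list:
    """Stand-in for an extractor: returns hard-coded sample rows."""
    return [
        {"id": 1, "amount": "10"},
        {"id": 2, "amount": "oops"},
        {"id": 3, "amount": "5"},
    ]


def transform(rows: list) -> list:
    """Keep rows with a numeric amount and cast it to int."""
    return [
        {"id": r["id"], "amount": int(r["amount"])}
        for r in rows
        if r["amount"].isdigit()
    ]


def write(rows: list) -> list:
    """Stand-in for a writer: a real one would push rows to a target system."""
    return rows


STEPS: List[Step] = [("extract", extract), ("transform", transform), ("write", write)]

if __name__ == "__main__":
    result, log = run_pipeline(STEPS, [])
    print("\n".join(log))
```

Because every step reports into the same log, a failure in any stage shows up in one place, which is the property the speaker contrasts with operating several separately monitored tools.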
00:23:56
Speaker
In Europe, it's a completely different picture. No one knows about dbt. I'm exaggerating, but in enterprises, if you speak with the C-level, 100% no one. Right. So, dbt, you are still not doing a great job in Europe, at least.
00:24:17
Speaker
I mean, you know, it's great. Some of our partners are doing great. Our tech team is in Central Europe. We have strong partners in Central Europe, and they are doing great things with dbt, but still, at the enterprise level... I know Europe is usually considered old-fashioned and backwards in many ways, but I would say that in the enterprises,
00:24:42
Speaker
no one sensible would try to bring a modern data stack concept into this hot mess of tools and systems that they have, all the way from IBM and Oracle and all the fun stuff, and just bring this additional complexity for nothing. They always want to streamline and simplify the operation rather than increase the complexity.
00:25:06
Speaker
Does it make sense? No, I'm trying to say something. It's modernization, right? It's how you actually think about modernization, and the US is much more willing to throw things out and Europe is very much not.
00:25:23
Speaker
Right. And so that transformation is difficult because they don't completely recycle the entire company and their approach.

Data Innovation: US vs. Europe

00:25:35
Speaker
Like the U.S. is much more willing to do, and the U.S. is willing to throw away a whole lot of things, from what I've seen. I also have a hypothesis that Europe is more conservative.
00:25:50
Speaker
Yeah, Europe is more scared as well in terms of adding a new solution that might see their data, with, you know, GDPR, AI, and there is also more legislation around it in Europe, unlike the United States, so they are not as fast to innovate.
00:26:17
Speaker
Their mindset, the European mindset, is not that fast to innovate. They are moving steadily, I would say, but slower than the U.S., while in the U.S. people want to test things, to experiment, and they are willing to throw money into it and lose it.
00:26:36
Speaker
Unlike in Europe, where they are more cautious about it. This is how I see it. Just one bit, my observation is that every single time we speak with a European customer or prospect, it's more about: okay, let's compare costs, and in a cost-benefit analysis, cost avoidance is a major player. And in the US, it's about: how much would it cost if we don't do this?
00:27:05
Speaker
So it's rather evaluating the opportunities than the costs. And I think that this is the major difference between Europe and the US, oversimplified. Upside, downside. Are you willing to make a bet? If you're not willing to make a bet unless it's a sure bet, it's no longer a bet, and your bets don't pay off, right?
00:27:32
Speaker
You can invest in government-backed debt and you're going to get a small return, but if you invest in something that's more risky, it might not pay off, but in the long run, overall... that's my finance background coming in. So that's why we see all those rushes being driven by the States, yes. Yeah. Okay.
00:27:55
Speaker
Okay, I want your financial background to stay with you for a little bit, because I have a question. So, Martin, tell me, does Keboola connect, let's say, to Google BigQuery and query it? How does it happen? Does it incur any additional cost for the customers?
00:28:15
Speaker
So we sit on top of a data warehouse, mostly Snowflake or BigQuery. We are cloud agnostic, so we can actually deploy on all three clouds. So, depending on your commitment, if you go on GCP, you can still use Snowflake on GCP or you can use BigQuery as a backend on GCP. And we have those hybrid modes and more complexities, but yes, we do sit on top of that, and it's basically the persistent layer that stays, right? And then the whole infrastructure inflates and deflates. We are one huge Kubernetes with a bunch of Lego pieces. So we always inflate, we run things, and then we throw it away. And that's the whole strategy for how to maintain the whole stack.
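The "inflate, run, throw away" pattern over a persistent layer can be illustrated with a context manager: compute exists only for the duration of a job, while results land in storage that outlives it. This is a toy sketch under loud assumptions: it uses in-memory Python objects rather than Kubernetes or a real warehouse, and `ephemeral_worker`, `run_job`, `PERSISTENT_STORE`, and `ACTIVE_WORKERS` are hypothetical names.

```python
from contextlib import contextmanager

PERSISTENT_STORE = {}  # stands in for the warehouse: the layer that stays
ACTIVE_WORKERS = []    # stands in for currently running compute


@contextmanager
def ephemeral_worker(job_name):
    """Create a worker for one job and always tear it down afterwards."""
    worker = {"job": job_name}
    ACTIVE_WORKERS.append(worker)      # "inflate"
    try:
        yield worker
    finally:
        ACTIVE_WORKERS.remove(worker)  # "deflate", even if the job fails


def run_job(job_name, func):
    """Run func on a short-lived worker and persist only its result."""
    with ephemeral_worker(job_name):
        PERSISTENT_STORE[job_name] = func()
    return PERSISTENT_STORE[job_name]


if __name__ == "__main__":
    print(run_job("daily_total", lambda: sum([1, 2, 3])))
    print("workers still running:", len(ACTIVE_WORKERS))
```

The `finally` clause is the point of the design: compute is released whether the job succeeds or fails, so nothing keeps running (and costing money) once the work is done.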
00:29:01
Speaker
Okay. Does this Lego piece come at the expense of the client, or do you have it on your instances? It depends. We support all modes. We support multi-tenant with our own provided storage. We support multi-tenant with what we call BYODB, bring your own database. And we support single-tenant, where we deploy the whole platform within the customer's infrastructure, be it a subscription or a VPC or whatever the terminology is in the respective cloud.
00:29:39
Speaker
Now I'm coming with my sort of questions. So why do you consider Keboola not to be a part of the modern data stack? Like, you are on the marketplace. You are sitting on top of the data warehouses. You incur the cost for the client. So why isn't it the modern data stack?
00:30:04
Speaker
Because all of those bits are homegrown, and we use utilities or features or frameworks that are so-called established in the market. So yes, we use dbt. Yes, we use
00:30:20
Speaker
JupyterLab as a way to do workspaces in Python, for instance. But we have all of that ourselves, and we deploy it and such. We don't merge different providers of those technologies into one. And I know that on the market there are some all-in-one stacks that are basically just bringing together a bunch of subscriptions across different providers. We have everything homegrown. So that's what I believe is a major difference. And there was one bit that I unluckily forgot, so I'll hopefully get to it still. Yeah, there was one more aspect.
00:31:01
Speaker
I've got an answer as well for this, which is if you look at, like, Matt Turck's MAD landscape, you know, the analytics and ML thing with all the different companies.
00:31:18
Speaker
It's that you don't force people to have to grab one from each of the different categories and things like that. If somebody wants to slot something in, you've got the abstraction on top of it. All-in-one platforms, you know, Databricks has the same thing: you can drop in and replace their own stuff. The focus is on the abstraction. And this is something from the software engineering side that frustrates me because it hasn't come over to the data side nearly as much: it's about focusing on the work instead of focusing on the tool. It's about focusing on what you're trying to achieve and that workflow. You used the word orchestration a lot.
00:32:02
Speaker
The actual integration between these tools, making sure that once you update one to a new version it doesn't break absolutely everything, it just doesn't have that. And so people are focused on jumping from tool to tool to tool to get their work done. And so you have this context switching, you have these break points that are very, very easy to break, and everybody is managing them themselves, or you're paying for an iPaaS, you know, an integration platform as a service, but it's really just connectors, where you go, I'm going to pay for this connector between these two tools because I know these two vendors don't work well enough together for me to guarantee that they'll still work. It's this idea of you go in and you say,
00:32:58
Speaker
If you want to bring in this other stuff, great. But we've got everything here that you might need so you don't have to bring in things to actually get your work done if you want to focus on your work.
00:33:10
Speaker
This is exactly what we focus on, to be honest. That's what makes us a platform rather than a tool. We have hundreds of connectors, hundreds of components out of the box. But what makes us a platform is that we actually have a full-fledged framework to create your own component,
00:33:29
Speaker
and companies are creating loads and loads of components by themselves. But we do manage them, we do deploy them, we do orchestrate and operate them, and we do logging and all the fun stuff the same way as for any other component. And that's what makes us a platform. I would say that's why the Lego analogy is sometimes used. Basically, there is what we call a long tail of use cases. Every single company can execute on those high-level, high-profile use cases, right? Like customer 360 and stuff like that. Of course, everyone can just throw people and money at it and they can execute, right? Mileage may vary, but more or less, yes.
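The platform-versus-tool distinction above, where customer-built components are deployed, run, and logged exactly like built-in ones, is essentially a common interface plus a registry. The sketch below is an assumption-laden illustration, not Keboola's component framework: `Component`, `register`, `run_component`, and both example components are hypothetical.

```python
from abc import ABC, abstractmethod

REGISTRY = {}  # component name -> component class


class Component(ABC):
    """Common interface every component implements, built-in or custom."""

    name: str

    @abstractmethod
    def run(self, rows: list) -> list:
        ...


def register(cls):
    """Class decorator that adds a component to the shared registry."""
    REGISTRY[cls.name] = cls
    return cls


@register
class Deduplicate(Component):
    """Plays the role of a component shipped out of the box."""

    name = "deduplicate"

    def run(self, rows: list) -> list:
        return list(dict.fromkeys(rows))  # drops repeats, keeps first-seen order


@register
class Uppercase(Component):
    """Plays the role of a component written by a customer."""

    name = "uppercase"

    def run(self, rows: list) -> list:
        return [row.upper() for row in rows]


def run_component(name: str, rows: list) -> list:
    """Run any registered component with identical handling and logging."""
    result = REGISTRY[name]().run(rows)
    print(f"[{name}] {len(rows)} rows in, {len(result)} rows out")
    return result


if __name__ == "__main__":
    run_component("uppercase", run_component("deduplicate", ["a", "a", "b"]))
```

Nothing in `run_component` knows whether a component is built-in or custom, which is what lets a platform operate both in the same way.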
00:34:17
Speaker
The problem is that every single company has a very long tail of use cases across every single department. Be it stupid, or sorry, smart, small ideas, you know, within departments like marketing and other departments, they have awesome ideas on how to automate, how to make their work simpler. Can we just automate this particular process that I'm doing manually, right? And there's quite a lot of stuff like that happening.
00:34:44
Speaker
And for those people, there's no way that they will jump in and ask the IT department to give them access to seven different tools to do this. It's always a request, you know. And why I'm arguing against modern data stack is that it just cements the status quo on the market, which is: it's always a request to the IT department and it's always in the backlog of the IT department. There's nothing moving forward. And if you need to scale this up, you know, to decrease the backlog, you have to hire more people. And God forbid you have to add, you know, an additional tool in the mix. So modern data stack very often, it's like five to 12 tools that you can put in the mix. Every single tool has its own skills required, you know, its own expertise,
00:35:35
Speaker
its own cost driven by that tool. It's sometimes using different charging mechanisms or pricing mechanisms. You have no freaking clue how much it costs and stuff like that, right? And I'm actually a strong proponent, for instance, of data mesh. I really believe that it's a way to get data in the hands of more people within the company, get more things done and such, you know, under a lot of circumstances and great conditions. But I would argue that modern data stack is a very poor enabler of data mesh. It's always like the IT department wants to implement data mesh because they have, you know, a lot of stuff in their backlog.
00:36:26
Speaker
And it's like imposed on on the rest of the company. It's very hard to get them on board and very hard to actually apply that. We actually go from the other angle. We go from business people and from other

Outcome-based Thinking in Data Management

00:36:37
Speaker
departments. And we say, you want to execute on your long-tail use cases? Absolutely, we can help you. And we have such a long list of stories across departments of companies being able to get rid of their backlogs, you know, across different departments. And they said, oh, we have been waiting for this for two years, and we have made it, you know, during the weekend, actually, and stuff like that. We have those anecdotes all over the place.
00:37:01
Speaker
Yeah, I feel like this is outcome-based versus process-based, right? If you're focused on outcome right versus you're focused on how the work gets done instead of does the work get done and and um do we have a way to make this supportable and sustainable? It's, again, the product thinking, project thinking, right? like If you're thinking product,
00:37:27
Speaker
You're thinking outcome. What are you trying to do? You're trying to deliver value, right? You're trying to you're trying to deliver value and then capture some of that value back as as a company. So how do you enable that? Is it that the data work is the thing that creates the value or is it that the data work unlocks?
00:37:44
Speaker
the people that are doing the things that create value. And I don't mean this in a bad way, but a lot of the conversations around modern data stack end up being about a kind of arrogance, that the data team knows way better. And in a small company, your data team can know way better because you don't need, you know, if your data team is two people,
00:38:10
Speaker
I mean, you shouldn't be building a modern data stack anyway with that, but like they, you should be the team that is, that is knowledgeable about all the data. Cause you don't have all this crazy of a thing, but you know, you just can't have everybody up to speed. So Yulia, I'm sure you've got many followups you wanted to throw in there.
00:38:26
Speaker
Yeah, first of all, what I want to highlight, and I hope, Martin, we're going to be friends after this podcast recording, is that I just don't feel it's honest to say about Keboola that it's an all-in-one data platform and compare it to Databricks. They're different solutions in terms of magnitude, with all the respect. Yes. We don't have the same revenue as Databricks, that's true.
00:38:56
Speaker
Well, you don't have this much investment, I believe. Yeah, that's correct. Yeah, absolutely. But as of today, what is built out there, what people are using, it's, you know, a little bit different stack.
00:39:18
Speaker
At the end of the day, it's different because it's used differently and it's used by different personas. We actually coexist on several occasions in companies alongside Databricks, because they have Databricks for some of the ML pipelines and some of the core pipelines, and they use us as an enabler of data for the rest of the company, because no one can actually touch Databricks, you know. Yeah. Yeah, Databricks is for those special people, you know, let's put it that way, who know the rocket science.
00:40:00
Speaker
I would say, you know, I'm really grateful that Microsoft pushed Fabric forward. Databricks is, let's say, sponsoring a lot of people to make a buzz, you know, that you can see on LinkedIn nowadays. And it's great, because it is helping to push the concept, or to push the voice, that, hey, for a lot of companies an all-in-one stack, an all-in-one platform, makes so much sense. For instance, in North America, we have quite great success with the QSR industry.
00:40:35
Speaker
The reason for it is that there's loads of data in the QSR industry. They are very close to customers. They have very thin margins, but data is not in their DNA. The technology is not in their DNA. In their operation, of course, technology like POS systems, that's a core. They focus on that.
00:40:54
Speaker
But there's no way they will establish a great practice of having tens and hundreds of people working in data. But they need data. They need data processing. They need to optimize. And, you know, I could list like five to ten great reasons why data is important in QSR. But I would argue that for companies like that, an all-in-one stack makes so much sense.
00:41:16
Speaker
And then, let's decide, you know, can they actually embrace Databricks? Some of the QSR companies do use Databricks. A lot of QSR companies use Snowflake and some additional technologies on top. And then some large ones as well use Keboola, actually, to facilitate their data pipelines and such. And they get rid of old stuff and they simplify things. So anecdotes like going from four-week development to two days, that's what we are seeing, especially here.
00:41:47
Speaker
So, no, no, Scott, I'm sorry. No, keep it here. Okay. No, no, no. This is the reason, because I was writing down, like, you know, for myself, too long. So, um, the case here for me is you saying, okay, the modern data stack, um, they,
00:42:09
Speaker
You know, they are too fragmented, from what I understood, and the value is hard to measure. But we at Masthead, we have the same anecdotes. Like, it would have taken us forever to discover that anomaly or to calculate the cost.
00:42:26
Speaker
And on top of that, Masthead would not incur additional cost for our customers. Unlike any other SQL-first solution in the data observability space, the onboarding time for us takes minutes, literally, and time to value is five hours. And we still consider it to be modern data stack. What I'm trying to say right now is that saying that all the solutions in modern data stack are built this way,
00:42:55
Speaker
I guess some have better branding at some point and more money to push themselves forward, but a lot of solutions in modern data stack are valuable for the customers.
00:43:08
Speaker
And the technologies and the utility that they bring, yes. But the key is, okay, how do you put everything together? And that's where it becomes a little bit messier and more complex. You know, for every single company, it's very beneficial to be part of modern data stack because everyone chips in for the common marketing,

Integration Challenges in Modern Data Stack

00:43:28
Speaker
right? Like every single dollar spent by every single company within modern data stack help to overall, you know, push for modern data stack, which is great.
00:43:38
Speaker
So I feel like there is a lot more change. I didn't see the help from the modern data stack, to be honest. Like, how do I get it? Where else should I chip in to get some help with marketing? Where are these? Yeah, so, you know, and you're seeing, well,
00:43:56
Speaker
There are quite a lot of synergies of companies working together, all together, to provide solutions, to bring the solutions in, to have a common message. You know, you can see it, dbt plus, and then there's a lot of, you know, technology partnerships. By the way, we are a technology partner of dbt as well. I would consider some of the people there friends, you know, we know each other. It's fine. But, um, I mean,
00:44:34
Speaker
i mean yeah. Um,
00:44:39
Speaker
It's great technology, or it's good technology. It's a great framework. The devil is in the detail: how it's being used, what the go-to-market is, and such. And again, you know, going back to that whole idea, I don't believe that there is anything wrong with the technologies being used in modern data stack. I think that the key element and the problem is that you're bringing those tools together, and basically every single tool being brought in is yet another integration project by itself. Literally, you have to tie stuff in. And then a lot of companies, especially from the angle of the orchestration tools nowadays, are trying to close this loop and close this opportunity by, you know, providing easier, off-the-shelf integrations. But then if there is a new data, I don't know, data
00:45:35
Speaker
data quality framework, then, you know, how much do they have to work on the integration of other bits? Literally, nowadays they have to make sure that they are part of it. You can see it in data contracts: originally they supported Soda, and I have read some articles that they now support Great Expectations as well. And then, let's say there are three more, you know, data quality frameworks: what is the work that has to be done? And can you do it, you as a user of those two technologies, or do you have to ask someone or pay someone to actually make this happen? And that's where it becomes very complex and costly. But the question is, again, are you as a data person paid to work with the data tools or are you paid to accomplish?
00:46:27
Speaker
Right. Um, and, working outside. But the problem that I see in this industry is people's names get made by, you know, playing with the tools, by doing the data work instead of the business work. And that's how you advance your career, and getting hired is based on that. And so, you know, yes, you can be all bought in that you should be doing this other approach.
00:46:57
Speaker
But a lot of people aren't seeing the benefit from it unless they're sticking at a company for a long time. You know, it's the same thing I was asking you earlier, like, how do we talk to the business person that is holding our budget, our FP&A person, our financial planning and analysis person, and say,
00:47:21
Speaker
hey, the bill is going to be higher, but our total cost of ownership is lower. How do we actually say that when people are like, okay, then what's your return on this total cost of ownership? And it's like, well, I've got to go to the business to be able to figure out the return, because otherwise the return on data work is just that you have better data. It's just this morass. And Yulia and I, we talked about this too,
00:47:46
Speaker
this communication problem, and what you get paid for, and how you get famous, and who people are paying attention to. It's not the right approach. You know, I mean, there are some people out there that are doing the right things, like a Joe Reis or something like that, but a lot of the people aren't focused on why we are actually doing this. Instead it's let's do the cool thing instead of the valuable thing, because the value is difficult to communicate about. So I don't know how that wraps into this perfectly, but I think that it's the same conversation as modern data stack. We saw this, especially in 2022 and 2023, and 2021 even, that modern data stack made a bunch of people's careers, but it didn't make their companies better.
00:48:40
Speaker
And so, data mesh kind of went down that road in a way that I tried to fight against, but, like, how do you think about it? As the field CTO, you're arming people to have these discussions. You know, Yulia, part of the offering for Masthead is, like, this has a direct return in money, but it's not something that people immediately think

Career Growth: Beyond Technical Skills

00:49:05
Speaker
about. So like, how do you think both of you, how do you think about arming people
00:49:11
Speaker
with the information they need to head in this direction, when it might mean that the bill is higher, but the total cost of ownership is lower.
00:49:22
Speaker
Well, in our case, we are lucky because our bill is not higher, because the cost-benefit analysis over and over shows that the cost avoidance is there, you know, from being able to shut down some of the stuff replaced by us. It's actually bringing in cost avoidance, quite a significant one. So we are lucky on this one. Just to piggyback on what you said, Scott,
00:49:49
Speaker
I really believe that development by resume is a thing. And that's one of the biggest enemies of any CTO and CDO, one that they have to fight within the company. Development by resume culture is the cancer of companies nowadays. And it has to be strongly gotten rid of.
00:50:16
Speaker
But then you can't hire people because they're not interested in coming. It's awful. Look, I understand that. And yeah, we even had one or two companies in the past basically skewing more towards modern data stack after a while because they had the same argument. They have a very strong voice, you know, the data teams and technical teams. But when I used to be head of professional services, I was pushing our consultants to rather focus on the business acumen, on the domain expertise. And I believe that that's the way any data person can grow within the industry. Do not focus on tools, do not focus on technologies. Of course, there are basics and stuff like that. But
00:51:05
Speaker
try to have expertise in a particular domain, try to understand the business and what the drivers of the business are. That will make your career 100% better than just focusing on a single technology. There will always be someone who knows dbt better than you, or, I don't know, you know, we're abusing dbt, my apologies to the dbt guys, but, you know, just as an example, there will always be a person who will come from somewhere and, you know, maybe be better than you at this. But having experience and knowledge of a particular domain, that's a very strong, you know, asset on the market. So I would guide everyone within the data industry to focus on that. And that will hopefully progress your career more than just tools. So development by resume, yes, but short term; long term, yeah.
00:52:01
Speaker
You know, Martin, what I realized during our call is that you are not actually against modern data stack. You are, no, let me finish. It's how it evolved in your head. You are against,
00:52:19
Speaker
um, not short-term, but maybe, I don't know if it's correct to say in English, short-sighted decisions about the tools that belong to modern data stack. And this is what I can share. Like, I do share that with you a lot, because I see on the market that some clients can make a decision in favor of
00:52:47
Speaker
some brand, because some solutions have invested in their brand awareness. And that pisses me off, because the approach of those solutions, I don't share that approach, you know, being SQL-first, incurring a higher cloud bill for the customer, and also the same thing that you said: these data quality frameworks require integration, and that is additional time for developers, for data engineers, et cetera.
00:53:26
Speaker
I think what you're talking about is solutions that allow easy integration with the existing client stack, without having engineers additionally work on these integrations and require additional permissions and going through the, you know, and how to make it easier for the customers to work with the stack, to operate those solutions in an easier way, in a smarter way. Part of the reason for that is that you guys, Keboola,
00:54:06
Speaker
have been a long time on the market, let's face it. I know that you started as a service provider, so you've been working a long time with your clients and developing your solution and building it, while the majority of modern data stack is kind of fresh, okay? And we have to give them kind of respect for what they're building in this short term.
00:54:32
Speaker
Do I believe that every solution out there is going to evolve and be less of a pain in the ass and be able to integrate with the rest of the stack out there? I'm not sure. I don't believe you want them to. I see. I don't believe you want them to integrate. No, I just think there needs to be consolidation in the market. Exactly. There is no way this can work. Long-term, there's no way
00:55:00
Speaker
it will be financially viable for any company to jump onto seven tools and pay 3200K for each, no way.

Future Predictions for Data Stack

00:55:10
Speaker
Like, it's just not possible.
00:55:14
Speaker
And so that's why I believe, of course, that the future belongs to those who govern and compute. That means all-in-one stacks and orchestration tools. And I believe that on the market, the consolidation will happen around those orchestration tools. That's my prediction.
00:55:34
Speaker
Because they try to, you know, capture that, and they are in the best position to go around this. That's why it makes so much sense for dbt to buy someone, you know, in the long term, unless they will, I don't know... Are you on sale? Are you guys on sale? I still think, wait, we can make it viral. Hey, but today, um, I'm jumping around. You've been in the market for 10 years. This is the other conversation.
00:56:07
Speaker
Yeah, no worries. But I wanted to say that we have been on the market for 10 years, let's say. Some of our customers in the States have been with us for seven years, right? Like the ex-science corporation and such, Firehouse Subs, which now belongs to RBI, and other companies. They've been around for quite a bit. So we are not new on the market. However, we are not, you know, commonly known on the market.
00:56:33
Speaker
And, you know, you mentioned the money flow and such. So we were bootstrapped for, let's say, the first eight years. We have gotten investment in the last two years only, when we saw the... I think that, you know, the strength of the voice and the strength of the marketing is very important, especially in North America.
00:56:59
Speaker
And I believe that companies like us, who are originally from Europe, do suffer. And especially, you know, we are from Central Europe, and we still hold back a little bit rather than being the front-facing, you know, pushy people on the market, right? And I believe that this is one of the curses of the Europeans, not being pushy enough, especially when they try to expand on the US market. But hey, let's see.
00:57:28
Speaker
We are getting more and more customers. Hopefully all of their voices, and the successes that they have, and their use cases will be helpful here, right? Yulia, can I give my final thoughts and then you can wrap up? Yes, I'm just impressed. I love how the conversation evolved.
00:57:48
Speaker
I'm literally in love with Temple already. So one thing I've been pushing for, that I don't think will ever happen, is a native integration. And I hate the fact that data fabric is an actual phrase, because when you think about software, there's an integration fabric throughout all of software engineering. And if you don't play,
00:58:13
Speaker
you don't get picked up. You don't get used, because there's all this ... We don't have that in data. We need people to be forcing the market mechanisms to make it so that this integration isn't so difficult. And, you know, Keboola is taking advantage of the fact that it is an insane, insane pain. And I don't know that anybody can actually fix this. There needs to be an open source project. There needs to be this. But in data, everybody wants to be the main pane of glass. And so that makes them a major pain in the ass. That's a phrase I say a lot. Every vendor wants to be the single main pane. And Keboola being the main pane actually makes sense, right? Like,
00:58:55
Speaker
being that thing that you're doing, because you can push it through all the orchestration. But every tool wants to be up on everybody's desktop every day instead of, again, focusing on that work. So you vendors out there, figure out how you can get into this integration that's much easier than it is now, you know, providing open APIs and things like that.
00:59:19
Speaker
But the buyers out there, figure out how you're gonna talk about this stuff, how you're gonna talk about it if you are gonna go for an all-in-one platform. In most cases, it is gonna mean that the bill itself might be higher, but even, let's say that they are already very efficient and they're not getting to shut down a ton of things. Your bill's gonna be higher overall by moving to Keboola, right? It's just the fact that you're... No. No. I disagree with you.
00:59:56
Speaker
This is well on the part. Oh yeah, sorry. Just one note, since the theme and topic is the cost. One of the large drivers of cost is waste, you know, especially in the enterprises. And eliminating waste is the number one concern. And I would say that that's where we bring the value, in eliminating the waste.
01:00:26
Speaker
So I literally just said, if they have no waste. No, so let's just say, I know, I was a cloud cost manager. But even in that, having that conversation with the people that are funding you, to say, like, we want to be efficient in how we are spending our money, and a lot of that is our people and our time. So that way, when we do have opportunities that pop up, it's not a six-month turnaround cycle. It's something that somebody can do, like you said, over the weekend. All of those conversations, if you're out there as a buyer and you want to think about going to an all-in-one, or maybe you want to stay with modern data stack,
01:01:09
Speaker
but You can get the funding for these types of projects to rethink this. If you focus on that conversation around where are we wasting money? That's not just on the bill. Like again, people get really frustrated with like, you're wasting money on the bill. Why, why weren't you doing this correctly? And it's like, that's not the way engineering works. But if you want to go out there and you want to have these conversations about moving to something that's more all in one.
01:01:37
Speaker
talk about those people problems of, my people are spending 30% of their time reintegrating these things that they tell me are integrated out of the box and aren't. And so that's my conversation around the modern data stack: if you're doing modern data stack, look into how much time you are spending on this integration stuff, on this stuff that they tell you is just going to be taken care of, instead of doing the value-add work. And that's where we've seen a lot of people shift towards these platforms, because they've done modern data stack and realized, I'm spending 30, 40% of my people's time on making these things actually work together instead of delivering valuable results.
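As a rough illustration of the "30, 40% of my people's time" point, the arithmetic can be sketched as below. This is only a back-of-the-envelope sketch: the team size and the loaded cost per person are hypothetical placeholders, not figures from the conversation, and the function name is made up for this example.

```python
def integration_overhead_cost(team_size: int,
                              loaded_cost_per_person: int,
                              integration_percent: int) -> int:
    """Yearly people cost spent on keeping tools integrated.

    All inputs are assumptions for illustration:
    - team_size: number of data people (hypothetical)
    - loaded_cost_per_person: yearly cost per person in whole dollars (hypothetical)
    - integration_percent: share of time spent on integration, 0-100
    """
    if not 0 <= integration_percent <= 100:
        raise ValueError("integration_percent must be between 0 and 100")
    # Integer math keeps the result exact for whole-dollar inputs.
    return team_size * loaded_cost_per_person * integration_percent // 100


if __name__ == "__main__":
    # Hypothetical team of 10 people at $150,000 each per year.
    for percent in (30, 40):
        cost = integration_overhead_cost(10, 150_000, percent)
        print(f"{percent}% of time on integration: ${cost:,} per year")
```

The point of putting it in people-cost terms is that this number never shows up on any vendor bill, which is why it is easy to miss in a tool-by-tool comparison.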
01:02:26
Speaker
Two more aspects to this. One is, actually, with modern data stack, you have no way to separate the cost and to say this use case costs us this amount of money, right? That's one of the major issues. It's not an issue for SMBs that much, but it's a very big issue in the enterprises.
01:02:45
Speaker
And the second bit is, you know, you said cost of people, and I would also argue, okay, can we actually, grade is a bad word, but let's say, can we actually say, okay, those are the skill levels that we have. Can we decrease the complexity and use the teams that have lower skill levels and are cheaper for this particular bit, and keep the high value-added stuff for, you know, those people that we overpay, or we pay, you know, hefty sums, as we say in the States?
01:03:15
Speaker
So can we do that? And I would say that this is yet another bit where modern data stack does not have an answer. You know, you can't do that because you always have... Actually, there was an attempt: the whole analytics engineering agenda, you know, was fairly smart from this angle, but I would say that, you know, the,
01:03:39
Speaker
Lowering the complexity is the way to cut the cost, 100%. That might be a nice closing sentence. Yeah, let's put it this way, but I actually have a ton of questions. UNI is again, you know, not a part of advertisement or whatever, but I think I will, but in lapis can go and the way you think of, you know, data space and a data stack. Yeah. It's beautiful, the way it is. I guess I'm, like, in the business rather than tinkering, even though yes, we like to tinker, but there has to be some, you know... No, no, no, the way,
01:04:18
Speaker
Yeah, the way you have your vision, it's beautiful, and it makes sense. And maybe it doesn't make sense for all the companies out there, I can argue on that. Yeah. But there are a number of companies where it is such a great fit. And this is good enough. This is just perfect. And I have so much appreciation and respect for it.
01:04:42
Speaker
Yeah. You know, if, if there is a way on the market to make open source work and make it work in a way that it's scalable and, and, and such, especially from a data space and, you know, and then.
01:04:58
Speaker
There might be a path to open source, you know, beta and such, but I have not seen a clear way, a clear path, how we could open source a bit and still make a viable business model, to be honest. I still believe that it might be hard, you know, for us to decipher this. I would love to. We have had talks, you know, many times, but I believe that this is a fairly complex issue, and no one has cracked it that much on the market, to be honest.
01:05:28
Speaker
You don't get paid for it on the data and analytics side. You do on the database side, you do on the software side, but you don't get rewarded for going open source on the data and analytics side. So there's cost, but there's no benefit. Okay, folks, let's leave this conversation about open source and data for the next podcast. And, um, yeah, Martin, thank you so much for finding last-minute time to talk to me and my partner in crime.
01:06:00
Speaker
Thank you as well. Bye. Cheers. Bye bye. Yeah.