
Bogdan Banu: From Zero to Data Platform in Startup

S1 E21 · Straight Data Talk

Bogdan Banu, Data Engineering Manager at Veed.io, joined Yuliia to share his journey of building a data platform from scratch at a fast-growing startup. As Veed's first data hire, Bogdan discusses how he established a modern data stack while maintaining strong governance principles and cost consciousness. Bogdan covered insights on implementing consent-based video data processing for AI initiatives, approaches to data democratization, and how his data team balances velocity with security. Bogdan shared his perspectives on making strategic vendor choices, measuring business value, and fostering a culture of intelligent experimentation in startup environments.

Bogdan's LinkedIn - https://www.linkedin.com/in/bogdan-banu-a68a237/

Transcript

Introduction and Background

00:00:00
Speaker
Hi all, it's Yuliia from Straight Data Talk. Today I'm happy to have Bogdan Banu from Veed, where he works as a Data Engineering Manager. Bogdan, hello. Introduce yourself.
00:00:17
Speaker
Yeah, first of all, it's great to be here. And like you mentioned, I'm currently a Data Engineering Manager at Veed. I joined Veed almost three years ago as the first data person. My initial title was something informal, like Data Lead, basically because the company didn't know exactly what they wanted to do with data. And that was my first goal: to make sure they know.
00:00:47
Speaker
Make sure they know. Okay, so I have to ask: did it work out? Yeah, I think it worked out. They seem to be pretty happy.

Challenges of Being the First Data Hire

00:00:55
Speaker
I joined very soon after our Series A, which was with Sequoia. And I think one of the big requirements for improving things was that the company needed proper dashboards.
00:01:09
Speaker
Everything up to that point was done via Amplitude, which is a great tool, but you can't really control the quality of the data, and a lot of things are not captured. They basically needed it to do proper finance tracking, for example. So that's helpful, because now I know what Sequoia partners need. But seriously, a little bit: they need numbers. I think all investors need numbers, and especially if those numbers go up, they are usually happy.
00:01:39
Speaker
Usually happy. This is a nice way to frame it. So, we met each other the first time at the Google event in the Amsterdam Google office. Yeah, we had quite some fun. But what is really interesting for me is that you joined a startup, and the pace at which you are growing is impressive.
00:02:08
Speaker
And you also mentioned that you were the first person to get there to work with data. So what is it like, you know, to join a post-Series A startup, get your hands dirty, and put things in order, everything that touches data? How was it? I mean, considering it was a greenfield project, it was actually pretty fun. At least all of the, let's say, infrastructure building and setting up all of the
00:02:42
Speaker
technical side of things. At least that was pretty clear to me, what needed to be done. And I think it started getting a bit more complicated when we had to start hiring and setting up the human processes as well. That's where I started learning a bit more. Before, I worked as an engineering manager in a lot of startups, but I was never the first person who has to set everything up.
00:03:12
Speaker
But I had an idea of what the best practices are. When I joined, there was the big wave of the modern data stack, so it was, let's say, pretty easy to follow a formula. I think some of the choices I made then were very good. One or two maybe weren't that wise.
00:03:31
Speaker
But yeah, overall, I think things turned out great. In time, the team grew. I started hiring and had analytics engineers, a data scientist, a lot of data analysts, and we decided to split the team in two. So I took over the data engineering side of things, and we hired somebody to take care of the analysts and the stakeholder management.

Building Infrastructure and Processes

00:03:58
Speaker
Initially the data team was split in two, but now data engineering also reports to Max, who is the head of data. Okay, okay. Listen, this is nice. I think what we need to frame is that you joined a fast-growing startup which, as I imagine it, didn't necessarily have the time to onboard a leadership person; they hire them to make things happen. Yeah, there wasn't a lot of onboarding. It was a lot of catch-up, basically. It was like: this is the repo, brush up and see what you understand, and if you need any help, let us know. And the help I got was really good, because I could set up a lot of GCP infrastructure fast. I had to do a lot of calls with people, trying to understand what everybody's vision of data was, what kind of metrics they needed, what kind of data they needed.
00:04:56
Speaker
Veed has a microservice architecture, so there are a lot of databases that we have to connect and stream into our data warehouse. We had a lot of, let's say, other sources too, so things like Amplitude and Stripe are pretty common. And once you started getting up and running, how was it?
00:05:19
Speaker
When I joined, we weren't using HubSpot yet, so that was added later. But initially it was a bit of: okay, first of all, what data do we have? Can we get other data? Can we pull everything in? And then can we start modeling and delivering dashboards and insights on top?
00:05:37
Speaker
And after the infrastructure part was done was when I said, okay, we need somebody to help me with modeling. I was alone for half a year, and I don't recommend doing two quarters as the only data person. At some point it was getting a bit much. You need help; you can't do everything by yourself.
00:06:02
Speaker
No, especially setting everything up. Like you mentioned, greenfield, but it's also a lot of responsibility, a lot of moving parts, especially when you don't know the environment yourself.
00:06:14
Speaker
Also, Veed is 100% remote, so it was a lot of calls, basically, just calling people. Most of the team that I worked with works in England, but there were also people in, I don't know, Thailand, Australia. We had a lot of calls with a cross-functional leader at that point who was based in Australia.
00:06:35
Speaker
And what I would do differently is probably try to trust my expertise a bit

Learning from Mistakes and Adapting Strategies

00:06:44
Speaker
more. So, for example, the thing that I would say was my biggest mistake: when I joined, we decided on a pretty expensive dashboarding solution instead of going with something cheap or maybe even free.
00:06:58
Speaker
Because initially, like I said, I was asking people: do you want that? And everybody wanted a solution with bells and whistles and reporting and everything. And after we started paying, we realized people were only using a fraction of it. So the idea would be: always start small, just iterate, build something that's cheap, and see if it's enough. But as far as I know, you also have lots of modern data stack solutions.
00:07:25
Speaker
Yeah, so for the pipelines we started using Fivetran. The layer for modeling is dbt, which we run ourselves. So I think that approach was fine. And again, nobody was complaining, everybody was getting what they wanted, but from a FinOps perspective we could have optimized a bit more. So after a year, we basically closed the contract and started hosting Metabase ourselves. Shout out to the Metabase folks. Yeah, we love it.
00:07:57
Speaker
I got lots of good feedback about Metabase, how light it is. Yeah. We're running it in our GCP project, our data project, and I don't think we've ever needed more than two machines running. So yeah, it's pretty light.
00:08:15
Speaker
Okay, we touched on what you are not really happy about. Basically, the learning is this: you would rather iterate and test, and trust yourself and your expertise more. So what do you think you did great?
00:08:31
Speaker
All of the infrastructure, let's say the initial pipelines, all of the BigQuery modeling, the dbt setup. That's pretty much the same as it's been for almost three years. Maybe it scaled, but we didn't need to iterate on that.
00:08:47
Speaker
Okay. Not bad. This is really not bad, because we saw lots of problems with dbt and the way it's set up, because under the hood dbt is also SQL. So that's really good. On the people side of things: shouldn't you have touched base with your manager to hire somebody earlier?
00:09:13
Speaker
Maybe, but again, it felt like in the beginning there were a bit too many stakeholders in this decision. Oh, okay. So it was a bit like: we have to agree on what kind of people we need, to check everybody's expectation of what the next data hire would be. So the order in which we hired was data analyst, data scientist, and analytics engineer. What do you think is the most challenging
00:09:40
Speaker
part when you start in this chaos as the first data person?
00:09:46
Speaker
I mean, I think managing expectations is hard, because there were people that had worked in different companies with different data cultures, and they expect different things. So a lot of times I would get questions that needed a bit of Bayesian analysis, for example.
00:10:02
Speaker
I mean, I can run a notebook and do some inference, but that's not my specialty. I'm mostly a data engineer. It was a bit of explaining: okay, I can help you this time, but don't expect this to become a thing where I run all of this analysis. That was when I said, okay, we definitely need a data scientist if we get a lot of questions like this.
00:10:28
Speaker
Okay. So this is really interesting. Also, before the call I was thinking: we are building our own startup, so I'm in startup mode. You guys can be further down the road. You have your Series A from Sequoia. Fancy partners. I mean, how do you see,
00:10:59
Speaker
what would be the next gig for you? What would be the ideal next steps in this kind of fast-growing startup? You mean for us? Well, it wasn't me, I just want to highlight this. Let's talk about Veed, finally. I think the cool things happening right now have a lot to do with AI.

Data Governance and AI Product Development

00:11:24
Speaker
So we have a new AI product that we're working on. And I actually have a call after this with the AI team to set up some data access things.
00:11:36
Speaker
And I think that ties naturally into the big topic of data governance, because we have a lot of user video, and almost nobody in the company has access to that directly. We were very strict with access to user information, user data. So in order to grant the AI team access to this, we have to jump through a lot of legal hoops, and we have a consent service and things like that. We want to make sure that if we train anything on user data, the users have already consented. And we have pretty strong frameworks in place for that.
00:12:13
Speaker
So basically what you're saying is, as soon as I upload this video of our recording, no one at Veed can just watch it? Unless you consent to: hey, I want this to be used for training. Okay. But it has to be explicit. You have to explicitly agree to let us watch something. And also, for example, if a customer has a problem with customer support, they can grant access for that. It's based on consent.
00:12:44
Speaker
This is very interesting, because most startups, in their early journey, don't think about data governance at all. They are trying to survive, and data governance is a multi-year project. The investment in data governance doesn't necessarily pay off in a month or two after you start. So you set up the data governance framework while you were setting up the data environment?
00:13:10
Speaker
Let's say it was a bit of a mix. In my previous job I worked for WeTransfer, which had a pretty strong data governance framework, so there are a lot of things that I learned and could apply here. For example, we made a decision in the beginning to make sure that almost all of the data that ends up in our data warehouse is pseudonymized. So even if there's a data leak, there's not much to steal.
00:13:31
Speaker
And some of these decisions about user video data were pretty obvious from a GDPR point of view. This is very sensitive information. If somebody posts a video of themselves, we want to make sure that unless they agree to it, nobody can actually view their videos.
00:13:52
Speaker
This is interesting, because obviously I don't deal with this kind of data, and we don't deal with this kind of data at Masthead at all. But as far as I know, you store clients' recordings, or the videos they want to process, in your cloud storage. Yeah. So that's our biggest bill every month. Biggest thing. Nothing compares to the storage bill.
00:14:20
Speaker
So do you have any retention measures for that? You mean like how much we actually keep? At some point, and this was maybe three years ago, we had a script to index the videos. To delete?
00:14:41
Speaker
Oh, we had, exactly. So initially Veed was set up as a startup: it had a legacy framework, and then they hired the team to rewrite it into a modern microservice architecture. Then we had a special service for these video assets, and we had to delete a lot of the old stuff. That's why we had this script, to see what we needed to delete.
00:14:59
Speaker
And, for example, if you have a free user, you get a specific retention policy. Our goal isn't to hoard data. On the contrary, we would be happy if we could get rid of 90% of it and people wouldn't mind.
00:15:14
Speaker
Yeah. And the data you process for, let's say, the reports is metadata about videos: vertical or horizontal, frame rates, stuff like that? Yeah. And also the usage, and whatever the interaction between the services is. And of course there is all the financial data that we process, where we need to find correlations.
00:15:43
Speaker
Do you guys label the videos, like what kind of content is there? I mean, we can ask people what kind of stuff they want to do, and we can scan content, for example using a Google service, to see if there's any nudity uploaded. According to our terms and conditions, there are things you cannot upload; we prefer that people don't upload that, and we have a service that can check it for us.
00:16:12
Speaker
But normally, right now we don't label anything, because most of the content was uploaded before this consent part. When the company started, it was before the AI boom, so we didn't think we would ever use the videos for training anything. And then we discovered that we're sitting on this trove of user data, but we don't have the consent to use it. So we explicitly implemented this part, and we launched it, I think, last year.
00:16:40
Speaker
Oh, I didn't realize. And with an initial pipeline to basically copy certain videos that are flagged as safe into a different bucket for the AI team to train on.
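For illustration, a minimal sketch of what such a consent-gated copy job could look like on GCS. The bucket names, object layout, and the consent-ID set are hypothetical, not Veed's actual pipeline:

```python
from google.cloud import storage

SOURCE_BUCKET = "veed-user-videos"         # hypothetical bucket name
TRAINING_BUCKET = "veed-ai-training-safe"  # hypothetical bucket name

def copy_consented_videos(consented_ids: set[str]) -> None:
    """Copy only videos whose owners explicitly granted training consent."""
    client = storage.Client()
    src = client.bucket(SOURCE_BUCKET)
    dst = client.bucket(TRAINING_BUCKET)
    for blob in client.list_blobs(SOURCE_BUCKET):
        video_id = blob.name.split("/")[0]  # assumes <video_id>/<file> layout
        if video_id in consented_ids:
            # Server-side copy: the raw video bytes never leave GCP.
            src.copy_blob(blob, dst, new_name=blob.name)
```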
00:16:54
Speaker
Wow. And how is data access set up inside Veed as an organization? Okay, we have this bucket of videos that can be used for AI training.
00:17:12
Speaker
But it still doesn't mean that just anybody could watch those videos, right? It's based on specific group permissions: the users in that group can access these specific videos to train a model

Access and Democratization of Data

00:17:26
Speaker
for them. And we have a lot of different groups for a lot of different use cases. For example, one thing that we do is grant everybody access to BigQuery,
00:17:35
Speaker
not all of the data in BigQuery, but people can run their own queries. And that apparently was extremely popular with the engineers, because they can query and join data from different services. Normally they would have to look at replicas of databases individually, but now everything lives in BigQuery and they can do that. And of course you can use Gemini to help you write some SQL.
00:18:00
Speaker
After that, and this is important, I basically take out as many hurdles as possible, making sure that people have access to data. So I try to strike this balance between data privacy and data democratization. And if the data is pseudonymized, then you don't really mind people accessing it. It sounds kind of fancy and easy: we balance, we manage permissions. But managing permissions alone is difficult.
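As a rough sketch of the group-based access pattern described here (Veed manages this through Terraform, as Bogdan explains next; the Python below is an illustrative equivalent, with made-up dataset and group names):

```python
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("pseudonymized_analytics")  # hypothetical dataset

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="groupByEmail",
        entity_id="engineers@example.com",  # hypothetical Google group
    )
)
dataset.access_entries = entries
# Only the access list is updated; everything else on the dataset is untouched.
client.update_dataset(dataset, ["access_entries"])
```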
00:18:30
Speaker
Yeah. So again, we create Google groups, and this plays nice with Terraform, because you can just specify for a specific group what kind of access it has to which datasets and everything. And for most of our users, I would say they mostly use the data in Metabase: there's a dashboard, they just click and they get it. I think you should have a look at...
00:18:54
Speaker
Analytics Hub. The Google Cloud team launched that, and it has primarily two use cases: internal data sharing, and external, where, I'm not saying Veed, but an organization can put their data, anonymized data, whatever they want, on a public marketplace. And it's very interesting, because I know you guys have very well-managed Terraform, but it enables users to subscribe to certain data products, which can be just datasets. At some point we might find the use case for it.
00:19:41
Speaker
I cannot imagine how you'd define the use case. It just takes that Terraform script away from you guys. No, but we want the Terraform part for, let's say, control. Oh, you love those controls.
00:19:56
Speaker
I mean, we love our DevOps team, and they've helped us tremendously. A big part of what we did was with the help of the DevOps team. We couldn't have had this kind of velocity without senior people guiding us. I've met a few of them. They can be strict, sure, but it's for the best.
00:20:19
Speaker
Yeah, but it's good, it's good. I mean, how do I say it correctly? I'm impressed that you have such a strong governance framework in place. And was it you who developed it, based on your experience? Yeah, I made sure that everything the data team builds has at least some governance considerations in mind.
00:20:47
Speaker
Do you realize how many big organizations don't have that in place? We always hear about leaks, and people don't encrypt passwords, or they reuse salts and stuff like that. It's kind of scary, but I guess it is what it is. I think you could write hundreds of articles about the pitfalls and all of this stuff. Security is a huge domain.
00:21:15
Speaker
Now, again, we can use tools that help us. For example, we use Vanta, which is really great at assessing things. You can grant Vanta access to, say, your GCP account, and it can see: okay, this group has access to all of these resources; is it supposed to have it? So it guides you there. I think maybe this is what helped us have this kind of velocity: we use a lot of tools, and a lot of the time we prefer buy over build, because we don't have that many resources, that many people, to build.
00:21:45
Speaker
This is also intelligent self-awareness. Sure. We always try to make a calculation: how much time, people, and money would it take us to build this? Would it make more sense to just pay a service for it? For example, I've noticed a shift now in the whole data ecosystem.
00:22:06
Speaker
I think it's moving a bit away from the modern data stack approach, or post-modern, as some people called it, into data bundling. There are services that basically offer you everything in one package, right? A warehouse, pipelines, and you just pay a monthly fee.
00:22:21
Speaker
And that's pretty much it. So for another startup, I would maybe recommend: don't try to build everything in Terraform yourself. See if you can just pay a monthly fee and scale with that. Listen, this is a very interesting discussion. But do you realize any solution like this won't give you as much flexibility as you have today?
00:22:47
Speaker
Yeah, definitely. And there's also something that we consider: we don't want to get locked into any solution. That's why we were very strict about running all of this dbt ourselves. We don't want to get locked into running transforms on a platform and then have them decide to increase their transform prices, because that has happened. But this is also, how do you say, where your fear of vendors comes in.

Tool Selection and Vendor Considerations

00:23:15
Speaker
Well, the vendor increased prices, but if you clearly generate value with this vendor, if you wouldn't be able to move as fast as this vendor allows you, why is it bad to give a certain amount of credit back to this vendor? Because anyway,
00:23:37
Speaker
during the last years they added headcount, they added features, they continued to evolve the solution. Why not? I'm not saying that everybody does it. I'm just saying that it's good to consider all the possible outcomes. You don't just go head-first into one solution and say, okay, all our data is going to live here. You always need to have an exit plan.
00:24:08
Speaker
But you contradict yourself right now, because you just mentioned that it's good to have one solution that gives you a data warehouse, pipelines, and everything. No, I'm saying do it, but consider all the associated costs. If what you need right now is velocity, if you need to develop a data platform overnight, then go with the bundled solution. And I'm sure you can find a way to export all of the data later. If you want to change warehouses,
00:24:34
Speaker
it's a pain. I feel like no matter what service you move to, migration is going to be a problem. Yeah. And I think there's no one solution that fits all startups. If what you want is safety, focus on safety. If what you want is velocity, focus on the quickest solution that checks these boxes, and of course doesn't skimp on security.
00:25:01
Speaker
I think, again, it depends. This is kind of the answer: it depends. In our case, we invested a lot in having control over the data ourselves, which was a decision we made. It was part of the requirements we set up when I started working on this data platform.
00:25:19
Speaker
Listen, I have the privilege of knowing how your data environment is set up, but, how do you say it, you're on a spectrum? I mean... No, sorry. I think everyone is on one spectrum or another. On a spectrum of... I get the joke, finally. I mean, we can also talk about neurodiversity, you know.
00:25:45
Speaker
Oh, I can talk about it, believe me, I'm meeting a lot of people. So I'm trying to say that you guys, with your data platform, are on a spectrum of being adventurous with solutions. Yeah. But also, again, every time I say, okay, I think we should pay for this, every time it's about a trade-off, I have to explain what the trade-off is. Yeah.
00:26:12
Speaker
I don't have access to the company card, so I can't just start paying for everything myself. Well, maybe you need to have one. But honestly, I appreciate the fact that you're being honest: we cannot build everything ourselves, so we have to outsource it, but we also need to understand what value the solution generates for us, what the cost is, what the cost would be to build it in-house, and what the cost would be to move away from the platform. For example, just one tiny example: we still use Fivetran for ingestion, because we're very happy with it, and for pseudonymization they use a salt. But in case you want to match those hash values, you need that salt yourself.
00:26:59
Speaker
So if we ever want to move away from Fivetran and want to keep hashing consistency, we would have to get that salt from them and hash everything ourselves using the same salt. Or just reprocess everything, but a lot of data has passed through by now. So again, it's something that we considered from the beginning: what happens if...
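To make the salt problem concrete, a toy sketch. The hash algorithm is an assumption (SHA-256 as a stand-in; the vendor's actual scheme isn't stated here); the point is only that matching pseudonymized values across systems requires reusing the exact same salt and scheme:

```python
import hashlib

def pseudonymize(value: str, salt: str) -> str:
    """Deterministic salted hash: same salt + same value -> same token."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

# Joins keep working only while every producer uses the vendor's exact salt.
a = pseudonymize("user@example.com", salt="vendor-salt")
b = pseudonymize("user@example.com", salt="vendor-salt")
assert a == b
assert a != pseudonymize("user@example.com", salt="in-house-salt")
```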
00:27:21
Speaker
Well, this is without knowing how much you pay for Fivetran. I don't know how much you guys pay for Fivetran; I only saw the compute cost. But this is the point. We have a customer that uses Data Fusion.
00:27:41
Speaker
Do you know Data Fusion in Google Cloud? It costs the same kind of money. Just the service alone is more than $3,000 monthly, and on top of that you need Dataproc that actually runs these pipelines. But their use case was to transfer data from a very old ERP system, and I'm not sure if Fivetran has connectors for it, probably yes, but they figured out that the cost of Fivetran could be higher than Data Fusion.
00:28:13
Speaker
Oh, yeah, definitely. Again, like I said, every solution is tailored to a specific case, right? For that case, maybe they got the best, but not the cheapest, solution. And there are a lot of tools out there with varying price configurations: some bill by rows, some by megabyte, different scenarios.
00:28:39
Speaker
I don't think one is great for everybody. You really need to shop around for a bit. When I joined, we did trials; I think I trialed five different platforms. For data management? Just for data integration. Because again, I said: okay, my first quarter is to build a modern data stack, and I will do trials of several services. So it was mostly doing trials of the ingestion and extraction part, and we also wanted to make sure that whatever presentation layer we use is fast enough, ticks all the boxes. Okay. So basically all of that was tied to business, so you can deliver faster.
00:29:21
Speaker
And that's, I think, critical for any data team, right? You always have to show your value for the business. Otherwise, you can geek out and build extremely complex solutions with huge throughput and everything, but if it doesn't bring business value...
00:29:39
Speaker
It's very interesting, because here's something I just spotted. I'm talking to a startup guy, so you are very comfortable with the startup world. You're comfortable with testing things, comfortable with picking up tools, comfortable hiring tools to do the job and then firing them. You're comfortable using startups and not wasting time on building in-house unicorns.
00:30:10
Speaker
Yeah, which might not ever be used. Unicorns not in the sense of getting you rich; those unicorns just stay in a corner. And when I'm talking to bigger companies, maybe because I turn up a little bit too serious, they're like: okay, we might build it ourselves. And I don't get where that's coming from. Maybe they were all brought up that way and they have a lot of FTEs, I don't know.
00:30:37
Speaker
Bogdan, I haven't met any data team that was not understaffed. Just to be honest, every data team I'm talking to is stressed. Folks, if you are not stressed and you're from a data team, please nudge me. I want to sense how it is, being a data person and not being stressed. This is very interesting. But also, those are two buckets. The third bucket that I'm meeting is data directors, maybe senior data directors,
00:31:06
Speaker
from companies already at IPO, and guess what? They don't want their teams to get dirty with routine tasks like building monitoring or improving their SQL performance. They don't want to waste their data engineers' time on those low-level tasks. This is also interesting.
00:31:30
Speaker
Yeah. And again, this might be a cultural choice, I don't know. It depends on the size of the company, right? If you're going to IPO, you probably have a pretty dense hierarchy. And I'd say there are a lot of interesting things to say about managing versus building stuff. Being an engineering manager is very challenging, especially if you still want to stay a bit hands-on.
00:31:54
Speaker
But a lot of it has to do with negotiation. You have to negotiate with other people, and you have resources and influence. And I think there's a very interesting book written about this by a sociologist, called Moral Mazes, which basically tries to see what managers do and how their work is connected to the value that's supposedly brought to the company.
00:32:17
Speaker
And this isn't always a perfect alignment. I would say that normally engineers tend to be a bit more on the geeky side and say, oh, this would be fun software to build. Or maybe be realistic and say, well, it would be cool to build, but we don't have the resources.
00:32:33
Speaker
Listen, just to clarify: you read the book that helps you talk to business users and explain the value you are going to deliver in any technical task, or in purchasing any technical solution. Basically, you read the book to help you communicate with business people.
00:32:51
Speaker
More like to help me understand. At least in my experience, after getting a degree and moving into the real world, you think: okay, I'm going to build this, and the whole company is aligned along this line, and it's kind of a hive mind where everybody knows what everybody else needs to know, and things just align. That's not real life. That's more of a fantasy, right?
00:33:18
Speaker
A lot of times you see groups fighting within one company. That's extremely common. So you need to know how to sell something, you need to build alliances, and just negotiate. It's not, I would say, something that comes naturally. So that's why I had to research around it.
00:33:42
Speaker
Yeah, this is also very interesting, because when I'm pitching a solution, I've realized that pitching the value of the solution doesn't necessarily convey my message to people until I find something personal to them in it. Like: you can slash your Google BigQuery cost and hire somebody to help you with the amount of money left.
00:34:08
Speaker
And this takes the conversation in another direction, once people understand what is in it for them. Yeah, but I learned it the hard way.
00:34:22
Speaker
Yeah, I don't know anybody who hasn't learned this the hard way. Or, at least from my perspective: if you have a very small company, a handful of people, and it's a very geeky, engineering-driven company, maybe you don't need all of this. But after a certain size, you can't ignore the social component of humans in groups interacting with other groups. It's human nature, right? You have to know how to talk to humans and convey a message. What do you think is most effective? I believe you have some framework in your mind when it comes to pitching an initiative. Let's say not necessarily a new solution; maybe you need a new person. How are you going to push through this
00:35:20
Speaker
project, and how will you make your managers approve the budget and everything? What framework do you have in mind? I mean, it's not quite a framework, but more of: I would only start by knowing the audience. If I write a document, who's going to read this document, and who's the decision maker? Then you have to write for them. You have to show the value it brings, in a way they can recognize. You can't just say, oh, it's going to double our data processing capacity. Okay, that's neat, but what's the value for the company?
00:35:53
Speaker
Oh, we can get faster insights, or we can aggregate more data points into a more complete picture. That excites them, faster data points? I don't believe it excites anyone. I mean, a lot of the time people think they want real-time data, but actually...
00:36:14
Speaker
There are very few cases where people actually use that. A lot of times you build something, you spend a lot of time on it, you want to make sure everything is perfect, and then it doesn't get the volume or the traffic you'd expect it to get. Other times you build something you think is just a gimmick, but people love it.
00:36:32
Speaker
So, I mean, it's not that simple to just know what the effect will be. It's like building any kind of product. You launch something, and sometimes it's a surprise hit, and sometimes it's a huge miss, and you can't really know in advance. You can assume, based on your past experience, based on trends, based on something, but you can never know for sure. Okay. So tell me about your biggest wins and biggest losses. Sorry,
00:36:58
Speaker
tell me a bit about the biggest wins, like what made you and your team shine?

Unexpected Success of the Live ARR Dashboard

00:37:04
Speaker
And what made you guys kind of... Okay, so one thing that started as a gimmick for us was basically launching, call it live, it's not quite live, but a live ARR, right? The ARR metric is very crucial for us. We used to do daily reporting: we aggregate everything, close the books at the end of the day, put everything in a nice format, go through the models, tons and tons of models, clean everything up, and then we say, okay, this was the ARR at the end of yesterday. And in the morning, people check the dashboard. And then we got the question: can you do it live? And the initial reaction is: no, of course not. We could, but it would just take so much time. So in the end,
00:37:55
Speaker
people in the company really like to refresh this. You don't even need to refresh it, it just runs. And you can do a lot of cool things with it: you pass a certain threshold and you ring a bell or something, which is nice. We didn't expect it to be so popular. Excitement. I see, it's just huge excitement.
00:38:21
Speaker
Yeah, but in the end, we're not doing fully live precision, where every time somebody makes a subscription or cancels, you see it fluctuate over the day. You need to average things a bit. So it's not live-live, but it's live enough that people get excited.
00:38:41
Speaker
So the task in Jira was: we need live, but not live-live. It needs to make people happy and look live. I don't even think we had a ticket for it. It was pretty much a Slack discussion, and somebody said: hey, why not do this? Yeah, because you're a startup. Yeah, I think it was even a discussion on a Saturday: hey, what do you think about that? Could you do it like that? Making it live.
00:39:05
Speaker
But with those services which support it, and I assume in your case it's Stripe, it doesn't even send the data live, right? You can, actually. For example, you can make a subscription to Stripe events, and then you integrate those events. So in theory you can do all of these things, or you can try to find something that emulates it.
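A hedged sketch of that event-driven approach, assuming Stripe webhook payloads and simple monthly plans; the starting total and the smoothing note are stand-ins, not Veed's implementation:

```python
# Running ARR total, seeded from the last official end-of-day close (assumed).
live_arr = 10_000_000.0

def on_stripe_event(event: dict) -> None:
    """Apply an incoming Stripe webhook event to the running ARR figure."""
    global live_arr
    kind = event["type"]
    sub = event["data"]["object"]
    # "plan.amount" is cents per month for simple monthly subscriptions;
    # real subscriptions (multiple items, yearly plans) need more care.
    monthly = sub["plan"]["amount"] / 100
    if kind == "customer.subscription.created":
        live_arr += monthly * 12
    elif kind == "customer.subscription.deleted":
        live_arr -= monthly * 12
    # In practice you'd also smooth or average over a window so the number
    # doesn't jump on every single event, as Bogdan notes.
```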
00:39:33
Speaker
Okay, I hear you. Folks, for your reference: if you're going to make ARR, and I'm not even pronouncing it correctly, ARR... apparently I have some problems with this one. ARR. Yeah, we almost nailed it. There you go. So, if you're going to make this metric
00:40:02
Speaker
real time, maybe not real-real time, your stakeholders are going to be happy. So this is what we learned from Bogdan today. So how about telling us about times when you guys, how do I say it correctly,
00:40:18
Speaker
didn't meet everyone's expectations, or when prod was down? I don't know the magnitude of things. I remember we managed to take the website down once or twice, and that was because we were testing out live database replication.

Incident Management and Learning from Failure

00:40:36
Speaker
So, we use a lot of Postgres DBs, and you tend to stream that data off using the write-ahead log. You cannot, well, you couldn't do that from a replica. Right now with version 16 you can, but initially you couldn't. So we were working with Datastream, which is another GCP service, and you connect it, and the problem is that you have to create, I think it's called a replication slot. And if you don't consume those, the database can go down. Which is what happened.
00:41:06
Speaker
Some things you find out in prod. And luckily we knew what the reason was, so we could just drop all of the things we had created, and that helped.
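For reference, a small monitoring sketch for the failure mode described here: an unconsumed logical replication slot pins WAL on the primary until the disk fills. psycopg2, the threshold, and the connection string are assumptions:

```python
import psycopg2

ALERT_BYTES = 50 * 1024**3  # arbitrary ~50 GB threshold

conn = psycopg2.connect("dbname=app host=primary.example.internal")  # placeholder
with conn.cursor() as cur:
    cur.execute("""
        SELECT slot_name,
               pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
        FROM pg_replication_slots
        WHERE NOT active
    """)
    for slot_name, retained_bytes in cur.fetchall():
        if retained_bytes is not None and retained_bytes > ALERT_BYTES:
            print(f"inactive slot {slot_name} retains {retained_bytes} bytes of WAL")
```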
00:41:19
Speaker
What else? Maybe that was, I would say, the biggest thing where people went: oh, data can break stuff. And of course, all of the events: you get pipeline outages for various reasons, and then you have to dig in between the systems, try to see what's wrong, and then consolidate and replay data. So that was exactly it: we had a problem with the Stripe data, like you mentioned, so we had to replay all of the events. And I think you can do that up to 72 hours in the past.
00:41:51
Speaker
Oh, really? That's so short, not even seven days. I'm not sure, again. What are you guys using? Maybe I can give you a suggestion. Is it webhook data? So, again, for the official MRR data we use the Fivetran Stripe connector, because it comes with the...
00:42:13
Speaker
Again, so we have an official dashboard, which is, let's say, the number at the end of the day, the one you believe in, and we have the live one, for the spectators.
00:42:24
Speaker
This is the most hilarious, a real-case scenario where there are two metrics. I mean, the thing is that we make sure that at night the two metrics align. At least for a brief time, they overlap precisely. And with the live one, you realize that at the end of the day, when you compute the official one, maybe you're off by a couple of hundred or something, so then you just correct. Does somebody correct it manually?
00:42:59
Speaker
No, no. When we compute our MRR and ARR, we actually expose that through an API. So then we have various services that can query it, including this live one. So it's all done automatically. Okay. It's good to know that that stuff is handled automatically.
00:43:17
Speaker
And creating an API for your data, I think that's very valuable. That's something we learned building this library of stuff. It was kind of: okay, what did we learn? We have this API and we can serve any kind of insights you want. We can create an API that, per user, creates a profile with, I don't know, the number of downloads or something, and then another service can use that data. No, it's beautiful.
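A minimal sketch of that metrics-API pattern, assuming FastAPI and a placeholder metric computation; the endpoint shape is illustrative, not Veed's actual API:

```python
from fastapi import FastAPI

app = FastAPI()

def compute_official_arr() -> float:
    """Placeholder for the real computation (dbt models in the warehouse)."""
    return 12_345_678.0

@app.get("/metrics/arr")
def get_arr() -> dict:
    # Dashboards, the live ARR display, or any internal service can
    # call this instead of re-deriving the number from raw tables.
    return {"metric": "arr", "value": compute_official_arr()}
```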
00:43:44
Speaker
But, sorry to talk at the same time. You know what, I just realized that you guys have created, maybe not the complete data platform, but one with managed access, with APIs, with outputs. It's very much a decent project in the making. It's not complete
00:44:10
Speaker
yet, but the way you describe it, it also scales to your insane amount of data. And I know that not all the pipelines are real time, but now I know some of them are.
00:44:27
Speaker
Yeah, again, there are some hard requirements that we really have to meet: things like scale and security, where you don't really want to compromise. And there are some soft requirements, like some precision here and there. But you know what, I think it's great, because when I hear somebody say we need to have data 100% correct, this isn't realistic.
00:44:53
Speaker
No, because... It's data. I mean, even if you want to measure a real phenomenon, real life, you always get measuring errors. It's something that I learned early on. You get 20 people to measure it, you get 20 different numbers or values. Unless it's counting, you know, counting money or something. But even with money... Yeah, there's always going to be a different range of...
00:45:20
Speaker
Subscription management is pretty hard, because you get a lot of refunds and coupons and all of this stuff. Yeah, I know. Special corner cases that nobody thought of. And then on top of this, you get migrations from one service to another, and things get lost in migration. And if you use one of those... No, has somebody lost their subscription?
00:45:40
Speaker
No, I don't think anyone lost their subscription, because people still had access to their account. We just wouldn't bill them, so it was like free subscriptions. It can happen. Yeah. I like your healthy attitude towards it.
00:45:58
Speaker
It's hard to deal with data. We need to understand what the requirements are, what we can compromise on and what we cannot. Like: at midnight we cannot compromise on having two different ARR metrics, but during the day they can be a little bit different. Again, back to investors: you have to report numbers to investors, so you use the official metric, right? Internally, if you're going to celebrate every time you pass, I don't know, five more million in ARR, you use the live one, because it's fine, right? No, exactly, different purposes. Because for you guys it's important to understand if the metric is growing. Otherwise, given your scale and the pace at which you are growing, if there are no subscriptions in the last two hours, it could also mean that your website is down, or the checkout is not working.
00:46:48
Speaker
A lot of times people say: okay, why is this up or down? And as a data team, you don't like those kinds of questions, because it can just be a fluctuation. But once in a while you actually get valid answers: okay, this huge bump here is because of that and that. Yeah. Okay. We just validated that non-official ARR metric. I'm glad.

Hiring for Cultural and Technical Fit

00:47:08
Speaker
Listen, I want to ask you how you hire people for your team, because there is a certain mindset. I met so many startups and so many VCs that want to have these ex-Amazon or ex-Google people. But those people, if they spent, let's say, a decade in that corporate culture, don't necessarily understand what it takes to be in a startup, how things work in startups.
00:47:36
Speaker
And you guys have a very exceptional culture, I should say, because I met Tim, your CTO. He's very honest, very to the point, I would say, and emotional as well. Not all corporate cultures necessarily approve of that. So there is a certain flavor to your culture. How do you hire people for your team?
00:48:00
Speaker
I mean, that's why the cultural match is also very important. Of course, all of these interviews start with sitting down with somebody and talking, or you sit in front of your computers and you talk, and you see if there's a match. And then you also want to verify the technical side of things.
00:48:17
Speaker
And then, usually for all of these positions, there's a round of stakeholder interviews, where you want to make sure there's a click between the person you're hiring and the people they are going to work with. Spark. We're looking for the spark. Amen.
00:48:33
Speaker
A nice spark, you know, maybe that's what the rom-coms hinted at. For example, personally, I like people coming from academia, because I have a background in academia. But that doesn't mean that I only hire people who have worked in research. I like a certain, let's say, inquisitive mindset. If you know how to apply the scientific method, I find that extremely valuable.
00:49:02
Speaker
Because you come up with a hypothesis and you try to test it and falsify it. I think that's actually undervalued in a lot of startups, because a lot of the time people just want confirmation of their hypothesis. That's a different discussion. And I really like to see people who know what proving something means: what does a correct proof of something mean, or building a system that you can show will work correctly, that you can test or implement.
00:49:31
Speaker
I would say that mindset, in some cases, might be more important than experience. That might be controversial. Experience is important, but I wouldn't always just say, okay, let's hire seniors. For example, our first analytics engineer had never worked with dbt before.
00:49:54
Speaker
He had applied for the data analyst position, and we said: okay, this is an extremely strong candidate. And then we discussed: what if we were to hire you for this analytics engineering position instead? We created another kind of challenge for that, and he passed.
00:50:09
Speaker
And we've been extremely happy with him, he's been extremely happy with us, and it was pretty much giving somebody a chance to try something new, just jumping in. That was in the beginning; I think it was in the first nine months after I joined. Maybe at that point we had a bit more flexibility in saying, okay, we're going to hire this, this, and that. And maybe a bit less of the stakeholder click, the culture click, as well. So maybe right now we're putting more emphasis on this culture click.
00:50:43
Speaker
That's interesting. And we might be controversial together, but I agree with your controversial point, because when we hire at Masthead, we believe that any technical skill can be upgraded and learned.
00:50:58
Speaker
Definitely. Especially now with AI assistants and all these things. But do you personally have a fear of missing out on all the... I stopped, because I'm in this chat with my investors, and an investor, a person who doesn't actually work in tech, with all respect and love for them, asks: who uses this, who is already using this solution? Is it in production?
00:51:32
Speaker
And I was like: what is this? What is going on? I don't know it. And somebody's like: we're not using this, we're using ABC, but next week we're trying this one. And I'm like: who the hell are you? How come you all know this solution and I don't? I think it's hard not to have at least a bit of fear of missing out, because there are so many things. Every year we get this MAD landscape.
00:51:59
Speaker
Yeah, machine learning, AI, and data, right? And every year this thing just balloons, you get more and more. It feels like a bit of a Cambrian explosion of tools. And that's why I think maybe consolidation and bundling will be the next step. There isn't maybe enough money to support all of this ecosystem, I don't know. That's another interesting question. And we might be riding an AI boom. Sure. Who knows? Well, it's there.
00:52:24
Speaker
Yeah, I mean, like DeepSeek, right? It got launched, and then I think Nvidia was down 10% on investor fears that maybe some of these things have been overpriced. Nobody knows for sure what the future will bring. I can say that Bitcoin just went under $100,000 today. Yeah. No comment on that.
00:52:50
Speaker
It's not like I own one, but still, there are so many trends, and obviously we have full-time jobs and can't be fluent in all this stuff. Oh, yeah. Again, crypto and AI. Last week the Trump administration announced this Stargate thing.
00:53:07
Speaker
And AI meme coins, a subset of meme coins just related to AI, like PEP AI, I might be saying them wrong, I know they exist. They just exploded, they had their field day, and I was like: of course, in hindsight, I should have bought AI meme coins. Tell me what we need to buy, because I have no idea. No idea. DeepSeek coin, see if it exists.
00:53:33
Speaker
We're going to totally make it. I don't know. Again, I think in the end: solid principles, long term. It can't all be found in memes and booms and stuff. Those are quick gains and quick losses.
00:53:50
Speaker
I mean, definitely there's a lot of usage of this kind of AI stuff happening with us. Maybe we're not doing it properly right now, and there are definitely a lot of risks involved. It's very interesting to use it as a coding tool, to help you leverage and churn out stuff faster. But there's a lot of risk. I've seen a lot of people complaining that it now takes more time to debug the AI solution than it took to debug a human solution, because writing the code is the first step; somebody else has to read it later. Hopefully. Yeah. But again, probably DeepSeek is pretty good at writing code. I don't know, I haven't really used it. I think I tried to get online access. I don't remember what came of it.
00:54:47
Speaker
Yeah, exactly. And besides the startup life, there's also real life with kids. Yeah, so first thing this morning: DeepSeek's Chinese AI chatbot sparks market turmoil for rivals and a sell-off of shares in major tech companies.
00:55:01
Speaker
It is the top-rated free app on Apple's App Store. Okay. These things all move extremely fast. There's a lot of volatility and instability, and maybe this is another reason that sometimes you have to just go with the flow. Again, by building something, you don't necessarily know how it's going to look. Like: okay, we need data. Okay, let's see what we can build, and let's see if it's good enough. Right.
00:55:27
Speaker
And just to be intelligently honest: unless you are in some healthcare startup, you cannot hide from it.
00:55:40
Speaker
Yeah. Okay, Bogdan, thank you so much. It was a pleasure; it's always a pleasure talking to you and having fun. I enjoy your healthy, human perspective on data. This is what I see from interacting with you: you're trying to make it solid and reliable at the core, at the infrastructure level, so it can scale, and then afterwards you see what use cases you have and try to make them work. I do appreciate this approach. Yeah, thanks. And thanks for having me. Absolute pleasure.