Introduction with Guest Sergei Zakolenka
00:00:00
Speaker
Hi all, it's Yulia from Straight Data Talk, and I'm back with a new episode together with Sergei Zakolenka. Sergei, I'm so excited to have you here.
00:00:15
Speaker
So jump in, introduce yourself briefly, and let's kick it off. Sure. I'm Sergei Zakolenka, founder of a startup. I'm currently based in Berlin, but I used to live in the Pacific Northwest quite a bit, and I have to travel to the Bay Area soon, so it's on my mind right now.
Sergei's Journey from Engineer to Founder
00:00:39
Speaker
What we do is build a compute platform, but we'll talk more about that in a second. In terms of my own story,
00:00:51
Speaker
I'm a computer engineer by education who somehow ended up being a product manager, then launched a startup in the 2010s, which went interestingly sideways, but still brought a bunch of learnings for me.
00:01:06
Speaker
And now I'm at my third startup, because there was a small startup in between as well. So I'm at my third startup after a career in big tech companies, and living the dream.
00:01:19
Speaker
Wow, I didn't realize that you're a serial founder, honestly. Thankfully, I'm not a serial killer, but yes, I'm a serial founder. And hopefully the third one is the one.
00:01:31
Speaker
You know what? Actually, once you say that about being a serial founder, it actually feels very much the same, because you go through so much shit.
00:01:44
Speaker
It does sound very dangerous and almost anti-societal. Exactly. This is good.
Experience in Major Tech Companies
00:01:53
Speaker
Listen, Sergei, what is also very interesting is that you've been very humble about your experience. Folks, get ready: you've been a product manager for Dataflow at Google Cloud, then Snowflake, and you wrapped it all up with Databricks.
00:02:13
Speaker
Yeah, so I want to get into your latest startup, TowerDev, but what made you go again down this very uneasy
00:02:32
Speaker
road, I guess, after being in secure jobs at the greatest companies out there working in data?
00:02:44
Speaker
Well, Yulia, you always want what you don't have. Actually, right now, I don't want to go back to a big tech job, and I'm very happy where I am right now.
00:02:59
Speaker
My next job, if there is a job after the founder's job, will probably be something else entirely. But yeah, when you work in a big tech company, you always kind of hear about the true start-uppers, the true technologists who launch their own tech and businesses, and they persevere and become something bigger, or they contribute to society.
00:03:31
Speaker
I guess you don't feel it as much when you work in a big company. Money is not everything. After you have enough money, money is not everything.
00:03:42
Speaker
So actually, you know, the pyramid of needs that people know of: yes, you do need to provide for yourself and to be secure, to feel secure that you're not going to starve.
00:03:55
Speaker
But after you have the basic security, other needs emerge. You want freedom. And that's probably what drives most start-uppers.
Introducing TowerDev: The Data Compute Platform
00:04:08
Speaker
Well, we can talk about that; this is very interesting. But I'm genuinely interested in what made you give up these very secure jobs to build TowerDev. So what is TowerDev? What is it that you believed in so much that you left some of the best tech companies out there?
00:04:36
Speaker
So Tower creates a new-generation compute platform. We think that data is going to be the basis of everything. It already is the basis of everything.
00:04:47
Speaker
I mean, unless you're a farmer. And even if you're a farmer, your tractor nowadays actually has a bunch of computing power in it, and it supplies data
00:05:01
Speaker
via satellites to John Deere, and they collect all the data, and they can actually direct your farming equipment. So these days we are in a world of data producers, data processors, and data consumers.
00:05:19
Speaker
And interestingly, though, the dominant vendors right now, the market leaders, the folks whose names you mentioned, all use technology that was developed about 15 years ago.
00:05:37
Speaker
So it's actually very old technology, and it's still the same. In 2010, someone at Google wrote a paper. I know the guys who wrote it; I know the names. They're great people.
00:05:52
Speaker
They wrote this paper that defined how distributed data processing should look in the future. And after that, all the innovation that happened in the data industry was based on one or two papers written in 2010, 2011.
00:06:08
Speaker
Apache Spark, Dataflow, BigQuery, Redshift, separation of storage and compute, dividing one big node into smaller and smaller nodes, and using cheap hardware in the clouds to execute all data processing.
00:06:27
Speaker
So all that stuff was invented in 2010, 2011. Since then, lots of things have happened. The tech didn't stay in the same place.
00:06:39
Speaker
And so what people are buying right now is basically 12-, 13-year-old technology that is overly complex and actually costs quite a bit.
00:06:54
Speaker
And it's not easy to use.
00:06:57
Speaker
I was working at the companies you mentioned: last Databricks, before that Snowflake, before that Google, before that actually Microsoft and AWS. Yes, I hit all the big seven. So I have stock from all seven in my portfolio.
00:07:13
Speaker
Now we know your reach. No, I will not starve. I achieved that first level of safety: no starvation in old age.
00:07:26
Speaker
But after talking to probably 2,000 customers over that period of time... I was a PM, a product manager. My job was to go and talk to customers, try to explain to them what we do, try to learn what they want.
00:07:42
Speaker
Sometimes we build the features they want. Sometimes we convince them to use the stuff we have. So after doing these 2,000 customer conversations, eventually even the stupidest person will realize what customers want. And what customers want right now: there's a new generation of data engineers emerging right now.
Trends in Data and AI Technologies
00:08:07
Speaker
They want three things. Well, everyone is enamored with AI. They want to do data and AI. They want to create data for AI. They want AI to create data for them.
00:08:22
Speaker
So we're not escaping this trend, and it's beautiful and great. It's real. It's there. But practically, what it means is Python.
00:08:34
Speaker
Python is the language that binds data engineering and AI. And this is what data engineers want today. They want frameworks based on Python, with the bunch of good libraries that have been developed.
00:08:47
Speaker
That's trend number one. Trend number two is, well, people are tired of having to learn the complex ideas that were necessary 15 years ago when the data processing technology was created: Spark and Dataflow and others.
00:09:09
Speaker
They're tired of having to learn this, knowing that there are newer, easier-to-use Python libraries that don't require them to.
Simplification in Data Processing
00:09:19
Speaker
So the second trend is folks moving away from the old tech towards simplified, non-distributed processing: single node, very simple. You think as if you're running on your laptop,
00:09:37
Speaker
and the backend technology does the magic and scales automatically for you. And the third trend is: people are also tired of being locked into the big tech companies.
00:09:52
Speaker
Folks in big tech have created walled gardens around their platforms. They draw you in and give you great features and benefits, and then you have to stay with them.
00:10:08
Speaker
And folks want to have choices. They want to be able to use BigQuery for one task, which it's great at, you know, SQL-based, large-scale processing.
00:10:20
Speaker
They want to use Spark-based technology for something else, maybe ML integration. And they want to use Python only for things that don't require BigQuery or Spark.
00:10:32
Speaker
So choice is a big need from customers, and we also capitalize on this. Tower combines these three trends. We've developed technology that doesn't use distributed interfaces. You can write your applications as if they were running on your laptop, and we will automatically scale the infrastructure for them.
00:11:00
Speaker
We don't have a framework you have to learn. Literally, we don't have a framework you have to learn. You give us your existing stuff, and we will run it. And we will magically ensure that it doesn't crash, that it's secure, that it's scaled, et cetera.
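As a toy illustration of this "no framework to learn" idea, here is the kind of ordinary single-node Python script such a platform could run unchanged. The data and names are made up for the sketch; this is not Tower's actual API.

```python
import csv
import collections
import io

# Ordinary laptop-style Python: no distributed framework, no special
# imports. Data is inlined so the sketch is self-contained; a real job
# would read a file from blob storage instead.
raw = io.StringIO(
    "country,amount\n"
    "DE,10\n"
    "US,7\n"
    "DE,5\n"
)

# Plain single-node aggregation: sum amounts per country.
totals = collections.Counter()
for row in csv.DictReader(raw):
    totals[row["country"]] += int(row["amount"])

print(dict(totals))
```

The point is that nothing in the code knows about clusters or executors; scaling is the platform's job, not the author's.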
00:11:17
Speaker
That's another big benefit. I don't want to tell people what to do. They give me their applications; I will happily run them. And right now, I am Tower.
00:11:30
Speaker
And lastly, and then I'll finish and shut up and let you speak: we respect the fact that folks want choice. And so we coexist with the big data engines.
00:11:47
Speaker
And the way we coexist with them is, as I like to say, we complement them and we coexist. And it's truly the reality, because, of course, Snowflake is great.
00:11:58
Speaker
Databricks is great. BigQuery is great. We have to be one big, happy family of data providers solving
00:12:09
Speaker
users' problems. So the way we coexist is by running on the same storage layer; this is actually the fundamental idea. Storage is now, or should be, open, and it's actually a big, big trend. The industry is pushing for it.
00:12:24
Speaker
It's happening right now with Iceberg. It's happening with Delta Lake. Customers are pushing vendors to adopt these open table formats, which allows customers to then go and use Snowflake for use case one, Databricks for use case two, Tower for use case three, and so on.
00:12:48
Speaker
Well, it's interesting because it's also very bold. You basically...
00:12:56
Speaker
I don't want to say the word compete, but you give the customer... Everyone competes with everything. I'm competing with soap, because a user can buy Tower or soap.
00:13:06
Speaker
We live in a world where people have choices. And sorry for interrupting, but I like to think of it more as: I don't put on the compete hat. I put on the "what can I do together with the others?" hat.
00:13:26
Speaker
Yeah, so that's how I think. I reacted to the word compete because I don't really think of it as competing. It's more about letting users choose what's best for what they want.
00:13:42
Speaker
Okay, so forgive me for the silly question, but you still need servers to do that, right? Do you actually have the servers? Because if we think about Google, they have, as far as I know, the biggest compute footprint, servers all over the world, a presence in almost every country and region you could want.
00:14:14
Speaker
And this is not only about legislation and where data is stored. It's also about latency. How about you guys? How do you step onto this field?
00:14:26
Speaker
Because this is a latency question. I
TowerDev's Multi-Platform Approach to Latency
00:14:29
Speaker
want my Python application to be fast. How are you guys managing that? This is, off the top of my head, the question I have. Yeah.
00:14:38
Speaker
Well, we are multi-platform. It's kind of the standard approach to solving this problem, to reduce the latency to the end user.
00:14:50
Speaker
Every data platform ends up being deployed into GCP, AWS, and the others. There's no way around this. We have to do this.
00:15:03
Speaker
So we are quite happy running on the infrastructure of Google, Amazon, and Azure. And we can be in the regions; we have deployments in the regions close to the end customer.
00:15:17
Speaker
And that's how we solve the latency problem. You're right. um
00:15:24
Speaker
At the end, everything needs to run on some computer at some data center, in some dark storage facility in Utah. And
00:15:38
Speaker
you know, I give you the big vision and the big story, simplicity and ease of use. At the end, someone needs to solve the problem of a byte traveling from your laptop over many networks towards that dark room in Ohio, because that's where the majority of data centers are.
00:16:00
Speaker
So we solved this problem. And actually, the cloud providers solved many of these problems. Yes. We are using standard cloud technology provided by the three major vendors: virtual machines, blob storage, networking.
00:16:24
Speaker
That stuff is common. We're not trying to reinvent how to build S3, how to reinvent EC2. What we are trying to do is go one layer up, to the application level.
00:16:42
Speaker
We expect a data engineer to come to us and bring their Python code. And then we figure out a way to run this Python code on a virtual machine in a way that is safe and scalable and cost-effective.
00:17:00
Speaker
That's our contribution to world peace. I can tell you that... So I don't run any Python applications myself; I don't do that.
00:17:14
Speaker
But what I see from our customers, because I'm coming back again to what I see from the customers: while everyone on LinkedIn is talking about how every ETL, or rather data transformation, is happening through dbt and how popular dbt is,
00:17:33
Speaker
I can argue with that, because I see how much data transformation is happening through Python. And I don't believe Python is going anywhere anytime soon.
00:17:44
Speaker
It's just getting stronger and stronger, because now I see the flexibility. You don't need to learn other frameworks. You can do lots of stuff with Python. And...
00:17:56
Speaker
Point one that you mentioned, the synergy: data engineers have managed data via Python for a long, long time, and now they can use it as a primary language to do AI and support AI use cases.
00:18:18
Speaker
This is actually making so much sense to me. So do you want to tell us about some use case you had? If you had one, or maybe your top use case, what are the improvements you guys actually provide?
00:18:37
Speaker
The use cases we serve are some of the ones that you mentioned. So ETL, ELT: you have data in some file storage, and it needs to terminate in a BigQuery or Snowflake or Redshift instance.
00:18:54
Speaker
Great. So people write this kind of transformation code, and what they usually do is end up running it on a self-managed VM. Not super effective. You basically have to create your VM, shut it down, start it up, scale it. It takes a lot of time.
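A minimal sketch of the kind of transformation code being described, dropping incomplete records and normalizing types on the way from file storage to a warehouse. The field names and data are hypothetical, and the in-memory buffer stands in for a file in blob storage; a real job would read from S3/GCS and load into BigQuery, Snowflake, or Redshift.

```python
import csv
import io

def transform(rows):
    """Clean raw records: drop incomplete rows, normalize types."""
    for row in rows:
        if not row.get("user_id") or not row.get("amount"):
            continue  # skip incomplete records
        yield {
            "user_id": row["user_id"].strip(),
            "amount": round(float(row["amount"]), 2),
            "country": (row.get("country") or "unknown").upper(),
        }

# Stand-in for a file landing in blob storage.
raw = io.StringIO(
    "user_id,amount,country\n"
    "u1, 19.991,de\n"
    ",5.00,us\n"      # no user_id: dropped by the transform
    "u2,42.5,\n"      # no country: defaulted to "unknown"
)

cleaned = list(transform(csv.DictReader(raw)))
print(cleaned)
```

Simple as it is, this is exactly the kind of script people end up babysitting on a self-managed VM.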
00:19:12
Speaker
We replace all of that; we can run it for you. Other folks maybe run this set of ETL scripts on Lambda functions or serverless functions. That works fine in some situations.
00:19:27
Speaker
In others, you end up hitting limits on the duration of the execution, or limits on how much memory you can consume. So serverless functions are great for really small tasks, but when you start doing data processing, they're actually less great.
00:19:45
Speaker
We solve this problem because we right-size your resources. So ETL is one use case. Another popular use case is batch inference in ML.
00:19:57
Speaker
ML is divided into two parts, right? You have the training part and the inference part. On the inference part, there's actually a branch of two subcases, two use cases.
00:20:13
Speaker
Sometimes you want to do inference on 100 input records, or maybe a thousand or even a million, because you've collected this data over a period of a minute or a couple of minutes.
00:20:27
Speaker
Let's imagine you are a website, and you collect logs of visitors to your website. Say you're a big website, an e-commerce provider, and you get a thousand visitors every minute.
00:20:42
Speaker
So you can collect a pretty large set of logs of visits to your website within a very short period of time. And you don't want to individually run inference on: well, was this visit from a bot, or was it from a human?
00:20:59
Speaker
You have a model that you trained before, and this model is able to detect whether it's a visit by a bot or a visit by a human. Many customers want to batch up
00:21:14
Speaker
the inference calls into, let's say, a hundred or a thousand at a time and run the inference this way. Tower is great for this. You can do batch inference on us.
00:21:25
Speaker
You can load a model into our node. This gives you latency and cost benefits, because then inference becomes local, and you can very cost-effectively and very fast do a thousand evaluations of whether something was an anomaly, or, say, a non-human trying to make a transaction on your system.
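The batching pattern just described can be sketched in a few lines: group collected visit records into fixed-size batches and score each batch with one local call instead of one call per record. The classifier here is a deliberately trivial stand-in for a locally loaded bot-detection model, and all field names are invented for the example.

```python
from itertools import islice

def batched(records, size):
    """Yield fixed-size batches from an iterable of records."""
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch

def classify_batch(batch):
    """Stand-in for a locally loaded bot-detection model: one call
    scores a whole batch instead of one record at a time."""
    return ["bot" if v["requests_per_min"] > 100 else "human" for v in batch]

# Hypothetical visit log collected over a minute or two.
visits = [{"ip": f"10.0.0.{i}", "requests_per_min": i * 30} for i in range(8)]

labels = []
for batch in batched(visits, size=3):  # e.g. 1,000 at a time in practice
    labels.extend(classify_batch(batch))

print(labels)
```

With a real model loaded on the same node, the per-batch call is where the latency and cost savings of local inference show up.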
00:21:58
Speaker
We don't yet support, let's say, single-point inference calls or stateful API hosting.
00:22:09
Speaker
This is something that we don't do today. I'm not sharing any future plans, but we're not doing this today. There are a bunch of other use cases we could be doing on Tower, and we're just getting started. We support these two, batch inference and ETL,
00:22:28
Speaker
and will probably support more in the future. Well, listen, talking about the future: you probably know, as you're an active angel investor yourself, that last year 42 to 46% of venture deals in the United States went into AI startups,
00:22:56
Speaker
agentic AI or whatever. So how do you... Because we are talking about a very hardcore problem here, down to earth and kind of techie. I want to ask how you guys are thinking about this AI influence on your startup. I know that as a startup founder you're going to say, yeah, of course, we see a lot of opportunities to add AI and sprinkle everything with AI. But what could be the possible use cases for you guys to adopt AI?
00:23:40
Speaker
When Tower got founded, we initially thought we would build something for AI.
TowerDev's Pivot to Supporting Data Engineers
00:23:46
Speaker
We would be an AI platform. Because all of us are basically social animals; we like to run in groups and follow leaders. But then we realized, actually, there are quite a few...
00:24:07
Speaker
startups that are trying to do this. And so we pivoted away and focused on a group of people that we thought were underserved. And that's those data engineers who are trying not to use Spark, who are not trying to do distributed processing, but who want to do their Python transformations and integrate with AI libraries.
00:24:32
Speaker
So the way we think about Tower is: we are a platform that runs any Python code. What is the language that's used to a large extent by folks who write these agentic AI applications, or who write anything related to AI?
00:24:51
Speaker
What is this one language? It's a rhetorical question. It's Python. The answer is Python. So from that perspective, we built the perfect platform to extend data engineering into its closest cousin, which is AI inference and ML training.
00:25:18
Speaker
I don't think lots of people will do training on us, but I think lots of people will do inference on us. They will load the models, and especially, and this is a new trend in the AI space, open models that can fit on a single node.
00:25:37
Speaker
Yeah. You've heard of DeepSeek. It's as good as the big guys, but it can fit into... a beefy node, sure, but that's still one node, and it costs you 10 times less than the thing you need to buy from AWS.
00:25:55
Speaker
So we think that people will load models into our nodes and do local inference. They will enrich their data. What is it good for? You might ask, why is local inference a good thing?
00:26:09
Speaker
It's a good thing because now you can enrich data with insights based on models. You can produce the outputs of your chats not by paying thousands of dollars every day, but by paying reasonable amounts just for the hardware where the stuff is running.
00:26:36
Speaker
That is the future we think we are going towards: local inference, with models running inside our processing nodes, and data processing and inference happening simultaneously on the same infrastructure.
00:26:52
Speaker
This is very interesting, because I had a chat with my previous guest, Andriy, right? And there is a clear
00:27:04
Speaker
sign that AI technologies are not just getting better, but we are getting more and more opportunities to use them in a cheaper way.
00:27:18
Speaker
I think it's a great democratization of AI, because, as you mentioned, not every business can afford to have these fat-ass nodes from AWS or
00:27:34
Speaker
GCP. The case is to do more with less. No, not even more. I think with AI, the case is to do the right thing. Well, to do it at all, I would say. Absolutely, yes.
00:27:51
Speaker
If you know, for example, that to be able to...
00:27:59
Speaker
Let's pick a use case: financial decision making. So you have to make a decision on whether to make a credit offer to a person or not.
00:28:10
Speaker
You probably want to involve credit history and perhaps some recent purchases the person made, I don't know. But it's a process that involves collecting data and making a prediction of whether that person will default on the loan offer you make.
00:28:30
Speaker
There are probabilities involved. You will want to calculate: will the person default on your loan with a probability of more than 50%? So this all involves a lot of data collection and a lot of data processing. And then at the end, you run all these features through an inference model that spits out a binary decision: give the loan, don't give the loan. Maybe it gives probabilities.
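A toy version of that final step: run collected features through a model that outputs a default probability, and apply the 50% threshold to get the binary decision. The features, weights, and bias below are entirely made up for illustration; a real lender would train a far more sophisticated model on historical repayment data.

```python
import math

def default_probability(features, weights, bias):
    """Toy logistic model: probability that an applicant defaults."""
    z = bias + sum(weights[k] * features[k] for k in weights)
    return 1.0 / (1.0 + math.exp(-z))

# Made-up weights purely for illustration.
weights = {"missed_payments": 1.2, "debt_to_income": 2.0, "years_of_history": -0.3}
bias = -1.0

applicant = {"missed_payments": 0, "debt_to_income": 0.2, "years_of_history": 6}
p = default_probability(applicant, weights, bias)

decision = "deny" if p > 0.5 else "approve"  # the 50% threshold from above
print(round(p, 3), decision)
```

The expensive part in practice is not this arithmetic but the infrastructure that collects and processes the features feeding into it.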
00:29:03
Speaker
Of course, fintech companies, let's say mortgage companies, invest a lot in developing these sophisticated models that produce high-accuracy results.
00:29:14
Speaker
To run them, you have to run them on expensive compute. Let's say the minimum cost of running it would be $100,000, maybe a million dollars.
00:29:27
Speaker
If you knew that the minimum investment you have to make
Affordability of Prediction Models for Small Businesses
00:29:33
Speaker
to use this technology is a million dollars, and you are a smaller shop that is just starting out, just getting into the credit business, I mean, you can't even afford to use this tech, right? So you will continue using Excel spreadsheets until you've made enough money to afford the million-dollar infrastructure needed for state-of-the-art.
00:29:59
Speaker
Now, if someone came along and offered you the ability to run these sophisticated prediction models on infrastructure that only costs $10,000,
00:30:10
Speaker
then even in your current state as a starting-out company, you could afford to use it. And so this enables folks who are not well off, businesses that are not huge, small businesses, maybe traditional businesses that are not as heavily capitalized as the leaders of an industry,
00:30:37
Speaker
in any industry. We talked about a financial services situation, but it's basically any industry. Folks who are not well off or don't have the capital, now they can use the sophisticated tech, and that's actually mind-blowing. Then you can start talking about the economy behaving differently.
00:30:57
Speaker
You can have growth from areas that previously were declining. You know, Sergei, that sounds awesome to me. Just one point that I have.
00:31:13
Speaker
So when we talk about big organizations that have lots of money to run those models, to train those models, they have lots of data. And they are high-end companies with the ability to invest in that, which also means they can invest in skilled data engineers.
00:31:34
Speaker
And this is where we're heading into the skills problem, because such an organization normally has skilled, high-end data engineers who are sophisticated enough.
00:31:45
Speaker
Right now, we're talking about a problem that itself requires not just some knowledge, but substantial knowledge.
00:31:58
Speaker
Let's just be transparent about that. And if you think about a mid-size organization, do you really believe there is
00:32:09
Speaker
a concern? What I'm heading towards: the democratization wave is happening. You're making it cheaper for smaller companies and organizations to adopt it.
00:32:24
Speaker
But do they even have the skills to assess the problem they have, to search for a solution like the one you provide? You see what I'm talking about? Yeah, yeah. Look... So Tower deals with two use cases. One is data engineering. The other one is inference.
00:32:44
Speaker
For inference, what you described is true. Selecting the model, fine-tuning it, even knowing what inference is: that already kind of requires a master's degree and a large salary.
00:33:00
Speaker
So for the inference part, I would say reducing infrastructure costs helps somewhat, but an average org somewhere in the middle of the field will probably not be able to start using AI just because we made it 10 times cheaper.
00:33:23
Speaker
For data engineering, actually, it's a different story, because data engineering doesn't require that many skills. We have customers who have, let's say, a 100-person organization, but only two data engineers.
00:33:41
Speaker
Actually, it's one data engineer and a new hire, and the new hire needs to be onboarded, knowledge transferred. So what this customer did was use us as a way to involve their internal customers, the people who are asking them for changes. Because typically what happens is this one core data engineer ends up getting requests from 10 internal customers: data analysts and project managers.
00:34:11
Speaker
"I want this table to be created in Snowflake or BigQuery, because I can only do SQL and I cannot process textual data. So please get this data into BigQuery."
00:34:24
Speaker
So this data engineer ends up dealing with 10 internal customers asking to create data pipelines. And usually, for the first version of the data pipeline, you do need some skills. It's not very simple. You're not just writing 10 lines of code.
00:34:45
Speaker
You typically need to know what you're doing. You need to handle schema changes. You need to be able to do incremental data loads. You need to handle failure, or data that doesn't fit the destination schema.
00:34:59
Speaker
So the first version takes some skill. But then what happens is your 10 internal data customers have small changes once in a while. "I want another field to be added to my table."
00:35:13
Speaker
"I want maybe a summary table to be created on the basis of the first table." All that can be solved in SQL. But take the first example, "I want another field": that's such a small, basic task that a data analyst who only knows SQL should almost always be able to do it themselves if they see the pattern.
00:35:36
Speaker
If the rest of the pipeline moves 10 fields from A to B, then adding an 11th field that also moves from A to B is the same pipeline. Nothing will change in schema detection or incremental loads.
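A minimal sketch of that pattern: when the pipeline is driven by a list of field names, "add an 11th field" is a one-line config change that an analyst can make themselves, while the pipeline logic stays untouched. All names here are hypothetical, not any customer's actual pipeline.

```python
# The pipeline is configured by a plain list of field names; an analyst
# adds one entry instead of touching the pipeline logic below.
FIELDS = [
    "user_id",
    "order_total",
    "country",
    # "signup_channel",  # <- the "11th field": uncomment to ship it
]

def move_fields(source_rows, fields=FIELDS):
    """Copy only the configured fields from source A to destination B;
    schema handling and incremental-load logic would stay unchanged."""
    return [{f: row.get(f) for f in fields} for row in source_rows]

source_a = [
    {"user_id": "u1", "order_total": 30, "country": "DE", "signup_channel": "ads"},
]
print(move_fields(source_a))
```

Because the field list is the only thing that changes, a review of the diff is trivial for the lead engineer.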
00:35:55
Speaker
So adding one additional field name that needs to be transported is not a hard problem. So what this customer did was use us as a way to bring in the internal customers and let them make small changes themselves.
00:36:12
Speaker
We just released a new feature called Teams. The concept of a team is: you can have a sophisticated, skillful data engineer leading the team, but then you have other folks who make small changes, and they do them in a self-serve fashion.
Empowering Internal Customers with TowerDev
00:36:33
Speaker
And I mean, it's not a complex feature, but it allows you, as the lead engineer, to gain time to work on strategic projects, while you're not blocking your data analysts who just need a field to be moved. They can do it themselves. It's easy.
00:36:54
Speaker
It kind of enables lots of innovation in a company. No, it gains back time; basically, it buys time back for everyone.
00:37:06
Speaker
And it was something small, but the change in workflow could be drastic. So this is the whole point, I guess. No, sounds good. Listen, we talked about... What is also fascinating to me is the fact that you mentioned that you started as part of society, you wanted to jump on the hype and build something AI-ish, and then you realized, okay, AI-ish is nice, but there are real problems that we can solve, and we know how to solve them.
00:37:40
Speaker
And there is an underserved market and audience, which is cool. I admire the honesty here, and I admire that you guys are not optimizing for VC money hype as well. But it also sounds to me that you did a great analysis of today's trends. What do you see as the next big thing in data? What are the trends that you guys anticipate in your future?
00:38:16
Speaker
Well, I talked a little bit about this early on, to explain why we built Tower. It's
The Essential Role of Data Engineering
00:38:24
Speaker
the same three things. I'll just summarize them again, maybe provide a little bit of additional context.
00:38:38
Speaker
Number A, as they say. Data is not going anywhere, and it's the basis of everything, AI or not. We'll always make decisions based on data.
00:38:52
Speaker
We'll probably make bad decisions if we don't, if we cannot extract good insights from data. But humankind, our brains are built such that we want to justify why we are doing things, and we usually justify by looking at tables or spreadsheets or documents, and make our decisions on the basis of our analysis of data.
00:39:17
Speaker
So because of this, since we are dealing with humans, and we are not yet dealing with aliens who behave differently, as humans we will always make decisions based on data.
00:39:29
Speaker
Therefore, the need for data engineering, data processing, extracting insights from data will be there. And it's a good business to be in.
00:39:40
Speaker
We want to be in this business. We like being in this industry. My co-founder and I have spent the last 15 to 20 years either building databases, or data processing frameworks that run on databases, or infrastructure that runs the data processing that runs on databases.
00:40:10
Speaker
Maybe there's a bright future for data engineering as a profession and data processing as an activity. Can I put it that way?
00:40:22
Speaker
Go ahead. No, I mean, I cannot help myself: don't you think that data engineers are at risk, given all the advancements we have with today's LLMs?
00:40:35
Speaker
Well, we are at risk. I mean, even police officers are at risk. There will be robots patrolling the streets, you know, next year. Of course, we will evolve. We will behave differently.
00:40:48
Speaker
Look, for example, when I write examples in Tower, I now use code-generation-enabled IDEs, but I still know the concepts, and I know what I want to get out at the end.
00:41:06
Speaker
So my job as an engineer is changing. I have to think more. I need to understand my stakeholders, what they want.
00:41:18
Speaker
Yeah, so data engineers will go up the chain. They will have to deal with people. I know it's bad. I know it's super hard. No one likes dealing with people, right? Yeah. We'll have to actually understand requirements more. We'll have to understand the players, and how person A and person B cannot talk to each other but have conflicting requirements.
00:41:43
Speaker
So we'll have to do conflict resolution, soft skills like folks leverage. Yeah. I mean, that skill will never go away.
00:41:54
Speaker
We will have to create requirements for all of that automation. Someone has to say: my group of users told me 10 different things, but I sense five different requirements here, and three of them can wait.
00:42:13
Speaker
But these two need to be shipped tomorrow because the business is trying to achieve that level of sales. Now tell me, what kind of an LLM produces this right now?
00:42:26
Speaker
Manus AI is able to create this set of requirements for me? No, not yet. Okay, so we still have a year until MCP servers will be spitting out those requirements.
00:42:39
Speaker
So, yes, our jobs will evolve. We'll probably become more like analysts in our thinking.
00:42:53
Speaker
But I think it's still good to understand how the underlying technology works. Because if you don't know what a prompt is and why the LLM spits out what it spits out, then you'll be...
00:43:08
Speaker
then instead of guiding the LLM, you will be throwing darts at it, hoping your next version of the prompt will end up with a better output.
00:43:27
Speaker
So, yes, we are at terrible risk, all of us, but there is light at the end of the tunnel. The light means you have to learn soft skills. You have to upskill to the level of understanding requirements from your business users, maybe being able to formalize what people want,
00:43:47
Speaker
being able to prioritize based on business priorities. And that will be the role of data engineers in the future.
00:43:59
Speaker
And also, as you mentioned, learn and understand the basics. Because I see we cannot get away without knowing them. It's about understanding it on a deeper level, I guess.
00:44:14
Speaker
Yeah. So I interrupted you. You mentioned this trend: data is not going to go away. This is the first trend. Yeah, I get it. This is kind of the base that makes it all...
00:44:32
Speaker
It allows us to stay in this industry and keep going. But I do believe we are in the middle of what I call a perfect storm. We talked about this shortly before.
Coexistence of New Data Platforms with Existing Tech
00:44:45
Speaker
A perfect storm is a meteorological condition that happens when several other meteorological conditions happen simultaneously in the same place.
00:44:58
Speaker
um It usually results in some very extreme ah behaviors of nature. For example, um you can have waves of hundreds of meters or thousands of feet in height, ah which is a terrific thought or terrible, actually, not even terrific, as I say.
00:45:18
Speaker
I mean, I would be horrified. Yeah, it's a horrifying thought, you know, facing a wall of water in front of you. So we think there's a perfect storm happening right now in the industry. It's not horrifying, though. It's a good thing.
00:45:33
Speaker
The other conditions that are causing the storm involve advances in infrastructure.
00:45:44
Speaker
Amazon, Google, and Azure did a good job of developing very nice VM technology and great storage technology. So they've built great building blocks.
00:45:58
Speaker
They've had the time since the 2010s to build great building blocks. Now a vendor like us, and several other vendors as well, we have better building blocks to build data platforms out of,
00:46:13
Speaker
on top of this. Yes. So we are using the better compute and the better storage to build a very simple system for users, one that is still very powerful and can do everything they were used to doing in the past.
00:46:31
Speaker
I've been a bit abstract to show you the big picture, but in practical terms, what it means is: Tower is single-node based and doesn't care about distributed computing.
00:46:45
Speaker
Tower uses Python instead of the Spark and Dataflow frameworks that we used to have in the past. Tower assumes that we have to coexist with other vendors, so storage will be shared by everyone.
00:47:07
Speaker
We have to be nice, you know, nice children in the sandbox. We have to play with each other without fights. So we have to coexist. And that's kind of the basic principle we have in Tower: we will coexist with and complement the other vendors in the customer's sandbox.
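To make the single-node, shared-storage idea concrete, here is a minimal sketch in plain Python: one process reads records, transforms them, and writes the result to a file format any other vendor's tooling could also read. This is only an illustration of the pattern, not Tower's actual API; the file names, the field names, and the 0.92 exchange rate are made up for the example.

```python
import csv
import json
from pathlib import Path

def transform(record: dict) -> dict:
    # Single-node transformation: no cluster, no driver/executor split,
    # just ordinary Python over one record at a time.
    # The 0.92 USD->EUR rate is a made-up constant for the sketch.
    return {
        "user_id": record["user_id"],
        "amount_eur": round(float(record["amount_usd"]) * 0.92, 2),
    }

def run_etl(src: Path, dst: Path) -> int:
    with src.open(newline="") as f:
        rows = [transform(r) for r in csv.DictReader(f)]
    # Write JSON lines to a destination that could sit on shared object
    # storage, so other tools can read the same data afterwards.
    dst.write_text("\n".join(json.dumps(r) for r in rows))
    return len(rows)

# Tiny demo input
Path("orders.csv").write_text("user_id,amount_usd\n1,100\n2,250\n")
n = run_etl(Path("orders.csv"), Path("orders.jsonl"))
print(n)  # -> 2
```

The point of the sketch is that nothing here needs a framework: for workloads that fit on one machine, the "data platform" can be ordinary Python plus a shared storage location.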
00:47:27
Speaker
That's a very inspiring picture you're drawing for me. We're going to be the nice kid in the sandbox. We need some positivity in this world, right?
00:47:38
Speaker
No, I love it. I genuinely love it. You know what? What you're talking about right now reminds me of a talk I saw two years ago at Coalesce, the dbt conference. I was at the talk by Jordan Tigani, right, from MotherDuck.
00:48:00
Speaker
He was describing, okay, so now we can... No, no. The thing he emphasized is that we don't need BigQuery or Snowflake. We don't need all those giants, Databricks included.
00:48:20
Speaker
At that time, two years ago, I had a real internal conflict: how come these companies are growing 30% quarter over quarter, or even more, while someone says we don't need them, while we get completely different signals from the market?
00:48:45
Speaker
And right now, I mean, I'm just comparing: you don't say that we don't need those. We just need to find a way to simplify some things, to play nicely.
00:48:57
Speaker
It actually resonates more with me right now. Those giants are here forever, that's true. Even Spark, however hard and difficult it can be, it's not going to go away.
00:49:10
Speaker
It feels the same. Yeah, I think so. It's hard to change which tools our developers prefer. Our customers are developers, and they are not easily influenced by words.
00:49:27
Speaker
Tell me more about that. Yeah, so you kind of have to show them, and maybe use subtle language. It's mostly by showing, and by also giving them the flexibility of not using some parts of your system.
00:49:48
Speaker
But the point I'm trying to make is this.
00:49:54
Speaker
Tower, in a sense, is like MotherDuck, but for Python. I worked with Jordan, and we signed up for the same principles he is advocating: single node, distributed processing not important or not necessary anymore.
Complementing Technologies with TowerDev for ETL and Inference
00:50:10
Speaker
So we kind of think of ourselves as MotherDuck, but for Python data applications. He's focused on scaling DuckDB,
00:50:21
Speaker
we are focused on scaling Python data applications.
00:50:28
Speaker
The approach we take is probably better because we say: yes, I'm not going to convince everyone in one year to suddenly start using Tower.
00:50:44
Speaker
People have been developing in, I don't know, give me some old-school language. COBOL used to be the old example, but I think it's truly dead now.
00:50:55
Speaker
I don't know, there are probably some people who are doing C++. Okay, C++. C++ has been out there for many, many years, and people are still using it. And actually, like half of Snowflake is written in C++. A bunch of Databricks is written in C++. So that language is not going away. It's good for some things.
00:51:19
Speaker
Why fight this war? I mean, it's a stupid conversation, you know: Python is better than C++. I don't know, it's probably not better. It's better for some use cases, and it's not better for others.
00:51:34
Speaker
I mean, Python is definitely better if you want to gently transform some input records and run the latest DeepSeek model on them.
00:51:49
Speaker
Python is definitely better. If you're using C++ for inference against an LLM, something is wrong in your process. But for other things, C++ is great.
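The use case he describes, lightly transforming records and then running a model over each one, follows a simple batch-inference pattern. Here is a hedged sketch of that pattern in Python; `call_model` is a stand-in stub, not a real DeepSeek client or any Tower API, and the sentiment-classification task is invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Review:
    text: str

def call_model(prompt: str) -> str:
    # Stand-in for a real LLM call (e.g. an HTTP client for a hosted
    # DeepSeek model). Stubbed here so the sketch stays self-contained.
    return "positive" if "love" in prompt.lower() else "negative"

def batch_inference(reviews: list[Review]) -> list[dict]:
    results = []
    for r in reviews:
        # "Gently transform" the input record before inference:
        # strip whitespace and wrap it in a prompt template.
        prompt = f"Classify the sentiment of this review: {r.text.strip()}"
        results.append({"text": r.text, "sentiment": call_model(prompt)})
    return results

out = batch_inference([Review("I love this product"), Review("Broke on day one")])
print(out[0]["sentiment"])  # -> positive
```

Nothing here needs a distributed framework either: the loop is embarrassingly parallel, but for most batch sizes a single Python process (optionally with a thread pool around the network calls) is the simpler tool.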
00:52:01
Speaker
So we're not going to fight the war of trying to convince everyone that our tech is better. We want to complement existing technology to give users better choices for ETL and batch inference on shared data, to make it really simple, and to not require them to learn new frameworks.
00:52:28
Speaker
That is our goal in life. That's... No, that makes a ton of sense to me. Given, as you mentioned, you don't use a screwdriver to open a jar.
00:52:40
Speaker
Yeah, this is a very realistic approach. I like it. Very down-to-earth. We can make it happen. And I love it because it's also...
00:52:52
Speaker
Let's put it this way: again, from what I see working with our clients at Masthead, there is tons of Python. There are so many Python applications. And I bet if you come to data engineers and tell them they're not going to have to deal with all the hosting and with the DevOps team, because that is also a conflict over resources and time... By the way, I love SQL.
00:53:23
Speaker
I spent half of my professional career dealing with SQL. So I think data engineers can be divided into two groups: data engineers for SQL-based approaches, and data engineers for other procedural logic, and those usually go with Python, although there are some old-school Scala folks as well.
00:53:48
Speaker
Again, I love both of these groups. It's fine. SQL people, you're great. Don't get offended by my mentioning Python more than three times.
00:53:59
Speaker
dbt is a great product. If you're in a SQL stack, then orchestrating your model updates using dbt, I totally get it.
00:54:11
Speaker
It's just that for the Python people, it doesn't exist. We are trying to make it as easy and simple for the Python folks as you SQL people have it in your world.
00:54:25
Speaker
Yeah, so, you know, folks, we're not trying to sell Python to SQL people. That's a war we are not fighting. We're just trying to help another group of folks who don't have it as nice.
00:54:39
Speaker
Listen, how can people find you? Tower.dev. It's very simple: tower.dev. And there's a link that allows you to sign up for a waitlist.
00:54:54
Speaker
It goes directly to me. My co-founder and I get the notification, and we will immediately respond and probably try to book a meeting with you to learn more about you.
00:55:07
Speaker
And we are happy to invite you to our private beta. Beautiful, beautiful. Sergei, thank you so much. I'm very excited for your journey, and it makes a ton of sense.
00:55:21
Speaker
Love the problem you're solving. And yeah, I'd love to have you back sometime, maybe later this year, to follow your journey and hear more use cases.
00:55:34
Speaker
Thank you. Sure, we'll be happy to be back. Yes. Thank you so much for spending time with me today. Bye. Bye-bye.