
Deepti Srivastava: Systems Thinking for Enterprise AI

S1 E20 · Straight Data Talk

Deepti Srivastava, Founder of Snow Leopard AI and former Spanner Product Lead at Google Cloud, joined Yuliia to chat about what's wrong with current approaches to AI integration. Deepti argues for a paradigm shift away from ETL pipelines toward federated, real-time data access for AI applications. She explains how Snow Leopard's intelligent data retrieval platform enables enterprises to connect AI systems directly to operational data sources without compromising security or freshness. Through practical examples, Deepti explains why conventional RAG approaches built on vector stores are not good enough for business-critical AI applications, and how a systems-thinking approach to AI infrastructure can unlock greater value while reducing unnecessary data movement.

Deepti's linkedin - https://www.linkedin.com/in/thedeepti/
Snowleopard.ai - http://snowleopard.ai/

Transcript

Introduction and Background

00:00:00
Speaker
Hi, all. It's Yuliia from Straight Data Talk. And I can tell you that I'm a little bit nervous today, because I have... Yeah, yeah, it's all because of you, because I have the pleasure of hosting Deepti Srivastava today. Back in 2013, she started to work on Spanner, one of the greatest products in Google Cloud.
00:00:30
Speaker
And yeah, I feel so privileged to host you today and talk about your startup, Deepti. Go ahead, introduce yourself, and let's have this conversation. Yeah, that sounds great. I'm already embarrassed, so good job there. I am super excited to be here as well. I am so excited to talk to you as fellow female founders, solo female founders; I guess that's even more exciting to me. But yeah, as you said, my name is Deepti Srivastava. I have spent the
00:01:08
Speaker
majority of the last two decades building, evangelizing, productizing, and selling all kinds of data products and data systems. I started my career as a distributed systems engineer at Oracle, where I was working on the distributed lock manager in the Oracle RAC database kernel.
00:01:33
Speaker
That was a very interesting experience, of course, because I'm passionate about distributed systems. That's why I went to grad school for distributed systems at Carnegie Mellon. And even then, I was really excited about the impact of technology on people, specifically.
00:01:52
Speaker
So I was never interested in technology for the sake of technology. I was always interested in technology as a way to help people, or help the world move forward in some positive way. I didn't have these words at the time, obviously. It just drew me to go from grad school to where the action was, I guess, which is industry. So yeah, I went to Oracle. But as an engineer, you get really vertically focused, right? You're solving problems deeper and deeper. But then I was like, well, how is this bug that is stopping the database from working affecting large companies like British Airways, or any of the financial institutions? Yahoo.com used to run on Oracle as well. And so I was like, okay, how is this impacting those systems and the people?

Transforming Spanner into a Cloud Product

00:02:44
Speaker
And it so happened that in 2012, 2013, I got this opportunity to go work at Google on this new project that was a research project at the time. It was called Spanner. It had maybe 15, 20 people at the time. And my job was to take this project and make it into a product, whatever that means.
00:03:12
Speaker
So that was my journey into what is now called product management. So yeah, I launched Spanner within the company, within Google, in 2013, which was very interesting because my ICP at the time was Google engineers, and they were all really excited about Bigtable. And so telling them to use this SQL-ish transactional system instead of eventual consistency was a whole thing. So things like go-to-market were very interesting to build in that space.
00:03:44
Speaker
Basically, we onboarded Google Ads, the Google Play Store, et cetera. There's a whole story there on why we started to do that, which I will not get into. But then from there, when the whole cloud shift happened,
00:03:58
Speaker
we moved to launching Spanner as a cloud product, for which we actually had to build a bunch of the stack again, because we needed to provide external APIs, right? Me and a couple of other engineers were really passionate about that at the time: we should really put Spanner out there for people to use, not just in a research paper. So I launched that in 2017, obviously with our team.
00:04:27
Speaker
And then I moved to Observable, a data visualization company, in 2020. And in 2022,
00:04:38
Speaker
I was kind of burned out, back in the pandemic days. It was a really fun ride, actually. I learned a lot about startups there. I learned a lot about how data is used, because it's the visualization part of the data app stack. So that was really fun.
00:04:56
Speaker
And then I realized that infrastructure and data systems are my true calling and my true passion. So I took a break; I needed to recharge after the whole post-pandemic period, if you can remember back to that time. But,
00:05:22
Speaker
you know, as I started to look around, I was into the infrastructure

AI in Business Operations

00:05:27
Speaker
and systems thing. I'm like, oh, what's exciting? What's out there? And obviously ChatGPT had just broken out at that time. This is the end of '22, beginning of '23. So I started talking to people about how they're using this new and cool AI. I do think that things like ChatGPT
00:05:47
Speaker
democratized AI. I felt like this was not just hype; it was a real thing at this point. And so my question as a systems person was, okay, how are people implementing this? How are they using it in their enterprises? How are they helping customers, right? And that's sort of what led me here. There were a couple of theses I had in my head that I thought somebody else was going to solve.
00:06:09
Speaker
But nobody else was solving it that way. So I was like, okay, well, it's time to just do it. And so that's how I started Snow Leopard, which is a data and AI company. That's a very broad description, I should say: a data and AI company. We're really the data side, maybe not the AI company, but we can, you know, kind of go and enable DeepSeek, for God's sake.
00:06:35
Speaker
That's a little bit more involved than that, yes. So yeah, let's go. Tell us what you guys do at Snow Leopard. Yeah, so.
00:06:50
Speaker
Here's the thing, right? As I said, I've always been more interested in what the technology is actually doing. How is it helping people? How is it actually being used? And to me, I'm an enterprise person. I've been in enterprise software my entire career: Oracle, Google, Spanner.
00:07:09
Speaker
So for me, it was very interesting to see how people in the enterprise space, in the systems space, were using this new thing. Because I had seen transformers; transformers were built at Google. So I had kind of seen how transformers were built, and I understood the basic technology. But at the time, I was like, this is not something that's going to be mass market, because of the data and the compute requirements. You need gobs and gobs of data, and you need gobs and gobs of compute. And so
00:07:46
Speaker
how were we going to get there? But as I said, when you make it an API, now everybody can use it. It's the same thing as Spanner, right? When you make it an API, everybody can use it. So I'm like, okay, how are people using it? So I spoke to my friends. I did market research, as any good product manager does. I spoke to VPs of Infra at CRM companies. I spoke to folks, my friends, at various larger companies about how they were going to use it. And over and over again, two things stood out to me. One was that everybody kept talking about this as a new sort of stack, right? With my LLMs... Yeah, sorry, LLMs, generative AI. Yes, of course. We should use the words. But with generative AI and LLMs,
00:08:44
Speaker
how are people using it? It's great, right, but how are people actually using it? It's great to chat with your PDF, but what

Challenges and Solutions in AI Adoption

00:08:50
Speaker
is the actual use case? And so as I talked to people, the problems started to show up for me. One is that in order for anything to be used inside an enterprise stack, or inside any technology stack, really, it has to actually fit in that stack. We can build the most beautiful systems, right? But if they don't fit into somebody's existing stack in an easy way, then it's really hard to get mass adoption. Because if you build a new thing off to the side, it's always going to be peripheral.
00:09:29
Speaker
Deepti, I just want to highlight how far ahead you are in your understanding of the enterprise compared to the majority of startups and startup founders out there. You just highlighted one of the most important things: if it's not going to fit in the stack, it's just not going to work, because nobody is going to use it. That's it.
00:09:50
Speaker
This is fundamental. Yeah, this is fundamental. I mean, this is the thing, right? We built a beautiful system with Spanner. But the thing I learned, as much as systems and data systems are exciting to me, and they still are, I'm still very passionate about databases and data systems, is that until other people can use it, and we can make it easy to use, it becomes really hard. So every data system and technology problem really becomes a usability problem in the end, in my opinion. So that was one key finding for me. And the other thing, and this is the two decades of learning, is that data silos are here to stay. They exist. They will always exist. There is really no way to
00:10:39
Speaker
say that you can use one place to query, or to really get access to, all the data from your entire business. It's just not physically possible, right? So this idea of having data lakes and data warehouses is really important. I think it's really helpful for analytics, data transformation, those kinds of use cases. But it's not a panacea. It can't be the be-all and end-all, because there's always going to be some business unit somewhere storing data in a different way that you haven't put into your lake. There's always going to be that. So there's the problem of getting the right data at the right time from the right place. And the way people solve it, because lakes and warehouses are so popular, is that
00:11:40
Speaker
you build very large and complicated data pipelines, right? And what does that mean for the developer, the AI developer or the AI app developer? It means that developers spend the majority of their time building, maintaining, manipulating, and fixing data pipelines. And this is true regardless of AI, by the way. I saw this pre-gen-AI, and I'm seeing it even more with gen AI.
00:12:06
Speaker
It's a nightmare for developers, right? I saw it my entire career. Again, getting access to the right data at the right time becomes really hard. And anytime there's a new use case, you actually have to build a new data pipeline or a new data workflow to accommodate that use case. So it takes months. It's not a minutes or hours thing; it's not even a days or weeks thing. It's a months thing. You have to start from scratch, re-ingest a bunch of data into a store, and then connect that to the rest of the app.
00:12:44
Speaker
And that's not trivial. That's really hard. And I haven't even touched on things like data governance, freshness, and scalability. There's a whole bunch of things there. So the thought process with Snow Leopard is: what if we could get rid of complex data pipelines that are unnecessary? What if you could get rid of unnecessary ETL, unnecessary pipelining, squiggly lines everywhere, nobody knows what's going on? What if you could erase all that? And yes, you still have your warehouses, because as I said, they're really helpful in a lot of use cases. But what if you weren't forced to use them? What if you could just go to the right data source
00:13:28
Speaker
and fetch the data that is live, that is fresh, so you could make business-critical decisions? In the world of gen AI, ad hoc questions can be asked; your customers and you both can ask ad hoc questions. That's the shift, right? Instead of having predefined workflows, now any ad hoc question can be asked. And as a result, you need systems that accommodate that.
00:13:57
Speaker
So just to take a step back: as far as I understood, the problem you see is that when an enterprise wants to use LLMs, they are trying to push their data to the LLMs, basically training these large language models on some subset of their data. And your point is that it's actually not good enough.
00:14:25
Speaker
You can do it better by accessing the most reliable and fresh data exactly where it lives, in the data sources, without fetching it into a single centralized data lake, the approach that we all kind of adopted or got used to. Yeah, it's become the de facto. And I'm saying that while it's very helpful, it's not the only way to do things.

Innovating Data Access with Snow Leopard

00:14:52
Speaker
I think there are better ways, in my humble opinion.
00:14:55
Speaker
Well, I'm very interested in your humble opinion on why it's not the best way to do it today. Like, why is training an LLM to resolve customer issues on a subset of previous issues, however the training happens, not good enough? I'm just trying to point at a very hands-on use case, since everyone is using LLMs for customer support today.
00:15:23
Speaker
Yeah, that is the most popular use case, for obvious reasons. Chatbots are basically the most obvious thing. There are also agentic frameworks; there's stuff down the line that we can talk about. But to answer your question: first of all, when you train an LLM, it's a static snapshot of information. So it's automatically not going to be great for constantly changing, or even semi-constantly changing, data. It's great for, here's our company handbook that doesn't change every year, or even every two years. So first of all, you pre-train your model on that kind of information. That's why it's been trained on internet data, because it doesn't change
00:16:08
Speaker
that much. So it has general reasoning capabilities because it has access to general data. That's number one. But the point there is, it's stale. It's a snapshot taken at training time. So great. What have people done to give access to more business-focused data, more company-focused data, et cetera?
00:16:32
Speaker
RAG, right? The de facto way of doing things in the last year and a half with gen AI is retrieval-augmented generation, and the concept makes a lot of sense. LLMs generate information, that's why it's gen AI, based on previous information, so now you augment that with relevant data in the context. But RAG is also ultimately ETL into a data store, and then you connect that data store to the LLM. So it has the same problems as ETL. I know AI seems to be this hyped thing, oh my gosh, it's something new and different. It is, but this part of it, the data part, frankly, has a lot of parallels to already existing data systems. It's essentially ETL-based, data-pipeline-based.
00:17:30
Speaker
So it has all the problems that ETL has. The way RAG works, for the most part, is you put data in a vector store, which is called a vector database but is really a vector store. And then you access that data at the time a question is asked. To go back to your support use case, you try to fetch the information that has been stored in the vector store
00:17:54
Speaker
and make it part of the generation from the LLM by giving it that data as context. But again, it will only know what exists in that vector store.
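The RAG flow Deepti describes can be sketched in a few lines. This is a toy illustration, not any real system: token-overlap scoring stands in for real embeddings, and the documents and class names are invented. The point it demonstrates is the one made above: retrieval can only surface what was ingested at the last batch run.

```python
# Toy RAG retrieval: documents are ETL'd into a "vector store" ahead of
# time, and retrieval can only return what was ingested at that snapshot.

def embed(text):
    # Stand-in for an embedding model: a bag of lowercase tokens.
    return set(text.lower().split())

class ToyVectorStore:
    def __init__(self):
        self.docs = []  # list of (token_set, original_text)

    def ingest(self, text):
        # The ETL step: runs as a periodic batch job, not on every change.
        self.docs.append((embed(text), text))

    def retrieve(self, query, k=1):
        # Rank by token overlap (cosine similarity in a real system).
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: len(q & d[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = ToyVectorStore()
store.ingest("Order 1001 shipped on May 2")  # snapshot taken at batch time

# An order created *after* the batch run is invisible: the store can only
# answer from the stale snapshot it holds.
answer = store.retrieve("what happened to order 1002")
print(answer)
```

However good the ranking gets, the store is still a snapshot: freshness is bounded by the pipeline's batch interval, which is exactly the staleness problem described above.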
00:18:08
Speaker
Which is expected. Yeah. So again, right, it's a snapshot of data. Nobody's doing pipelines that run every microsecond or millisecond or second, just to be clear. But even if you're running it every few minutes, it's still got two problems. It's inherently stale. It's never going to have the update that just happened to your operational business data. And second,
00:18:31
Speaker
there's always a data silo out there that hasn't been put into your vector store. And so your chatbot, your supposed support system, cannot even begin to answer those questions. Even if you built the most elegant solution for this, it still has the same problems: you have to maintain it, it's hard to maintain, it's brittle, it's missing data, it's stale data.
00:18:57
Speaker
So you can't do this for business-critical systems. And as a result, you haven't seen mass adoption of AI in business-critical operations, right? This is really interesting. So you blame the RAG.
00:19:15
Speaker
The RAG, relatable data to solve this problem. Relevant. Yeah, relevant. I might have invented words. No, no, no, it's just a very active thought process. So yeah. This is really interesting. Explain to me how you guys tackle it, because you have a completely different approach. I mean, if you have to add something, go ahead, but it's still hard to believe what you propose. I mean, to me as a person who sees these ETLs daily.
00:19:49
Speaker
I mean, I saw it daily for two decades, right? So I come at it from a "well, there has to be a better way" place, because it's so hard. It's not about blame. When you look at it, and I had the privilege of taking a step back and looking at it, ultimately
00:20:10
Speaker
it's just hard, as you have seen in your world too. It's really hard. People are always trying to understand: what data did I miss? How do I get the right data? Where do I get the right data from? Which source is it in? So I think this is a fact of life. It's not about blaming things. And so the question in my mind was: now that we're in the generative AI world, where ad hoc
00:20:38
Speaker
questions can be asked and need to be answered, how do you unlock that? Sorry, stepping back. What I have seen is that most business value lives in operational business data. Those are the crown jewels: the Postgres, the Salesforce, even BigQuery, data warehouses, whatever. So if you can't get access to the operational business data, you're actually just working on the periphery of the problem. You're not really tackling the center, the meat of the problem, the core of the issue. So if you really want to unlock the potential, the trillions of dollars of potential of this new AI, gen AI, we have to connect it
00:21:26
Speaker
to the crown jewels. You have to connect it to the operational business data. And how do you do that in a way that, again, gets you the right data at the right time from the right place? So you approach the problem that way. And another thing I talk a lot about is having a systems approach. I feel like we need to apply the decades, not just mine but in general, of distributed systems thinking to AI to make it a reality. AI has to exist within a context, and the context, every company, every tech stack, is slightly different. So saying that you have to rewrite the whole thing is just not practical. And if you put it off to the side, then it's always going to be on the side. So how do you bring it in slowly but cleanly, in a way that's easy to use now?
00:22:20
Speaker
Those are the principles we built Snow Leopard from. So Snow Leopard is an intelligent data retrieval platform whose job is to take away the burden of building the glue between AI and your operational business data. Snow Leopard sits, roughly, between your operational business data and your AI system, and based on the question being asked, it's going to fetch the right data from the right source at the right time.
00:23:00
Speaker
Good question. Yes: how does it know where the right data is? Yeah, great question. This just popped into my mind: how does it know? So Snow Leopard is utilizing the fundamentals of two systems. One is distributed data systems, and the other is AI, specifically LLMs. LLMs are really good at classification, summarization, and pattern matching,
00:23:29
Speaker
and we know how to build deterministic data systems, right? But how do you connect them? Because LLMs and AI are inherently non-deterministic, and we really want to get the right answer. We've spent 40 years building databases and data warehouses and data systems, and when you ask them a question, they say yes or no. They don't make stuff up. So we're using those two fundamental things to build Snow Leopard. To answer your question: the idea is that when a question comes in, we can use the LLM's power of pattern matching, summarization, and classification to figure out the best place to answer this question. So it's doing intelligent routing using LLMs, and then we're building, and we're not doing text-to-SQL just to be very clear, we're building a query using a query builder
00:24:26
Speaker
for the actual system that has been identified. So we're creating a query on the fly, on demand, natively for the data source it's supposed to fetch from. For example, if it's a Postgres database, we'll build a Postgres SQL query. If it's a Salesforce system, we build a SOQL query. If it's a BigQuery system, we build a BigQuery query.
00:24:57
Speaker
So we're building queries on demand using a query builder, but we're figuring out where to go using the power of LLMs, essentially. Okay. And I have a tricky question for you. Go for it. So,
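The route-then-build pattern Deepti outlines can be sketched roughly as below. To be clear, this is a hypothetical illustration, not Snow Leopard's implementation: the keyword classifier stands in for the LLM routing step, and the source names, schemas, and columns are all invented.

```python
# Sketch of intelligent routing plus a per-source query builder:
# classify the question to a data source, then emit a query in that
# source's native dialect rather than generic text-to-SQL.

SOURCES = {
    "orders":   {"kind": "postgres",   "table": "orders"},
    "accounts": {"kind": "salesforce", "object": "Account"},
    "events":   {"kind": "bigquery",   "table": "proj.ds.events"},
}

def route(question):
    # Stand-in for LLM classification: pick the best source for the question.
    q = question.lower()
    if "order" in q:
        return "orders"
    if "account" in q or "customer" in q:
        return "accounts"
    return "events"

def build_query(source, filter_col, value):
    # Build the query natively for the identified backend, on demand.
    cfg = SOURCES[source]
    if cfg["kind"] == "postgres":
        return f"SELECT * FROM {cfg['table']} WHERE {filter_col} = %s", (value,)
    if cfg["kind"] == "salesforce":
        return f"SELECT Id FROM {cfg['object']} WHERE {filter_col} = '{value}'", ()
    return f"SELECT * FROM `{cfg['table']}` WHERE {filter_col} = @v", (value,)

src = route("what happened to my order?")
sql, params = build_query(src, "customer_id", "C-42")
print(src, "->", sql)
```

The design point is that nothing is predefined per question: the routing decision and the native query are both produced at request time, against whichever source was identified.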
00:25:19
Speaker
with your system, how is Snow Leopard different from Glean?
00:25:26
Speaker
Do you know the Glean startup? Yeah, it's search inside the organization. Yes, but it's essentially built around unstructured storage, right? It's trying to scan your entire data source, et cetera, to figure out what's going on. I think the fundamental difference between us and other systems tackling data for AI is that we are a federation system rather than a unification system. Instead of putting your data in one place, we are saying we will go to the system and fetch the data at the right time. And fundamentally, we are not an unstructured search system. We are a structured query system.
00:26:21
Speaker
So we are focused on structured data, which is operational data. We are writing queries for the native data source, to fetch from that data source on demand.
00:26:33
Speaker
The whole point is that it's not predefined. The whole point is you don't have to build a predefined data workflow that only knows how to answer the five questions set up for a chatbot. Let me give you an example; will that help? There are many use cases, but let's go to your chatbot support use case.
00:26:55
Speaker
And this is actually one of the experiences that really made me think I should start Snow Leopard. I ordered something online, an anniversary bouquet for somebody in a different country. So I go online, I order it, you go through all the steps, it takes your credit card, and then finally there's supposed to be a screen that says, hey, your order went through, here's your order number.
00:27:21
Speaker
Except that never happened. And I got really confused, because I was like, oh, did it go through or not? And the question there is less about whether my credit card got charged and more about, if I'm sending it to somebody for their anniversary, I want to make sure it gets there. So do I have to reorder or not is the question in my head. So I'm like, oh, what happened?
00:27:44
Speaker
So then obviously you have to go to support. And the first thing you hit with any e-commerce site now is a chatbot, either on the phone or in a full chat window. It has to be a chatbot. So I was like, okay, what happened to my order? And the first question the chatbot asks is: what's your order number? Because it only knows how to retrieve data based on an order number. And I was like, well, I
00:28:12
Speaker
don't have an order number. So what does the chatbot do? It transfers you to a human, right? Because the human has to look at the order management system to see whether a row was created in that order management database or not.
00:28:26
Speaker
So I wait, not on the phone, sorry, in front of my screen, for 45 minutes to get to that human, who then looks at the data and says, oh yeah, the row was created, but I see here that the email bit is not set, so the email was not sent. Let me push the email out. Here's your order number. It went through, you're all set.
00:28:46
Speaker
Okay, so the point here is that the data existed, right? I shouldn't have had to wait for 45 minutes, and I shouldn't have had to take up the time of this human, who could have been answering more complex questions, to stare at another screen. Instead, what if the chatbot could go look at the order management system and, based on my customer ID, retrieve that data? Because the thing is, even if the data is connected, if there is a RAG pipeline or an ETL pipeline for it, it's not going to happen immediately. It's going to be batch processed.
00:29:27
Speaker
And it's never going to be real-time, because that's too much QPS. So what if you could just go fetch it? With Snow Leopard sitting between your chatbot and your Postgres, your order management system, all of your data sources, it can look at the question and say: let's see if they have an order number or not; let's see if they have a presence in the orders table, like did they make an order or not. Okay, fantastic. I love it. I love the idea. Yeah, but here's a question.
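The order anecdote boils down to a single live lookup by customer ID, the kind of thing the human agent did by hand. A minimal sketch of that lookup, with sqlite3 standing in for the order management Postgres, and with a made-up schema and `email_sent` flag mirroring the story:

```python
# Live fetch instead of vector-store retrieval: query the order
# management database directly by customer ID at question time.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT, customer_id TEXT, email_sent INTEGER)")
# The row was created, but the confirmation-email bit was never set.
db.execute("INSERT INTO orders VALUES ('ORD-7', 'C-42', 0)")

def order_status(customer_id):
    # Fetched on demand, so it reflects the latest operational state,
    # including rows created after any batch pipeline last ran.
    row = db.execute(
        "SELECT id, email_sent FROM orders WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        return "No order found; you may need to reorder."
    order_id, email_sent = row
    note = "" if email_sent else " (confirmation email not sent yet)"
    return f"Order {order_id} went through{note}."

print(order_status("C-42"))
```

Because the query hits the source of record directly, there is no batch interval to wait out: the answer is as fresh as the row itself.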
00:30:03
Speaker
There is a but. You mentioned something very beautiful at the beginning; I wrote it down in big letters: usability. I loved it so much. A usability problem. Basically what you said, and I highlighted it earlier: how does this fit into their stack?
00:30:24
Speaker
And I don't want you to treat this as skepticism, but I'm also a vendor. So here a vendor comes in. No worries. Here a vendor comes in.
00:30:41
Speaker
For us, to deploy, we just need to connect to logs on Google BigQuery, where we also apply filters, hosted on the client side, that filter out any PII data that might appear there. We create a, how do you say, secure connection architecture where we don't actually touch the data. And in your case,
00:31:09
Speaker
Snow Leopard would need to touch so much of the client's data. But that's the other story. The first story is being able to actually connect to all of those distributed systems, and all of those spreadsheets that the finance team owns and keeps private, almost private, to themselves. How do you envision that part, connecting to those distributed systems?
00:31:38
Speaker
I think both points are around privacy and discovery. On discovery, we're saying we won't touch anything you don't want us to touch. You have to give us access explicitly, first of all. And, going back to the privacy case, let me get there. First of all, you give us access: you give us access to the tables, to the specific datasets you want to give us access to, and you can expand and contract that in one place. So this thought process around governance:
00:32:17
Speaker
each data store these days, the databases specifically, is very good about access control, right? They have some level of IAM permissions, et cetera, and they're very fine-grained. But as soon as the data leaves that system, what happens to it? Nobody knows. And in most cases, especially for answering operational questions,
00:32:43
Speaker
you actually dump it somewhere else, mostly into a spreadsheet that doesn't have that level of access control,

Data Governance and System Efficiency

00:32:50
Speaker
right? At best it's some kind of user-level access management, rather than: you have to have a service account, it's logged everywhere, there's auditing, et cetera.
00:33:01
Speaker
All of that beauty is left behind as soon as the data is out. So that's problem number one. If you just leave the data where it is and fetch only what you need, in a more systematic way, then first of all, you're not dumping everything out. Because that is actually a vulnerability problem for most people who build these systems.
00:33:23
Speaker
Second, because you need data from multiple systems — and this is actually one of the many reasons why people don't ETL everything everywhere, or try to and it doesn't work — each system can maintain its own permissioning and access control. But with Snow Leopard, you can set all of that at the Snow Leopard level, right? The roadmap for Snow Leopard is that it can actually help you with cross-data-source
00:33:56
Speaker
governance of data, right? Once data is out, you can put access control there, because Snow Leopard is sitting close to the data rather than close to the application. What happens with microservices-type architectures is that each application is pulling data, right? And then it's sending data across APIs to other services, but this part of the system — the app part — is not actually being tracked, and can't be tracked by the data source, because it's separated. Snow Leopard sits close to the data. So, first of all, you can do the access control across data stores
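The retrieval-layer governance Deepti describes — leave data in each source, fetch only what a request needs, and keep explicit per-table grants plus audit logging close to the data — can be sketched roughly as follows. This is a hypothetical Python illustration; none of these class or method names are Snow Leopard's actual API:

```python
# Hypothetical sketch: federated, per-request data access with explicit
# grants and audit logging at the retrieval layer. The data never gets
# bulk-exported; only the requested slice leaves its source.
from dataclasses import dataclass, field

@dataclass
class DataSource:
    name: str
    tables: dict                                # table name -> rows (stand-in for a real DB)
    granted: set = field(default_factory=set)   # tables this service account may read

class FederatedRetriever:
    """Routes each request to the right source and records every fetch."""
    def __init__(self, sources):
        self.sources = {s.name: s for s in sources}
        self.audit_log = []   # unlike an exported spreadsheet, every access is logged

    def fetch(self, source_name, table):
        src = self.sources[source_name]
        if table not in src.granted:            # access must be set explicitly, per table
            self.audit_log.append(("DENIED", source_name, table))
            raise PermissionError(f"{table} not granted on {source_name}")
        self.audit_log.append(("OK", source_name, table))
        return src.tables[table]                # only the requested slice is returned

pg = DataSource("postgres", {"orders": [1, 2, 3]}, granted={"orders"})
lake = DataSource("datalake", {"clicks": [9], "pii": [0]}, granted=set())
r = FederatedRetriever([pg, lake])
print(r.fetch("postgres", "orders"))  # [1, 2, 3]
```

The contrast with the spreadsheet dump in the conversation above is that the denied request and the successful one both leave an audit trail, and the `pii` table simply cannot be read until a grant is added.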
00:34:34
Speaker
at the Snow Leopard level, which helps with governance. And that's kind of our roadmap moving forward. And then the other piece is the privacy piece. The whole point of Snow Leopard is that you don't have to ETL all your data into some RAG-as-a-service-type architecture in a cloud. If you want to do that, we can do it.
00:34:58
Speaker
But we can sit next to your data, within the same VPC — a cloud-prem sort of situation. So you have access to everything we're looking at. We have the logging, et cetera.
00:35:13
Speaker
And you can verify everything. It makes so much sense. You have a tenant, basically, that resides in the client's cloud. And it glues all the data from — actually, it doesn't even glue the data. It creates the routing for the data when some bit of data is needed. Right. So we're not dumping data out.
00:35:39
Speaker
Right. In any case, we're just saying we'll fetch the data, and because you have auditing and logging, you can know which thing fetched what data. That's what we're doing. In fact, one of our customers has a data lake, and they've only given us access to a portion of the data lake, because they're only using it for one use case right now. So they just give us that access. And in fact, once you've set that system up,
00:36:05
Speaker
they can give us access to more tables if they want to open up more use cases, because it's all hooked up already — there's a service account, et cetera. So you can reduce or increase access much more easily, instead of having to worry about a new opening or a new tenant or a new something that you need to get access to. But listen, what I also realized from what you say: you are not fixed on which cloud it is. Basically, you don't care what kind of data sources are there. You just connect to those data sources through a service account or APIs, and kind of govern that. And that connection is all stored in the client's environment. But you can also use different LLMs — you can drop some LLMs and basically get another one that performs better for this or that task for your client.
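The "expand and contract access in one place" idea — once the service account is hooked up, opening a new use case is just adding a grant rather than new plumbing — might look like this minimal sketch. All names here are illustrative assumptions, not a real product API:

```python
# Hypothetical grant registry: widening or narrowing access is a single
# bookkeeping change, not a new pipeline or a new tenant.
class GrantRegistry:
    def __init__(self):
        self.grants = {}  # (source, table) -> True

    def allow(self, source, table):
        self.grants[(source, table)] = True

    def revoke(self, source, table):
        self.grants.pop((source, table), None)

    def can_read(self, source, table):
        return self.grants.get((source, table), False)

reg = GrantRegistry()
reg.allow("datalake", "events_2024")        # only a portion of the lake is exposed
assert reg.can_read("datalake", "events_2024")
assert not reg.can_read("datalake", "pii")  # the rest stays invisible
reg.allow("datalake", "events_2023")        # new use case: just add a grant
reg.revoke("datalake", "events_2024")       # ...or contract access again
```

The point of the sketch is that each underlying system keeps its own IAM, while one registry at the retrieval layer decides what the AI application may see across all of them.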
00:37:04
Speaker
That's right. We are a BYO-LLM type of situation as well, right? So for example, if you have trained your LLM on your internal data, we can connect to that to get better results. We can use different LLMs on our side, which is a black box to you — we're using an LLM somewhere, but that's not your problem, right? Again, we're early, but the roadmap is that we can actually train LLMs for specific types of data stores or specific types of queries, so that they become more efficient. But we are still not training on your data, necessarily.
00:37:42
Speaker
So that's one thing. But secondarily, to your point earlier, your data is never leaving your VPC, your trust domain. I think this is really important for most enterprises — that the data doesn't leave their trust domain. So even when lots of interesting technologies exist, they can't use them, because those sit outside the trust domain and require the premise that you dump all your data into some other system. Our aim is to not open up that hole. The closer you are to the data, while being separated from it, the less you have to do all these squiggly lines. And you don't have to worry about, oh, I have to dump all this data out and put it somewhere else. The trust domain can be respected and maintained.
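The BYO-LLM point can be pictured as the model being a pluggable component of the retrieval layer: a customer-hosted model inside their trust domain and a vendor-side model are interchangeable behind the same interface. A hypothetical sketch — `make_query_planner` and `customer_llm` are invented names for illustration, not part of any real API:

```python
# Sketch: the planner takes *any* callable as its LLM, so swapping a
# vendor model for a customer's in-VPC model changes one argument.
from typing import Callable

def make_query_planner(llm: Callable[[str], str]):
    """Returns a planner that uses whichever LLM was plugged in."""
    def plan(question: str) -> str:
        return llm(f"Which data source answers: {question}")
    return plan

# Stand-in for a customer's own model endpoint; in reality this would
# call their internal inference service, and prompts/data would never
# leave their trust domain.
def customer_llm(prompt: str) -> str:
    return "postgres.orders"  # canned answer for the sketch

plan = make_query_planner(customer_llm)
print(plan("How many orders shipped today?"))  # postgres.orders
```

A model fine-tuned on a particular data store's schemas, as mentioned in the roadmap, would drop into the same slot.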
00:38:37
Speaker
And I like how you highlighted the IAM permissions today — you can be very granular, up to the columns. You implemented that, as I remember it. Oh, look at you. Yeah. No, I mean for Spanner, right? Like BigQuery. Yeah, I got it. You can be super granular when you work with your clients. This is so beautiful.
00:39:04
Speaker
Yeah, so the thing is — I am very attached to having strong data systems and databases. I spent my whole life in it. We can make those systems so beautiful, but once the data leaves the system, all that beauty is, frankly, up to whoever's handling the data. So a lot of what I describe, people build in-house. But the point here is: instead of all the companies building the same thing — the glue — over and over and over again, and then maintaining and managing it, let us build all of that and take care of the glue, so you can focus on building value for your customers, your users, and your business. That's the point. What we're doing is hard — you've pointed that out multiple times. Of course.
00:39:58
Speaker
Yeah, okay — let it be done once-ish, right? So that each person doesn't have to. It's the whole thing about being a hamster on a hamster wheel and never getting off, because you spend all your time trying to deal with the data. Such a great analogy. And this is the reason why the dogma around how we get value from data cannot change.
00:40:21
Speaker
You know, it's all over all of the conferences you go to. They're talking about it: oh, how do we achieve data governance? Data lineage, data catalogs, data governance — all those problems exist. I have been to a lot of those conferences, and I have given talks at those conferences. Somewhere on the internet there's probably a talk of mine on how to make use of your data from a database. But what I realized is we have to respect the fact of life that data silos exist, that each system is unique, and that your beautiful database or data store has to exist in those contexts. So how can you make it easier for developers to just focus on building the application on top of your data?
00:41:17
Speaker
I just want to clarify that you are not against — I mean, this is a very loud word — you are not against data transformation. It still needs to be there. You're not against — again, it's a very loud word — I'm just, you know, completely stunned by how you think, how differently, how you even built this. You are not against BI. It still has to be there.
00:41:43
Speaker
No, no — BI. Business intelligence. Yeah, Looker, Looker. Does it trigger you? No, it's been a joke, sorry. I have reached a zen state, but I hear you. I hear you.
00:41:58
Speaker
and so I'm sorry, just a little cloud.
00:42:06
Speaker
and
00:42:08
Speaker
Listen, the way you think — you know, the majority of people at these conferences are still unlocking the power of lineage. Yes, that's a great question, Yuliia. And I want to be clear — I said this at the top of the hour too:
00:42:26
Speaker
I am not against those things. Those things are really powerful, right? It's just that, because those things exist, everybody wants to fit everything into those paradigms. And those paradigms are essential, especially for business intelligence, analytics, historical trending — those things are really important and should be done. The question becomes: because you have this warehouse that can take lots of data,
00:42:57
Speaker
the de facto has become: let's dump everything there, because it sounds cool, right? Then you have one place to go fetch data from, because otherwise lineage and cataloging and where-is-everything become so hard. So I have a lot of sympathy and empathy —
00:43:14
Speaker
I understand that thought process. But when you start to implement it, you go down this hole that you can't get out of, right? Every time I have gone to conferences and talked to developers — whether for Spanner, whether for Oracle, whether for other sorts of systems — it always becomes: this system is beautiful as long as you have everything inside that system.
00:43:40
Speaker
And then all the developers are like, oh God, how do I get everything into that system? And oh God, how do I get live data and fresh data? And oh God, how do I maintain all these things? It just becomes this crazy rabbit hole, right? So the idea is you erase all those crazy squiggly lines — the data connections and mappings — for the things you don't need them for. Yeah. Removing the unnecessary pieces is really important in what I'm saying.
00:44:13
Speaker
This is beautiful, Deepti. Just to be fair with you: I'm coming from the data world, you know, with a certain perspective on ETL and a certain love for ETL, with all of that complexity.

Pitching Snow Leopard's Value Proposition

00:44:27
Speaker
And it took me some time to realize how you built the system, because it basically works backwards.
00:44:35
Speaker
Yeah — we're not in denial of ETL, but we are not fans of ETL anymore, according to your pitch. So my question is: how do you pitch it? First of all, to your customers — because it's a complete marketing challenge as well; you're going against the market and against what the market is used to. That's first. And second — with all respect to all the investors out there — how do you even explain it to them? And I love my investors; that is also worth mentioning here. Yeah, I love my investors too. And I love my customers — I'm doing this for customers. So yes, great question. It is not easy, I will say that right off the bat. I don't have any magic pills.
00:45:22
Speaker
But I think, you know, what's interesting is that when I started pitching it a few months ago, not everybody got it, to your point, right? They were just like, oh no, this is — no. I understand that I'm being countercultural, but that's why I'm doing a startup, right? The whole point is I saw something that I tried to ignore, but then it became obvious to me, even in my discussions with VPs of platform, or eng leads, or CTOs of startups. I did a lot of diligence, to be honest, and asked people a lot. My PM background helped me here, because, as I said to you earlier, I thought this technology is great. I think it's awesome. I was blown away by it. But then the next question in my head, after chatting with
00:46:09
Speaker
ChatGPT for a while, and then Claude and Gemini — the next question was: okay, but how do you use this, right? So instead of me making up stuff, why don't I go ask people whose job it is to make use of this technology? So I actually did market research and talked to people. And that's how I got to this problem, and therefore the solution. I didn't have a solution in search of a problem. I discovered a problem that existed. Of course, the people at the tip of the spear at the time were experiencing this problem a few months ago, because there was still talk of how do you even implement RAG. But that was the first thing: I actually found a problem that I thought
00:46:58
Speaker
I could use my data, systems, and infrastructure expertise on, to help the world adopt this new technology that I frankly think is game-changing. But in order for something to become widely adopted, it has to be easy to use, and it has to make sense in the context of people's technology stacks and their current use cases.
00:47:22
Speaker
So that's number one. And number two: when I spoke to investors — you know, there are investors who have portfolio companies dealing with this exact problem — it's, oh, how do I connect my Postgres database? Oh, I've connected it now. Great. How do I connect my BigQuery dataset? Oh, I connected it now. How do I connect Salesforce? Oh, I connected it now. How do I connect all of that to some unstructured thing?
00:47:45
Speaker
Right? So that same problem existed for people who were trying to actually implement AI solutions for their companies,
00:47:57
Speaker
which is how I started this whole journey in the first place. So I just basically explained the problem: how do you do this? And some of the people were like, well, I'll just build fancy pipelines for it. And I'm like, cool, more power to you. But not everybody has the engineering power,
00:48:14
Speaker
Right — the time and the bandwidth to do that. And so that's the pitch: if you were doing it over and over again — building glue, constantly maintaining and managing it, spending 50 to 80% of your engineering bandwidth on it — we'll do it for you. And honestly, more power to them, I found investors — and I'm lucky for that — who actually understood the problem.
00:48:44
Speaker
It's a cherry on top. You found investors who understood the problem and the solution. And who were willing to support me, right? And since I've gone out to market and talked to people, everybody is like, yes — there are so many people who have this problem. For people who want to build their own solutions, that's great, but not everybody wants to build and maintain complex workflows and solutions. We just think that a systems-and-infrastructure thought process should be applied to it, and we should build it in a way that lets developers focus on building value. I even think that if they can
00:49:32
Speaker
you know, take the model and pre-train it — because you still need to collect the data. Even creating the pre-training dataset is still work, and applying data-team time to that area is much more valuable, I guess, than creating pipelines. This is also, you know, like my pitch about Masthead:
00:49:57
Speaker
people, stop creating money-draining pipelines — we will have you covered. And they already get excited. And here you come in and say: people, stop creating unnecessary data pipelines, we will have you covered. This is beautiful. I love it. Yeah. So I think people can use Masthead where it's needed, right? But instead of trying to fit everything into one paradigm — which just doesn't make sense in a lot of cases — how about we just shift the paradigm where it needs to be shifted?
00:50:26
Speaker
Why are we maintaining and managing all the things we have to maintain and manage around ETLs and the like? I can reinforce your pitch so much. Once we connect and see the number of pipelines in a client's data warehouse, it just doesn't make any sense. First of all, what we highlight are the pipelines and tables which have recurring compute but are not in use. And you're going to be amazed by this fact — not amazed, sorry, wrong word to use.
00:50:55
Speaker
I am always amazed by how people use the things that we build. Okay, so you know what? At least 10% of recurring compute — excluding the ad-hoc compute people run from the console, just recurring compute — goes to tables being updated that,
00:51:25
Speaker
are not in use by anybody. They are not being used in any Looker dashboard, whatever. And there are no people querying them from the console, but those tables keep being updated every hour. Yeah. Imagine the waste.
00:51:43
Speaker
Yeah. And this is only in the — how do you call it — data lake, lakehouse, whatever you call it. So much waste. And this is what we're also tackling at Masthead. And you're just saying: listen, we can make it lighter, we can make more sense of it, by just erasing the complexity of
00:52:09
Speaker
inherently building on and fetching from the whole data warehouse. Inherently unnecessary complexity. There is a lot that is necessary. But we don't have to spend extra bandwidth — brainpower and resources — on unnecessary complexity that isn't even used, right? As you said, it's not even used. What's the point? Let's use that to add value.
00:52:36
Speaker
But listen, this is our internal math on what's not used. Obviously, we don't know the client's business; we just see it from the metrics, the metadata around it. But there could be, let's say, a report that somebody looks at once a month or something like that. Maybe we can just go fetch it at that point, right, instead of keeping the whole pipeline in place. Exactly — because you're upholding the whole environment around it. Even paying for Looker — maybe you could go to a much cheaper, lower tier. Unnecessary. Yeah, I think there's an unnecessary complexity to that. Yeah — that, you can be against. I'm against it. Because once we onboard, we also see how that legacy SQL
00:53:30
Speaker
that wasn't updated in like ten years is still powering their major Looker dashboard — for the customer base, product, whatever. It's tech debt. People try to uphold that. Yeah. I know what you're talking about. And this is what we pitch to all customers: we need to
00:53:59
Speaker
reduce the amount of data and the amount of compute you're using. Yeah — I mean, if they use Masthead to actually get visibility into their system, then they know where they don't need to do all this stuff, right? They can reduce complexity, and Snow Leopard can come in and help them utilize the data they already have to connect to their AI systems. So I think visibility is really helpful in illustrating the point that I'm making: there is a bunch of unnecessary data in motion that takes a lot of energy to maintain and manage, and doesn't even add value in a lot of cases, because it's unnecessary. So what if you could enhance the value generation for your customers and your business
00:54:45
Speaker
by not focusing on that — removing the complexity that is unnecessary, and focusing on getting the right data, at the right time, from the right source. Deepti, it was such a pleasure talking to you. I feel honored and privileged. Oh my gosh, it's so fun to talk to you. Thank you for having me. It is a privilege. It's all mine.
00:55:06
Speaker
Listen, for those who — you know, first of all: snowleopard.io, this is how you can find Deepti's product. No, snowleopard.ai. It is .ai. snowleopard.ai — this is how you can find Deepti's product. Yeah, there is going to be a lot of editing in this podcast.
00:55:28
Speaker
It's a fun podcast. Yeah. Please tell people how they can reach out to you elsewhere. Absolutely. I'm on LinkedIn, Snow Leopard is on LinkedIn, and you can always email me at dt@snowleopard.ai. But you can

Collaboration and Future Development

00:55:42
Speaker
also sign up. If you truly are interested in the product, please sign up. We are accepting, you know,
00:55:49
Speaker
design partners — we have some early design partners, and we're looking at working with more people who have this problem, so we can help you solve it and you can influence how we build the product. So you can sign up on the website, snowleopard.ai, and you can talk to us — DM me on LinkedIn if you want, or just send me an email at dt@snowleopard.ai.
00:56:09
Speaker
I mean, folks, this is a once-in-a-lifetime opportunity to work with such a brilliant person as Deepti. So please do that. No, no, no — do not embarrass me. People, just sign up.
00:56:22
Speaker
Yeah. Thank you so much for your time — it was a great pleasure. Thank you so much for talking to me. Thanks, Yuliia.