Introduction to 'Straight Data Talk'
00:00:01
Speaker
Hi, I'm Yuliia Tkachova, CEO and co-founder at Masthead Data. Hi, I'm Scott Hirleman. I'm a data industry analyst and consultant and the host of Data Mesh Radio.
Featuring Data Practitioners
00:00:11
Speaker
We're launching a podcast called Straight Data Talk, and it's all about hype in the data field and how this hype actually meets reality. We invite interesting guests, first of all data practitioners, to tell us their stories about how they put data into action and extract value from it. But we also want to learn about their wins and struggles, as a matter of fact.
00:00:34
Speaker
And as Yuliia said, we're talking with these really interesting folks that a lot of people don't necessarily have access to, and these awesome conversations typically happen behind closed doors. So we want to take those wins and losses, those struggles, as well as the big value that they're getting,
Unscripted Conversations for Insights
00:00:50
Speaker
and bring those to light so that others can learn from them. We're going to work to distill those down into insights so that you can take these amazing learnings from these really interesting and fun people, apply them to your own organization, and drive significant value from data.
00:01:08
Speaker
Yeah, and every conversation is unscripted, very friendly and casual. So yes, this is us. Meet our next guest. I'm very excited for it.
Meet Andrew: Data & Tech Expert
00:01:22
Speaker
OK, hi, everyone.
00:01:25
Speaker
Andrew, it's a pleasure to have you here today, and I'm going to briefly introduce you. First of all, Andrew is one of the smartest people I know, and the depth of his thinking and the way he structures it are just so impressive. He knows his field inside out, and I wish I could be as smart as you someday. But there is an outstanding, you know,
00:01:50
Speaker
feature of yours, let's put it this way: I think you could get rich not just by building things and playing with data, but also by playing poker, because you are the most stoic poker-faced person I ever saw in my life.
Andrew's Journey: Early Coding to Google
00:02:04
Speaker
Yeah. Honored and happy to have you here. Please go ahead, tell us a little bit about yourself and, from a professional standpoint, what you're building today. Yeah, for sure. Thanks for the warm introduction.
00:02:19
Speaker
I've never played poker. I'll have to think about it; maybe that's where I'll find my next success. Sure. Yeah, my story starts way back when I was seven years old. I started coding. Well, I wouldn't call it coding, really. It was just copying some source code from a
00:02:46
Speaker
book into the compiler on an ancient computer. But that gave me a basic understanding of the operating system, MS-DOS back then, the notion of a compiler, and basic things. Then my computer kept breaking all the time; it wasn't really reliable. It was super annoying because I couldn't play games, and that forced me to dig deeper and understand: OK, how can I fix this thing?
00:03:17
Speaker
all the parts and all of that. I had to dig really deep. That got me into a lot of CS, I guess, and generally shaped my interest in computer science. The rest is history: I graduated with a master's degree in computer science and electrical engineering. And then, after spending a couple of years in Ukraine and then the United States, almost 15 years ago now,
00:03:47
Speaker
I ended up at Google.
Big Data Challenges at Google
00:03:49
Speaker
Google was my dream job back in the day. It was the glory time of Google, like 2008, 2009: Chrome came out, Android came out. Google was still running Google Developer Days; in addition to Google I/O, there was a separate developer day. It was all fun and nice, the prime time of Google, and it was almost like a dream job. And so I spent a couple of years just preparing for the interviews,
00:04:16
Speaker
trying multiple times. Eventually, I got to Mountain View, where Google's headquarters is, and I got a chance to work on Google Analytics, which we all know. Tracking the traffic, you know? And that was actually the first time I got introduced to really hyperscale systems, because Google Analytics is the number three
Project Loon and Data Streaming
00:04:45
Speaker
product in terms of traffic. I think the first one is either search, or ads and search. And Analytics is tracking almost the entire internet, so it has to be big.
00:04:57
Speaker
So that's where I worked on data and was exposed to big data challenges and data infrastructure. And then, in 2013, I was actually also helping Google X to work on
Joining Uber: Marketplace Optimization
00:05:18
Speaker
elevated balloons. The project was called Project Loon. The idea was, essentially, launching gigantic balloons into the stratosphere at different latitudes, beaming the internet down from them, building a mesh network, and using the jet streams at that altitude. That's essentially what
00:05:40
Speaker
propels those balloons; there is no propellant other than the natural jet streams in the atmosphere. And we used that to stream the internet into rural areas. I was working on a flight analysis system so that we could understand what was going on during a flight, read data from all the sensors on board, and then analyze that data, essentially.
00:06:09
Speaker
So a lot of streaming and stuff like that. And around that time, I got approached by someone from our investor circles who invited me to Uber's HQ. It was before Uber became the unicorn we all now know.
00:06:37
Speaker
That was before all the controversies and all of that. Basically, at the end of 2014, Uber was still growing like crazy,
00:06:52
Speaker
building the logistics platform for the world and optimizing how people move around. That vision just seemed natural to me. I had actually sold my car literally two or three weeks before they approached me from Uber, purely by coincidence; I got rid of it because I hadn't been using it for a while. So they didn't have to try hard to convince me that this was the future.
00:07:18
Speaker
And the rest is history, essentially. I quickly went through the interviews
00:07:25
Speaker
and ended up on the early team at Uber, which was just forming around that time.
Uber's Self-Driving Unit
00:07:30
Speaker
And the idea was to build the marketplace for Uber: essentially, a system that can analyze all the supply and demand in real time and optimize the economics for a three-sided marketplace of the rider, the driver, and Uber. It's a large optimization problem at scale, based on behavioral economics, with researchers from Columbia University, the University of Toronto, and MIT,
00:07:57
Speaker
professors in behavioral economics. They wrote a lot of papers on surge pricing, which was the revolutionary paper initially, and on other behavioral economics problems. My job was to figure out how to translate that science into the systems that we ran at Uber. That's what I was working on before joining the self-driving unit at Uber, where I was one of the first hires on the new team as well, to build a
00:08:27
Speaker
similar platform, but for self-driving cars, actually.
AI's Industry Impact: Past and Present
00:08:30
Speaker
So my team owned the full-stack vehicle infrastructure, from the vehicle to the cloud: connectivity, remote assistance, telemetry, security, and data streaming, essentially. So that's my story before founding Mnemonic, and now I'm working on my current company in the AI space. Super exciting. Scott, how do you feel about Andrew's background?
00:08:57
Speaker
Yeah, well, it's one of those kind of crazy things. I mean, I lived in San Francisco and the Bay Area myself a couple of times, for five-plus years. It is one of those things where you're just like: I knew this one person and they pulled me into this thing, and it either blew up in a good way or it imploded. You have a lot of those kinds of stories. But so much of this is just...
00:09:25
Speaker
I think we're in that space now, like when we were looking at large-scale data and things like that, where people are doing the same thing around AI, but it's difficult to tell what's snake oil and what's not. Because you have a company, I don't remember what the company's name was, but Databricks bought a company that was a year and a half old for $1.3 billion. Right? Like...
00:09:49
Speaker
Yeah, yeah. And so it's one of those things where, in a space like this, you can have things that just absolutely take off, either because they found that product-market fit or because there's enough hype that somebody buys it, whether that's investors or
00:10:05
Speaker
an acquirer. But yeah, I think it's a really interesting space. And with your background, you've seen these kinds of what-is-a-bubble-versus-what-is-not moments. Uber managed to really catch something that was huge, but there were so many things that were "the Uber of blah," and even those raised hundreds of millions of dollars and just imploded because they didn't have the actual
00:10:29
Speaker
economic staying power or the differentiation. Like, a car has such a massive initial cost. I'm moving back to the US soon and I'll have to buy a car when I move there, since unfortunately I'm not going to be in the Bay Area, where you can move around without a car very easily. But yeah, I think you've seen enough of these things to know
00:10:50
Speaker
what makes a difference and what doesn't. So I'd love to, and I'm sure Yuliia is going to ask about this, but I want to understand: how are you making sure that you're not following a snake-oil kind of thing, and that you're finding the product-market fit that actually matters? I'd love to hear that throughout the conversation.
00:11:10
Speaker
I've been here long enough, not just in the Bay Area, but in the industry overall.
Enablers of AI's Rise
00:11:15
Speaker
I have seen a couple of waves. The first wave was the internet era in the mid-90s and early 2000s, when I was working different jobs while studying at university, night shifts at an internet service provider, that kind of thing, figuring out networks and servers.
00:11:40
Speaker
So that was the early internet era, when a lot of connectivity was shaping up. That led to e-commerce and then the explosion of payments. So we went from infrastructure to communication, and that's when the web became useful for a lot of people, because they could transact safely; and then social media was born, and so on.
00:12:05
Speaker
So that wave was enabled by connectivity and a lot of networks that had been built by companies like AT&T and others, mostly AT&T, actually. And then in the 2010s we saw the next wave, which was the mobile app, essentially. Without it, Uber would not have been possible.
00:12:26
Speaker
And actually, that's also why almost anything that was pitched as "Uber for X" failed. Because Uber was, in a way, not a phenomenon, but sitting at the perfect intersection of an existing industry that had long been awaiting disruption,
00:12:42
Speaker
a huge, massive market that was also, in a way, corrupted and hard to penetrate. Only people like Travis could actually do that. I don't think that if Uber had had anyone other than Travis as the founder and CEO, it would have been the success story that we've seen. And then mobile, obviously; without mobile, that wouldn't be possible. And the overall spread of cell networks around the world, so that people could actually use it.
00:13:11
Speaker
If you take away any of those variables, it wouldn't have
Data Quality for AI
00:13:14
Speaker
been possible. Now we're seeing a third layer, kind of a third iteration, with AI. And there are a lot of industries where there is going to be this overlap between a whole industry, like healthcare, and that's why I'm trying to work in that space, and AI, which actually makes a lot of things possible that were not possible five years ago.
00:13:41
Speaker
So it's kind of similar. I have a question. I just love how you structured that: what wouldn't have been possible without
00:13:54
Speaker
cell network availability and people getting used to apps. And you also highlighted that there is a new era of AI, which is equated with LLMs today. But what do you think are the essential variables that enable this era? I think just the accessibility of this technology.
00:14:22
Speaker
Basically, up until last year, probably, AI was mostly a field for researchers, scientists, and hardcore engineers. We were playing with a lot of features and data, fiddling with models, and so on. There was just not enough accessibility.
00:14:44
Speaker
And in the last couple of years, well, first of all, we've seen a demo which, I would say, consumerized this technology in a way, right? It's not like it was invented last year; it's just that a real working demo was shown for the first time, and people could actually see the value of it.
00:15:10
Speaker
And I think that changed a lot of things; it was sort of a mind shift. Similar to how mobile happened as well: when people saw what was possible with mobile apps and how much easier a lot of things became, only then did it start catching on and growing.
00:15:30
Speaker
I also have a theory that the volume and diversity of the data that organizations collect today also enable the AI era. We're not even talking about the quality and compatibility of that data, but it is definitely one of the enablers of this era. And a lot of what you highlighted, the accessibility: I think part of it is that C-level management were able to play with it,
00:15:59
Speaker
were amazed by the results, and I think this is what got buy-in from high-level management to actually start investing in it and seeing the value hands-on. That's a great point; in terms of data, I completely forgot about that. But yes, a lot of the fundamentals on top of which modern AI is built were available as research for quite a while, more than 10 years ago. Some of the
00:16:29
Speaker
papers that have been used to build, you know, the transformer architecture and
AI's Disruption Potential in Healthcare
00:16:33
Speaker
so on: those principles and methods were developed in the early 90s, and you can go back as far as the 80s. What was missing was, first of all, the infrastructure to train it, and the data, as you pointed out. And the data would not have been possible without the growth of the internet and the mobile era. So that's back to my previous point: each
00:16:59
Speaker
iteration creates another layer that enables the next layer to be built on top of it. So I've got a quick question for you. Well, I mean, it could be quick; it could also be a five-hour discussion. I was at DataStax, one of the companies behind Apache Cassandra, so the big data space, all that fun stuff. Do you think
00:17:21
Speaker
that AI is the thing that makes the approaches we were trying in big data work, or not? My personal take is still no, because we were collecting all this data that we weren't sure how we were going to use, so we weren't collecting it in such a way that we could end up using it; even if we found a use, we'd have to go back and do a massive amount of reprocessing or cleaning up the data after the fact. How do you think about
00:17:49
Speaker
companies in the world of AI not falling into that same damn trap of collecting data for the sake of collecting data and expecting there to be value in it somehow? When part of the point of AI, of machine learning in general, is to go in and ask: what is this telling us? How is this helping us optimize? What's actually happening here? How are you seeing people go down the bad route versus the good route in the AI space, with
00:18:20
Speaker
the trappings of big data that didn't work? Yeah, that's true. I think that cleaning data has actually pretty much been solved. Essentially, all the problems that require you to figure out quality, or to transfer or migrate data from here to there, all the processing, the ETLing and cleaning up: all of that has pretty much been automated, and a lot of companies and tools are doing it. What I think is the problem, like you said, is data quality.
00:18:49
Speaker
And that's going to be the next frontier. Actually, I would go as far as to say that today it seems like any kind of data is gold, anything. There are different grades of that gold, but it is gold anyway.
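To make the "routine cleaning is automated" point concrete, here is a minimal sketch of the kind of mechanical ETL pass being described, in Python with pandas. The file name and the "email" column are illustrative assumptions, not anything from the conversation:

```python
import pandas as pd

# Minimal sketch of a routine cleaning/ETL pass: normalize headers,
# drop exact duplicates, then canonicalize a key column.
def clean(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    df.columns = [c.strip().lower() for c in df.columns]  # normalize headers
    df = df.drop_duplicates()                             # exact duplicate rows
    df = df.dropna(subset=["email"])                      # rows missing the key
    df["email"] = df["email"].str.strip().str.lower()     # canonical form
    return df

# cleaned = clean("customers.csv")  # hypothetical input file
```

Steps like these are mechanical, which is why tooling handles them; whether the surviving values are actually trustworthy is the data quality question the conversation turns to next.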
00:19:16
Speaker
There is the public internet, right? And there are all kinds of degrees of quality there. Most foundational models are built and trained not on proprietary data but mostly on public data sets, and that's where their builders have to deal with a lot of quality and moderation work, to make sure the model is safe and so on.
00:19:42
Speaker
But then there's the enterprise space, where over the last 20 years or so data has been collected and stored in some way, mostly structured: either logs or actual structured data in some storage, hot or cold, it doesn't matter. And that's mostly quality data, but it's hard to get, and it's proprietary, right?
00:20:11
Speaker
And so I think the AI companies that train foundational models will pay a lot of money for it. There's going to be a battle for compute and a battle for this proprietary data as well. But I think quality solves itself with the maturity of the business: the more mature the business, the higher the odds that they have quality data inside.
00:20:38
Speaker
Here I need to say that you sound way more optimistic than, for instance, Gartner, which claims that only 4% of organizations' data is ready for AI. Was it 4% or 3%? You know, such a fraction of the data that organizations have. And you know, Gartner... How do you define readiness, right?
00:21:03
Speaker
Okay, okay. Well, this is what you shared with me: that actually getting ready the golden data set that you're going to train the foundational model with takes much more time than the actual infrastructure and the model and everything.
00:21:30
Speaker
The training and making sure it's reliable takes more time. And I want to argue here, because you guys are building everything from scratch. When we're talking about big organizations, with legacy data, legacy code, legacy logic, I wouldn't be so sure that their data is anywhere close to the accepted level of readiness that you or I would imply.
AI Regulation and Ethics
00:22:00
Speaker
I'm not saying that no processing or any kind of ETLing is needed. I'm
00:22:10
Speaker
alluding mostly to the actual bytes. Those bytes are stored somewhere, mostly in a structured way, whether it's CSV or Excel files, or even some weird 30-year-old binary format, FoxPro or one of those MySQL-era things from the 90s; it doesn't really matter. We know how to extract that data. The actual value is in the bytes; the rest is just an engineering problem.
00:22:40
Speaker
That makes sense. So just to paraphrase what you're saying: if they have the data, that's already something they can start from, then clean it and prepare it for usage. Okay, that makes more sense to me.
00:22:57
Speaker
I think a lot of these enterprises, especially the big ones, will definitely have a hard time catching up with this AI wave. But they have a huge desire for automation and improving processes. There was a report published last week about a company that automated the work of
00:23:12
Speaker
thousands of its support representatives with just one LLM use case. So you can imagine how enterprise companies want to do the same, even just from the economic standpoint, but they lack the expertise and talent inside the company. So I think there's going to be an emergence of boutique-style companies that just provide services.
00:23:37
Speaker
And there are a bunch of consultants; you can call them forward-deployed engineers. That's probably going to be the case, and there will be a market for that. So you mentioned that these big organizations will want to invest more. What do you think is the biggest challenge for these big organizations hopping onto this wave?
00:24:08
Speaker
I think safety is a big concern for a lot of them: making sure that, first of all, it's difficult to exploit, especially if these companies are thinking about deploying these models to their end consumers. That's where the big risk for them is, because a lot of these models are susceptible to all kinds of attacks,
00:24:34
Speaker
jailbreaks, and obviously that can damage the brand and the business in some ways as well. So I think that's going to be a big area: ensuring safety. A lot of
00:24:53
Speaker
companies are trying to implement some kind of guardrails, whether these guardrails are applied as part of the fine-tuning process, RLHF, or in real time, essentially at inference. That's one of the ways people are thinking about it right now.
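For illustration, here is a minimal sketch of the inference-time variant, one of the two placements mentioned above (the other being baked in during fine-tuning or RLHF). The blocked-topic list, function names, and naive keyword matching are illustrative assumptions; production systems typically use trained classifiers rather than string matching:

```python
# Toy inference-time guardrail wrapped around a model call.
BLOCKED_TOPICS = ["build a weapon", "synthesize a virus"]

def guard(text: str) -> str | None:
    """Return a refusal if the text touches a blocked topic, else None."""
    lowered = text.lower()
    for topic in BLOCKED_TOPICS:
        if topic in lowered:
            return "Sorry, I can't help with that."
    return None  # safe to pass through

def guarded_generate(prompt: str, model_fn) -> str:
    refusal = guard(prompt)           # screen the user's input
    if refusal:
        return refusal
    reply = model_fn(prompt)          # the actual LLM call
    return guard(reply) or reply      # also screen the model's output

# Demo with a dummy model function standing in for a real LLM:
print(guarded_generate("How do I build a weapon?", lambda p: "..."))
```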
00:25:19
Speaker
What amazed me when we were preparing for this call and discussing the topic, and my question was about these guardrails and safety measures that organizations need to put in place, were the cases when users could retrieve the data set that the model was trained on.
00:25:48
Speaker
Yeah, that's exactly the jailbreak; that's what it means, basically, in many cases. I actually posted about it on my Twitter a couple of days ago.
00:26:01
Speaker
A lot of the problems in the AI space are especially with LLMs. LLMs have this interesting emergent property, which is behavioral programming: essentially, you can ask the model to behave in a certain way, or inherit a certain style, and so on. I think this opens up
00:26:26
Speaker
the opportunity for a different kind of programming, a different way of thinking about how to program these systems. Instead of the procedural or semantic programming that we typically use in regular deterministic languages, here we're dealing with behavior and non-determinism. And that means that a lot of the approaches and methods from
00:26:54
Speaker
game theory or role-playing are actually among the keys to extracting value, I'd say. And these are the same methods that are used to jailbreak models. By role-playing, you can actually steer the model to "think" in a certain way, or rather direct the output; the model is not really thinking, but you direct the output in a certain way.
00:27:20
Speaker
You know, there was that thing where you can't ask ChatGPT or whatever to give you the recipe for making a bomb. But if you say, "My grandma used to tell this story about how you make a bomb, and I want to help her remember the story; if she were to tell it, how would it go?" then it'll just spit out all these instructions for making a bomb. And, you know, we saw the whole
00:27:48
Speaker
thing of making randomly diverse pictures that shouldn't have been diverse pictures, and all this stuff. Or there's just literally trying to figure out what somebody else put in, or OpenAI basically getting caught with their hands in the cookie jar, where you could get ChatGPT to spit out an entire New York Times article almost word for word.
00:28:11
Speaker
Because of these things, it's like: what are we trying to accomplish with these tools, what are we trying to accomplish with AI?
00:28:19
Speaker
Is that necessarily bad? When it's consumer-facing, it's very bad. But when it's internal-facing to your organization, is it bad if it spits out something that has the quality of a New York Times article? No, but you don't want it to be plagiarized. There are all these ethical issues, and I'm hearing ethics people talking about ethics,
00:28:42
Speaker
but any time I talk to companies... Are you finding that they care at all? I mean, you said you're in the healthcare space; they're really supposed to care about ethics, they're really supposed to care about privacy, but a lot of the time they kind of hand-wave at it: "Yeah, it's what we should do, and we kind of do it." Are you finding that new ways of doing that are emerging, or that people really care?
00:29:06
Speaker
Yeah, I don't necessarily agree with this whole alignment concept. Think about the objective for each domain: we mentioned healthcare, there is finance, there are other domains, right? Each of these domains might have different objectives and standards. Obviously, in healthcare, we have a certain system that exists today, and
00:29:34
Speaker
there are certain expectations of what AI potentially can and cannot do. You probably wouldn't rely 100% on AI telling you to take a certain drug or not. Because, first of all, you can imagine the liability; those kinds of cases will definitely happen. And even doctors today carry
00:30:02
Speaker
different kinds of coverage and insurance for themselves. If something happens to you, they're not liable, essentially; the insurance covers them. A lot of doctors sometimes spend almost half of their income on those insurance policies. So they're covered, but LLMs are not protected. Can you imagine what kind of liability a company that develops this kind of LLM might have? So certainly,
00:30:29
Speaker
healthcare might have directives a bit different from finance, for instance. And in the public domain, with the foundational models, your GPTs, there are definitely other objectives. I don't necessarily agree with this kind of general alignment. Like someone said a couple of days ago: they don't want to align the models, they want to align you.
00:30:55
Speaker
We saw that with Google, for example, and the kind of agenda they were
00:31:07
Speaker
pushing. It's clearly part of the whole process of model creation and training; those policies are embedded in the entire pipeline, from inception to when the model is released.
AI in Healthcare: Improvement or Hindrance?
00:31:24
Speaker
So yeah, answering your question: I just think it should be defined per domain and should generally adhere to the existing laws. And if some of the laws are flawed, they should just be improved,
00:31:36
Speaker
not enforced in other ways at the scientific or engineering level. I think people just don't understand AI's vectors of harm, of what could go wrong. We're still experimenting with it, and yet we're experimenting in public with what could go wrong.
00:31:59
Speaker
And that has not gone well for perception. Well, this is also the fastest way to test things, you know: exposure and public access. Yeah, there we go, it's a big social test. I don't know if you've heard about it, but lawmakers in the European Parliament are about to vote on the AI Act. Have you heard about it? It's a good one.
00:32:28
Speaker
They're trying to create similar ones in the UK as well. So yeah, I'm going to onboard you really fast on this one. Basically, the EU has been at the forefront of all the privacy and security regulations, and this AI Act was actually in the making for the last five years. So I'm going to read briefly what they want, which is to regulate AI applications.
00:32:58
Speaker
They also rank them by risk. Low-risk systems, such as content recommendation systems or spam filters, only face light rules, such as revealing that they are powered by AI. What that means is: if a picture was created by AI, it should be labeled, tagged as an AI-created picture. What is interesting to me in this case is that the EU was also at the forefront of making sure that pictures are not edited with
00:33:27
Speaker
Photoshop. So what fascinates me is how on earth they want to regulate this and safeguard that it actually happens. I don't understand how they can guardrail it; this is the first question. And the second is that high-risk uses of AI, such as medical devices or critical infrastructure like water and electrical networks,
00:33:53
Speaker
will face tough requirements, like using high-quality data and providing clear information to users. And this is actually very close to what you guys are doing at your latest startup. I don't know if you consider it a low-risk or a high-risk application; it's sort of on the edge, and it's going to be interesting. So yeah, coming back to the question: how do you see them actually
00:34:24
Speaker
ensuring that every picture, every AI-powered application, is tagged? And the second one: how do you think about regulations in the medical space, for instance, around using AI and high-quality data? How are they going to regulate it? I think the first part reminds me of the GDPR thing. And honestly,
00:34:54
Speaker
I have not heard anyone say that those laws or regulations have actually helped. Instead, they've created so much pain for everyone, for the whole industry and for the users as well. Think of the cookie consent banners you have to click through on every site, a thousand times a day.
00:35:18
Speaker
It's led to a really bad experience overall. I don't know if there is any actual usefulness in that part; maybe I'm not aware of it. A lot of folks are not going to agree with you on that one. I mean, definitely, users' data should be protected, no doubt.
00:35:44
Speaker
All the best practices in terms of, you know, de-identification of PII data, encryption at rest, encryption in transit, those kinds of things, largely exist already. And the free market, as we have it, demands those things itself. You don't necessarily need to regulate it.
00:36:06
Speaker
Sure, companies should be responsible, and if they are not responsible, people are not going to use them. Also, it depends on the domain, like I said in the beginning; some domains might have completely different expectations. Again, healthcare is way more sensitive: the cost of a mistake, I would say, is pretty high. That's where the domain lens matters in this case, I must say.
00:36:39
Speaker
I think this regulation, and I don't want to sound unprofessional or whatever, but at this stage, saying that medical industries need to make sure they have the highest-quality data... They have been doing that all this time already, because, as you mentioned, the stakes of a mistake are just immense.
00:37:02
Speaker
So having that doubled down on in this use case is just an obstacle for them to innovate and move further, because I assume they would also want to keep samples of the data, or provide reports; it's going to be a little bit pricey, I feel. I'm not sure about that, disclaimer, but this is an obstacle to innovation as I see it when we're talking about high-risk applications.
00:37:32
Speaker
I agree, and everyone should watch the talk by Bill Gurley, a general partner at Benchmark, the famous talk where he discusses regulation. It's a very spot-on talk that everyone should definitely watch. The gist of it is that any kind of regulation always benefits only the incumbents. That's the case because they have the longest timelines and they have the largest budgets.
00:37:57
Speaker
And in fact, most of the regulations that exist today have been lobbied for by incumbents, because that way they protect their market and protect themselves from bottom-up competition. They protect themselves from the two people in a garage in Mountain View who can just create a better product, a better experience.
00:38:21
Speaker
Those two may not have the breadth of features and the client base that the incumbent has, but they can solve one particular narrow issue much better and much faster. But if there are thousands of regulations that they have to go through and get approvals for, that just prevents a lot of innovation. And that's largely the case in many industries.
00:38:43
Speaker
I can see that this is a point of view, basically, that I genuinely share with you. And this comes from our experience, from our background, and from our level of faith in ethics.
00:38:56
Speaker
Yeah, but the problem is that regulation is a framework for clamping down on those with no ethics.
AI and Inequality in Healthcare
00:39:04
Speaker
And that's the big problem: if you have regulation, the unscrupulous will still try, but at least fewer of them will, and the punishments ramp up. But if you don't, you just have,
00:39:17
Speaker
you know, people polluting and treating human lives as if they're just experiments. So it is a double-edged sword, exactly as you say: the ones that could do it far better
00:39:33
Speaker
have to comply with silly regulations. This is why the Danish government has a really interesting setup, where somebody who's a data expert is actually part of the government, and every single law is reviewed by that person.
00:39:48
Speaker
And the data expert helps them rewrite the laws so that companies can comply with them. Because, like you said, what GDPR was trying to do was great; how it actually did it was very much lawmakers who don't understand data or anything like that. And I think they tried to go a little bit further with the EU AI Act. But like you said, the lobbyists end up working for the incumbents, making it
00:40:18
Speaker
high-risk for anybody that's trying to come in and take anything that they're trying to take away. And that is a problem with AI: we have people like OpenAI, who I think are very, very poor on the ethics side, because they said, "We just have to be able to take content from anybody; if we don't, we can't develop this." It's like: well, that doesn't give you the right to do it just because you're declaring that this is a good, and therefore
00:40:48
Speaker
you can do it. So it's an interesting conversation. How are you seeing that kind of manifest in the healthcare space? Because there is so much regulation there, but there is so much good that could be done, and so much bad. I mean, if Trump gets reelected and they can get away with doing away with the Affordable Care Act, the ACA, or Obamacare as people call it, then
00:41:14
Speaker
with pre-existing conditions, you're screwed. If you have a pre-existing condition, then all of a sudden healthcare providers can reject you from all sorts of things. So there are so many potential goods, helping people have far, far better patient outcomes, but there are so, so many bads. So are you feeling that, again, people are trying to use this as profit-driven versus patient-driven, outcome-driven? How are you seeing that from talking to people?
00:41:44
Speaker
I think that's a perfect example where regulations have essentially led to misaligned incentives, and the whole system is screwed because of that. And no one in this system, neither the provider nor the insurance, has any kind of interest in actually solving the root cause of a problem or helping the patient.
00:42:10
Speaker
All the regulation in this space has led to there being just a handful of companies that get to set the rules. Not even your doctor, but the insurance company, can decide what the acceptable hemoglobin levels are, let's say, if you're testing for diabetes or pre-diabetes. There are different ranges.
00:42:31
Speaker
They say, well, from 4.6 to, I don't know, 5.6, the hemoglobin level, HbA1c, is considered the normal range and you have nothing to worry about. Then 5.6 to 6.2 is prediabetes, and above that, it's most likely diabetes. So that range is essentially
00:42:51
Speaker
set by the insurance company. It's based on the science that was available 20 years ago or whatever. There's a whole lot of new science telling us that it's actually not true and those ranges should be adjusted. There's no direct indication that if you have, let's say, an HbA1c
00:43:12
Speaker
result like 5.3, it's necessarily good; there are a lot of other factors that need to be considered. But the insurance doesn't care, because they have this wide range that they can slot people into and not have to pay for them, which essentially increases their margin: they are just charging for insurance that goes unused.
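As a worked example, here is how the quoted bands would look in code. The thresholds mirror the numbers in the conversation, which Andrew notes are insurance-defined and contested by newer research; they are not clinical guidance:

```python
# HbA1c bands as quoted in the conversation: up to 5.6 "normal",
# 5.6 to 6.2 "prediabetes", above that "most likely diabetes".
def classify_a1c(hba1c: float) -> str:
    if hba1c <= 5.6:
        return "normal (per the quoted range)"
    if hba1c <= 6.2:
        return "prediabetes (per the quoted range)"
    return "likely diabetes (per the quoted range)"

print(classify_a1c(5.3))  # the example value from the conversation
```

The point being made is exactly that a single cutoff like this ignores the other factors a fuller assessment would consider.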
00:43:39
Speaker
And so that's one of the main problems. It's actually similar to what you said about the Danish government with the data quality expert.
00:43:50
Speaker
It's the same thing: the insurance has a scientist who looks at this and essentially decides, oh yeah, this range, for example, is good and we should use it. But whose incentive is that scientist protecting, and who pays the scientist, right? You should always follow the money trail and ask whose incentives are at work in this system. At the end of the day, the patient is screwed no matter what.
00:44:20
Speaker
That's where the problem is. But also, from the perspective of the health provider, the doctor's hands are tied in a way, because if they don't follow the standard codes that are defined by the insurance, they're not going to be paid, right? And that essentially closes the circle and makes the system completely misaligned.
00:44:48
Speaker
And the patient is the one here who loses the most. Do you think that AI can bring us back more into alignment? Because what we're seeing with a lot of the AI is that it's creating more
00:45:10
Speaker
income inequality; that's what we're seeing with the ChatGPTs. We're seeing that the haves are getting more, while others are struggling. So do you think that AI can, in general, be a good? Because right now, in a lot of cases, it's a bad for the vast majority of people, at least from what I've seen. I definitely think that AI will be a huge kind of equalizer in the system and might lead to
00:45:40
Speaker
a rebalancing of the incentives in general, because AI is becoming generally available. The data is largely available. We all have our
Reflecting on AI's Societal Impact
00:45:50
Speaker
wearables. We all have all kinds of devices that are now available to us,
00:45:57
Speaker
for glucose level measurement, for example. It used to be that you had to pay $300 to $400 a month to have a glucose monitor, and it had to be prescribed by a doctor. Not anymore: last week, the FDA approved a CGM device that can be bought over the counter. That will drive the prices down. The tech inside a CGM is pretty basic, so eventually, maybe in a year or two, that device will most likely cost the same as a digital thermometer.
00:46:26
Speaker
So more people will get access to it. That means more data coming from all kinds of sensors: how you slept, what your glucose level is, how you're exercising, what your diet is; if someone is using an Eight Sleep or some sleep tracker or an Apple Watch, that's all data that you can collect. And by using this data and the real latest scientific research, that's where AI,
00:46:51
Speaker
not just generative AI like LLMs, but also all kinds of predictive models, can be used to really assess your condition better than any doctor can, and really diagnose the root cause quickly and in an affordable way. I think that's going to be
00:47:12
Speaker
the future, and a huge disruption as well, where essentially all these insurances will have to fight. You'll actually see, maybe in a year or two,
00:47:25
Speaker
the narrative that is going to be developed by lobbyists from all these insurance companies: how harmful AI is, how bad it is for the provider, in the guise of protecting the patients, of course. Because any time you hear that someone is trying to protect you from something, stay away; that's probably not in your interest.
00:47:48
Speaker
That's how a lot of policy gets created. So you'll see this emerge in a couple of years for sure, and how protective they will be of their piece of the pie. But I think that's just inevitable. You talk so beautifully about the future; it's just so inspiring, the way you see the applications.
00:48:10
Speaker
But also, the realization that most organizations can actually win if they focus primarily on the problems they can solve, rather than trying to solve all the problems with LLMs or whatever, you know.
00:48:27
Speaker
You just have to focus on something you can safeguard and secure well enough, and be super competitive about the data you source into this LLM. In that case, different organizations can win, and not just in the medical industry. I don't know, it was super inspiring listening to how you see particular problems
00:48:57
Speaker
through a specific framework. I really enjoyed that. Awesome. Talking more generally, more broadly about AI, I actually also think we're still
00:49:09
Speaker
at this early stage, where a lot of things are being tested. Twitter and the whole space are full of amazing demos that can do this and that. But those are demos, and demos are typically created around one specific success path and one specific use case. And if you deviate from that,
00:49:36
Speaker
you'll see that it's still not there, and there's a long way to go for many of these technologies. And generally, people are thinking about these LLMs as if they're going to be the only technology and they're going to replace everything. But in reality, that's a very narrow use case.
00:49:58
Speaker
At bottom, it's a text generation model that is trained on text. What it does is predict the next token, the next character. All the outputs that you see are generated progressively, by looking back at the beginning of the output, the first token; essentially, every next token is generated by looking back. That's how it works, in a very basic way.
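To illustrate the loop being described, here is a toy sketch of autoregressive decoding: each new token is chosen by looking back at the entire sequence generated so far. The "model" here is a random stand-in for demonstration, an assumption, not a real LLM:

```python
import numpy as np

def toy_logits(tokens: list[int], vocab_size: int = 50) -> np.ndarray:
    """Stand-in 'model': scores every possible next token from the full prefix."""
    rng = np.random.default_rng(seed=sum(tokens))  # depends on the whole prefix
    return rng.normal(size=vocab_size)

def generate(prompt: list[int], max_new_tokens: int = 10) -> list[int]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = toy_logits(tokens)                    # look back at everything so far
        probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the vocabulary
        tokens.append(int(np.argmax(probs)))           # greedily pick the next token
    return tokens

print(generate([1, 2, 3]))
```

Nothing in this loop plans ahead or reasons; it only extends the prefix one token at a time, which is the limitation discussed next.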
00:50:28
Speaker
Okay, we just lost you for a second. Okay, you're back. Yeah. And you can't expect that model, that architecture, to reason about anything.
00:50:42
Speaker
That's why there is a whole emergent trend of augmenting the model, or rather the context window, with different information. That might also explain why you saw it output a New York Times article: that article was indexed and stored in an embedding or some other structured storage, and it was retrieved during generation.
00:51:08
Speaker
What you input into the model is not what the model actually processes. There are a lot of layers your query goes through. First of all, guardrails: making sure you're not asking for something the model should not be answering, like harmful topics, those kinds of things. Then your query is rewritten to optimize it for better results. So what gets into the final model execution is not literally what you wrote, however eloquent your English.
00:51:40
Speaker
Not you in particular; in general, people type questions and queries at a pretty high level, like "I want to do this and that," which may not necessarily be the most optimal query for the model to produce good output, right? But all these companies are optimizing for the best output possible. So that's where you want to make sure that you generate something that is useful.
00:52:02
Speaker
And then in this pipeline, essentially, there is code that runs before the model and figures out what needs to be retrieved to help answer the question. It might indeed use a whole chunk, or multiple chunks, from that New York Times article, embed them into the context, and summarize an answer from that.
00:52:24
Speaker
And that's what most developers are using, this retrieval augmentation, to build agents these days and all these applications that we see. But that's augmentation. It almost looks like the model knows these things, but it doesn't; it's just augmented with that additional context.
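Here is a self-contained toy of that augmentation pipeline, with stand-ins assumed for every stage: a keyword list in place of a real guardrail, a trivial rewrite in place of an LLM rewriter, and string similarity in place of embedding search. None of these names come from the conversation:

```python
from difflib import SequenceMatcher

# "Indexed" chunks standing in for an embedding store.
DOCS = [
    "News chunk: the city council voted on the new transit plan.",
    "Docs chunk: the API rate limit is 100 requests per minute per key.",
    "Blog chunk: sourdough starters need daily feeding at room temperature.",
]
BLOCKED = ["bomb", "weapon"]  # toy guardrail list

def retrieve(query: str, k: int = 1) -> list[str]:
    # Closest chunks by string similarity, in place of vector search.
    return sorted(DOCS, key=lambda d: -SequenceMatcher(None, query, d).ratio())[:k]

def answer(query: str) -> str:
    if any(term in query.lower() for term in BLOCKED):   # guardrail layer
        return "Refused by guardrail."
    rewritten = query.strip().lower()                    # stand-in for LLM query rewriting
    context = "\n".join(retrieve(rewritten))             # augment with retrieved chunks
    # A real system would now send this assembled prompt to the LLM.
    return f"[prompt to model]\nContext:\n{context}\nQuestion: {query}"

print(answer("What is the API rate limit?"))
```

The shape matches the description above: the model never sees the raw query alone, and anything it appears to "know" arrives through the retrieved context.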
00:52:46
Speaker
And until we get to actual reasoning... That's probably when we will be closer to AGI, but AGI is pretty far away, I think. We need a conceptually different architecture for the model to reach that level. Okay, Andrew, that was a lot. I enjoyed every minute. It's super fascinating, you know, how you see the world and the future of technology.
00:53:17
Speaker
I want to say thank you so much for that. It's super cool. And also, I would recommend adding your Substack as a link in the show notes for this particular episode. Anything else you want to share?
00:53:35
Speaker
No, just reach out on Twitter. That's where I post all of the blurbs I wake up with. So Twitter is your natural habitat. Okay, good to know. Thank you so much. It was a pleasure seeing and talking to you today. Yeah, take care. Thank you for inviting me. Thanks.