Introduction to Adnan Hodzik
00:00:01
Speaker
Hi, everyone. It's Yulia from Straight Data Talk. And today I'm happy to have Adnan, who joined me this Monday morning from ING. So, Adnan, please jump in and introduce yourself.
00:00:16
Speaker
So yeah, thanks for having me, first of all. So hi, I'm Adnan Hodzik, or Hodzic, and I'm an area tech lead for the data analytics platform at ING, focusing on data, infrastructure, and AI journeys on Google Cloud.
00:00:32
Speaker
I've been with ING for almost seven years now, and in general I have over 20 years of experience in various roles, mostly related to being a site reliability engineer, building and developing various solutions over the years.
00:00:52
Speaker
And outside of work, I'm also a Google Developer Expert in Cloud for serverless app development. And I'm also very active in the open source community.
00:01:03
Speaker
I created numerous open source projects, my most popular one being auto-cpufreq, which has almost 7,000 stars on GitHub and over 100 contributors. I think that's very important.
00:01:21
Speaker
And I also have my little YouTube channel and an 18-year-old blog where I share some of these projects. So, yeah, that's a little introduction about myself, as I could keep talking.
Adnan's Daily Routine and Discipline
00:01:36
Speaker
The question I have at this point is, when do you sleep? That's a good one. I love that question, because people usually ask me, like, how do you get to do all of these things?
00:01:48
Speaker
And I like to say it's discipline and determination, but it's actually just not getting enough sleep. Yeah. I try, I have a very strict routine. I wake up every morning at 6 a.m.
00:02:05
Speaker
Yes, I'll work out, again, every morning. And then I'll try to do some work, at least private work, before the kids and everyone else wakes up. And it helps me get a head start on everything that's going to happen that day.
00:02:24
Speaker
I have no idea how you do that, because I'm not even talking until I have my first or sometimes second cup of coffee. So, yeah, 6 a.m. sounds like mission impossible to me.
00:02:37
Speaker
But I'm glad. I'm glad you could do that and you found the time to join me.
Overview of ING's Data Analytics Platform
00:02:42
Speaker
So, Adnan, you mentioned that you're responsible for the data platform, which is built on Google Cloud at ING.
00:02:53
Speaker
And, you know, I've seen a lot of data platforms in my experience. What is a data platform in your case? Could you explain a little bit about the architecture and use cases?
00:03:07
Speaker
So basically, the data analytics platform at ING is a place where we enable ING users and tenants to explore and build various solutions on AI, ML, and data,
00:03:27
Speaker
now in particular on Google Cloud. Until this point, and still, we're using ING private cloud. But basically, that's kind of what we're doing, because data and AI are incredibly important at ING, of course.
00:03:45
Speaker
So DAP is an internal data analytics platform, and we take care of all the infrastructure, software, and data so that you as our users can focus on the actual AI or data applications that you're building.
00:04:03
Speaker
And we also take care of compliance, risk, and security work, so tenants don't have to worry about it, because this is a huge thing at banks. All with the goal that our data analysts or data scientists can focus on transforming this data or building models.
00:04:23
Speaker
Or that you as a cloud engineer can also build your AI or data applications. I mean, that is also to help you get a better picture.
00:04:34
Speaker
It's also reflected in the ING app, for example, with features like look-ahead and stuff like that. So there's a couple of them; that's the most important one that I can think of.
00:04:46
Speaker
But that's like a summary of what DAP, or the data analytics platform, is right now. Just so you know, because I'll say it a lot of times, this area that I'm talking about now consists of around six teams.
00:05:05
Speaker
And now we're actually becoming a part of a much bigger initiative called Vista, not to be confused with Windows, which is value-driven insight and scalable technology and analytics.
00:05:23
Speaker
So, okay. I have a practical question.
Challenges in Data Anonymity and Security in Banking
00:05:28
Speaker
So when you talk about the data platform, as far as I understood, this is making data available for different use cases, making data discoverable, and making sure the teams can access it, model it, and embed it into an application or use it for internal use cases.
00:05:46
Speaker
But to do so in banking is incredibly difficult, as you need to make sure that data is anonymized at some point.
00:06:00
Speaker
At some point, there should be very strict access and rigorous policies in place. That's a lot. So I want to talk a little bit about the scale.
00:06:11
Speaker
You mentioned six teams. I'd like to understand how many people can potentially access this data platform. How many projects are we talking about? How do you measure your scale? How do you govern this data platform?
00:06:30
Speaker
So yeah, with these six teams, I'm only referring to the teams that are working, for example, now on DAP GCP, the Google Cloud transformation.
00:06:42
Speaker
There are actually a lot more teams that are part of DAP, the data analytics platform, because we also create various things: model development, dashboards, and reporting.
00:06:55
Speaker
So with that, I don't know the exact numbers, but I would say that we have at least maybe 2,000 users internally inside the bank. Yeah, at least for the analytics and everything.
00:07:09
Speaker
And these can also be from various different teams. So that is pretty big. And now you've been very modest, "pretty big," to say that many users can access the data platform. Yeah.
00:07:22
Speaker
So another important thing, and I actually also describe this as part of some of the talks I give publicly: one feature that makes DAP different is that it has a portal, right?
00:07:36
Speaker
So then you as a data scientist can just go and easily click around and basically create your projects for your use cases.
00:07:48
Speaker
And in the future, we plan to move the whole DAP from on-prem to Google Cloud with the same purpose, so that you could even manage some of the data permissions using this portal, as it's very hard to maintain all of this.
00:08:07
Speaker
It's easier when I'm giving a talk, because then I also have slides. But, for example, for all of the data that's coming in, we call that a data plane or a shared core data layer.
00:08:25
Speaker
And this is where we store all of the data. And it has very rigorous and strict policies regarding the data. We're using Terraform for all this.
00:08:38
Speaker
And we're even actually thinking of using some of our own APIs, because even with Terraform, at the scale we're hitting, it could be easier for us to just have our own APIs.
00:08:52
Speaker
And this is all evidenced. So for example, this was one of the first things that we had to do, because who has access to this data? Like the service accounts that are used, right?
00:09:05
Speaker
And I remember this because I personally started a tool for it, because risk was pushing us: we need to know. At ING, every user that has access to data or is part of a system is evidenced and manually added to these big Excel sheets. And I said, but with Google Cloud this keeps changing all the time, you cannot do it manually, right? So I started a tool which basically uses Google Cloud APIs and, using Python, automatically creates the Excel sheets with all of that, because it's easier for the risk people to see it that way, and we do it with code.
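The tool itself is not shown in the conversation. As a rough sketch of the idea, assuming the access data arrives in the shape of a Cloud IAM policy (a `bindings` list of roles and members), the flattening step might look like the following. The function names, the example policy, and the choice of CSV (which Excel opens directly) instead of a native Excel file are assumptions for illustration; a real tool would fetch the policy through the Google Cloud APIs rather than hard-code it.

```python
import csv
import io


def flatten_bindings(policy):
    """Turn an IAM-style policy dict into one row per (role, member) pair."""
    rows = []
    for binding in policy.get("bindings", []):
        for member in binding.get("members", []):
            # Members look like "serviceAccount:name@..." or "user:name@..."
            kind, _, identity = member.partition(":")
            rows.append(
                {"role": binding["role"], "member_type": kind, "member": identity}
            )
    # Sort so that repeated runs produce a stable, diffable report.
    return sorted(rows, key=lambda r: (r["role"], r["member"]))


def to_csv(rows):
    """Render the rows as CSV text that a spreadsheet tool can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["role", "member_type", "member"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical policy, hard-coded here in place of an API call.
example_policy = {
    "bindings": [
        {
            "role": "roles/bigquery.dataViewer",
            "members": [
                "serviceAccount:etl@example-project.iam.gserviceaccount.com",
                "user:analyst@example.com",
            ],
        },
        {
            "role": "roles/storage.objectAdmin",
            "members": [
                "serviceAccount:loader@example-project.iam.gserviceaccount.com"
            ],
        },
    ]
}

rows = flatten_bindings(example_policy)
report = to_csv(rows)
```

Because the report is regenerated from the live policy on every run, it stays current in a way a hand-maintained sheet cannot.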
00:09:52
Speaker
So I think that's a very nice combination: when a risk person asks you about something they're not seeing, we make everything with code. So yeah. We don't want to be talking about the risk person at this point.
00:10:09
Speaker
Yeah. I mean, my practical question is, okay, you have the spreadsheet. What happens next? Okay, you did well. I have no questions for you, Adnan. You delivered the real-time,
00:10:20
Speaker
or even near-real-time, information about the access, who has access, and what kind of service account.
AI Role in ING's Security Operations
00:10:28
Speaker
But after that, what does that risk person do with that spreadsheet?
00:10:33
Speaker
Oh, so yeah, it's actually real time. The way we made it, and we did this on purpose, is that whenever any of them, or whoever, runs a pipeline, these files will be created and then they can review them.
00:10:48
Speaker
And if there's a service account that shouldn't be there, they'll see something like "error" for that field, so they know they should look into it, and basically, yeah.
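One way to picture that "error" field, as a hypothetical sketch rather than ING's implementation: compare each row of the generated report against an approved list and mark anything unexpected. The `approved` set, the account names, and the `OK`/`ERROR` labels are all assumptions made for illustration.

```python
def review_access(rows, approved):
    """Label each (role, member) pair as OK or ERROR against an approved set."""
    reviewed = []
    for role, member in rows:
        status = "OK" if (role, member) in approved else "ERROR"
        reviewed.append((role, member, status))
    return reviewed


# Hypothetical approved access list, e.g. maintained alongside the platform code.
approved = {
    ("roles/bigquery.dataViewer", "etl@example-project.iam.gserviceaccount.com"),
}

# Hypothetical rows produced by a pipeline run.
current = [
    ("roles/bigquery.dataViewer", "etl@example-project.iam.gserviceaccount.com"),
    ("roles/bigquery.dataViewer", "unknown@example-project.iam.gserviceaccount.com"),
]

reviewed = review_access(current, approved)
flagged = [row for row in reviewed if row[2] == "ERROR"]
```

A reviewer then only needs to look at the flagged rows instead of reading the whole sheet.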
00:10:59
Speaker
Something is wrong. Something is not there. But we'll get to that, because that's an excellent point, which basically brings us later to how we at ING are using AI, for example, for KYC and CDD, know your customer and customer due diligence, because these folks also review a lot of data. And this is where AI can come in to do a lot of the work that is done manually. But I'm now talking just from our side, from the architecture and infrastructure, because we also, for example, log the data. Besides this,
00:11:36
Speaker
We also have numerous alerts. So, for example, if someone somehow managed to get access and change something in the web UI in Google Cloud, right, we would be alerted that it doesn't match what we have defined in code with Terraform,
00:11:55
Speaker
because there's a drift, like someone changed something here, and we're alerted about this. Because without alerts, even with all the dashboards, people are not going to look into that, and so no one's going to notice anything.
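Conceptually, drift detection is a comparison between what the code declares and what actually exists. The sketch below illustrates that comparison on plain sets of (role, member) pairs. It is an assumption-laden illustration, not how Terraform or ING's alerting works internally, and the function and variable names are hypothetical.

```python
def detect_drift(declared, actual):
    """Compare declared (in code) and actual (in the cloud) access bindings."""
    declared, actual = set(declared), set(actual)
    return {
        # Present in the cloud but not in code: someone changed it by hand.
        "added_outside_code": sorted(actual - declared),
        # Present in code but missing in the cloud: something was removed.
        "missing_from_cloud": sorted(declared - actual),
    }


# Hypothetical declared state, as it might be derived from infrastructure code.
declared_state = [
    ("roles/bigquery.dataViewer", "etl@example-project.iam.gserviceaccount.com"),
    ("roles/storage.objectAdmin", "loader@example-project.iam.gserviceaccount.com"),
]

# Hypothetical actual state, as it might be read back from the cloud.
actual_state = [
    ("roles/bigquery.dataViewer", "etl@example-project.iam.gserviceaccount.com"),
    ("roles/owner", "intruder@example.com"),
]

drift = detect_drift(declared_state, actual_state)
should_alert = bool(drift["added_outside_code"] or drift["missing_from_cloud"])
```

The point made in the conversation is the last line: the comparison only helps if a non-empty result pushes an alert to someone, rather than waiting on a dashboard.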
00:12:06
Speaker
But also, even if someone has escalated permissions and changed something, we will get an alert that something weird is happening. We even had a case where the ING red team were literally trying to hack us, and they got access to one of the service accounts.
00:12:27
Speaker
But as soon as they got it, we got an email that someone was trying to manipulate it, which gave them extra confidence about the degree to which we're in control of the platform.
00:12:40
Speaker
So they were pretty happy about this. But yeah, so this is that core data layer and all of the data that goes in there. But there are also various other rules on what data can even go there.
00:12:55
Speaker
If it's personal data for some of our tenants, it will need to be anonymized.
Transition to Google Cloud: Strategy and Benefits
00:13:01
Speaker
And there's various things that go in this process. It's a very complicated topic.
00:13:07
Speaker
And that's just the data. There are numerous other parts. No, no, no, this is fascinating. The complexity is the scale and the pace at which you guys are obviously moving, because, as I understand it, there is data readiness even to apply AI, and there are also measures in terms of data governance, which are separate things from AI governance, and you're already doing it. So I'm very excited about our topic today, but before going forward
00:13:40
Speaker
and jumping into AI and all the sweet stuff, I've got a question. You mentioned that ING currently has on-prem servers, right?
00:13:53
Speaker
Which is, how do you call them? The data center, multiple data servers. No, there is a nice name. But anyways, okay, so data centers, like your own data centers.
00:14:06
Speaker
Which portion of your data or workload is on Google Cloud today, percentage-wise? If you can share it, of course. If not, we are happy to move on.
00:14:17
Speaker
So this is something I'm not allowed to reveal yet or discuss publicly, but yeah, there's data that's being moved. Because that's the thing, with everything that's happening in the world with data, and if it ends up there, what could happen,
00:14:41
Speaker
There's also the big topic of sovereign clouds, even with Google Cloud and other cloud providers. But all I can say is that ING has really big plans with this.
00:14:54
Speaker
It's only about how to do this as carefully as possible. Like, I don't know, five years ago, this was even a political issue for ING, because all of our stuff is now in the cloud.
00:15:09
Speaker
Should our customers' data be in the cloud also, right? So it makes a lot of people uneasy about the whole thing. But all I can say is that all the data that is on Google Cloud is handled in the best possible way.
00:15:27
Speaker
And there are big plans regarding this, but I'm not at liberty to discuss them at this point. No worries. Okay, then the question I have: as you mentioned, a few years ago you were 100% on your private data centers.
00:15:44
Speaker
How was the decision to go forward with Google Cloud made? Why not AWS? And, you know, you guys are in Europe, why not Azure? My favorite cloud. How was the decision made?
00:15:59
Speaker
So ING actually uses a few clouds, and this is something that's still being discussed, for example when it comes to the public cloud.
00:16:12
Speaker
So we have ING private cloud, or IPC in case I say the acronym. We also use Azure, and then we use Google Cloud.
00:16:23
Speaker
And how these things are decided, it goes through a very rigorous process of what ING is looking for and what solutions the cloud providers are providing.
00:16:37
Speaker
When it comes to Google Cloud, I can speak to this because back then I was actually part of the machine learning platform team.
00:16:49
Speaker
And I remember that I was making POCs of moving the whole machine learning platform to Kubernetes, for example GKE.
00:17:01
Speaker
And I even actually gave a talk at KubeCon a few years ago regarding this, something like "Kubernetes: resistance is futile."
00:17:13
Speaker
But basically there, we already managed, in one sprint, with one person, me, to POC the whole thing. And what Google Cloud also allows us to do, especially for analytics: they have a lot of good AI and ML services.
00:17:33
Speaker
Products and tools. Exactly. So, for example, Vertex AI. It's like a complete ML suite with a bunch of things in there. I mean, yeah, now we can get into which cloud provider is leading this battle.
00:17:50
Speaker
Yeah, yeah. So it was a decision where basically we were nudging towards that end. Now it's maybe even a personal opinion, but I like Google Cloud. It's very vanilla flavored.
00:18:06
Speaker
Wow. Yeah, because with others, there's too many things added. And then you're not, for example, maintaining a service. You're maintaining things on top of that for that service to run.
00:18:19
Speaker
So the bells and whistles or whatever. So that's another thing, so we can all push whatever we want. But yeah, elaborate and rigorous evaluations are made with scorecards and whatever.
00:18:35
Speaker
So yeah, for data now we're using Google Cloud. No, I mean, just to highlight, "vanilla cloud" is totally, we're talking about the good sides of Google Cloud.
00:18:49
Speaker
This is a nice sentiment, just in case, you know. Yeah, so for me it's 100% evident that what is happening in the future is all about data and how one can make and churn value from it.
Advantages of Google Cloud's User Experience
00:19:07
Speaker
And Google makes it so damn easy and fast. And yeah, we can talk a lot about Google Cloud user experience and user interface, but it's still better than AWS.
00:19:24
Speaker
Yeah, I was even bothered by that. I keep mentioning this because, especially with my background in SRE and building everything yourself, with Google Cloud you literally don't have to worry about some of the things that I would build before.
00:19:40
Speaker
So I'm kind of letting go of some of the things. But this is also a good thing, because then you can focus on the particular thing that you were working on. And yeah, so I also like that. As I said, as much as I would build these things myself before, now it's like, hey, it's there.
00:19:58
Speaker
And then we're going to waste the entire podcast talking about Google Cloud right now. Heads up, we are not affiliated with Google Cloud. I mean, maybe you as a developer expert, do they pay you something? No, no, not really. I'm literally, like, from Google, no, nothing.
00:20:19
Speaker
This is a shame. I get some free credits and things to keep building, but yeah, I'm not. This is major. Okay. Okay. So my last question about Google Cloud.
00:20:31
Speaker
What is your favorite product in Google Cloud?
00:20:38
Speaker
I really like Cloud Run. Out of all the options, I really like it. I even actually submitted a CFP for KubeCon now, which is about how we moved from on-premise Kubernetes to now completely running on serverless.
00:21:08
Speaker
And Cloud Run, I really like it because it's so simple. It's still Knative. And for some of my private projects I also focus on it, but I love how simple it is to use, and yet it's very powerful.
00:21:30
Speaker
So I assume you read the book by Wietse Venema. Yeah, yeah. You read the book? Which one are you referring to? Cloud Run, by O'Reilly. O'Reilly published the book on Cloud Run by the author Wietse Venema.
00:21:48
Speaker
No, I haven't read that book. But actually, when I was going through the Google Developer Expert process, I had my technical interview with Wietse. And he was very happy with my Cloud Run skills, because I made some projects where I really go,
00:22:04
Speaker
Yeah, explained it from from nothing to everything and sending it out. You might can't feel I'm sure. Yeah. Yeah, yeah but but this is this is fun.
00:22:15
Speaker
Okay, look, we got sidetracked. So let's get back to the most interesting things.
Generating Value from Data at ING
00:22:23
Speaker
So you built the data platform.
00:22:27
Speaker
At ING, you implemented data governance policies. You are able to track access and unexpected behavior across the platform and also deliver alerts, which is fantastic.
00:22:46
Speaker
But all in all, getting data in and even giving access to service accounts and teams and tenants, as you say, is not enough.
00:22:57
Speaker
We need to make sure there is value. So you mentioned that you guys already apply AI. My first question is, what are the use cases? Like, what are the use cases you can share with us publicly? And what are the use cases you're most proud of?
00:23:13
Speaker
So ING's use cases, there's a few of them. There's a whole list. But for example, it's used in marketing, for hyper-personalized marketing, and in customer service, the chatbots, because that's where everyone started and it's the easiest thing.
00:23:38
Speaker
But I think it's a big thing for ING, because it's really to help these folks. And while I'm mentioning these, it's very important that ING's strategy with the whole move to AI is not to get rid of people, as it is with some other banks, and they're publicly saying this.
00:23:59
Speaker
It's more basically about how to leverage these AI tools to help you and get rid of some of the manual toil and labor that's happening in these roles.
00:24:11
Speaker
So that's very important to mention. So, for example, they would assist contact centers with natural language, and power the Gen AI platform
00:24:23
Speaker
chatbots to talk to the customer, for example. So that's one important thing to mention. Also KYC, know your customer. I would say this is one of the biggest ones, because this is also for various vectors, right? From money laundering to risk evaluation and mortgages, right?
00:24:49
Speaker
And this is, for example, a very lengthy process. I remember being in one of these calls where it takes hours for an actual human to review this information, right?
00:25:04
Speaker
And as humans, we're capable of making errors or missing some things. So this is where it's very useful for AI to, for example, give you some summary, like, is it high risk? Is it money laundering or whatever?
00:25:21
Speaker
But while still keeping the person, the human, in the loop to actually check it. Because that's another very important distinction between what we're doing at ING and, I don't know, general Gen AI solutions: we cannot get it wrong, right? Because you're talking about people's livelihoods, financial freedom or whatever, right?
00:25:46
Speaker
So whatever you get from the AI or Gen AI has to be accurate. Because if we use some of the other general solutions, popular ones, they straight up lie to you and make up answers just to make it work. So this is one thing at ING that you cannot do. It has to be right.
00:26:07
Speaker
So that's why there's so much control regarding this.
AI Governance and Accuracy at ING
00:26:13
Speaker
And that's also why a human is in the loop regarding this. And so KYC, yeah, and customer due diligence is also a big one, basically because...
00:26:29
Speaker
As I said, analysts have to sift through a lot of documentation or a lot of content manually, and also from different parts. And it can even be duplicate effort, with inconsistent information handling. It can have various delays.
00:26:48
Speaker
And I know that this has already reached over 90% accuracy and, I think, at least a 50% reduction in handling time, for work which would otherwise take an enormous amount of time.
00:27:04
Speaker
And yeah, there's also the whole wholesale banking, which is, for example, even reflected now in the recent ING app,
00:27:16
Speaker
where you can just ask a question at the top, and you're looking, I don't know, for your daily limit. And then it will figure out what you're looking for and immediately give you that feature. Yeah.
00:27:29
Speaker
And there's also a big push, for example, in what's called ESG, which is environmental, social, and governance.
00:27:39
Speaker
So for example, if ING is onboarding a new, I don't know, client, they're determining how that client is prioritizing sustainable and other ethical practices.
00:27:55
Speaker
Because, yeah. We're talking about large customers, I guess, right? Yeah, of course. Just so they're not polluting, because that's another big thing. ING is really big on sustainability, and just, yeah.
00:28:12
Speaker
Absolutely. Listen, so what I hear from you is basically going faster through the vast and different information to identify the next steps and simplify people's work.
00:28:30
Speaker
And you mentioned it already, but what I want to highlight, and this is one of the hurdles in a data team's job, is how do we measure these improvements, which you already mentioned.
00:28:46
Speaker
So one is, I guess, the ratio of errors. Another one is the effectiveness and the speed and how fast you can onboard clients, I guess. But what else? Because I assume you guys are sitting on a big workforce today, there are so many people working, and you have to justify return on investment.
00:29:11
Speaker
So how do you guys communicate that within the organization, given the data platform costs and AI initiatives? Yeah, I cannot give too much information, because these are some of our tenants, and they all have their own metrics and ways they report.
00:29:28
Speaker
But yeah, it's constant. Some of the initiatives that we're pushing for, for example, because I also forgot to mention, there's also a big push in centralized knowledge and software development.
00:29:43
Speaker
So for example, for software development, or the parts we're working on, it's a continuous loop. Whatever you're building, you're not going to get it perfect from the start. But keeping track of metrics for whatever you're trying to achieve is how you are going to get there. This will of course differ depending on what you're building, because it might be completely different for, I don't know, some KYC and for software development.
00:30:12
Speaker
But this is also something that's rigorous. And at ING, just to mention, so it's clear, because you were also mentioning it with data: you cannot just move any data now, and not just anyone can go and start using...
00:30:28
Speaker
and drop their data on the cloud, right? It's a whole procedure how you get there, because we have to be very careful to get it right. So again, how we get it right, it's a constant feedback loop of what you're building, and you need to analyze it. So it depends on the use case. Yeah, that's the shortest I can answer.
00:30:53
Speaker
I see. No pressure. So you mentioned that you already use AI, lots of use cases in production, directly interacting with customers. They are less intrusive. They don't touch, I guess, a lot of personal data.
00:31:12
Speaker
But what AI governance measures are in place?
AI Service Malfunction and Risk Management
00:31:21
Speaker
How do you pick a model? How do you ensure it behaves well?
00:31:27
Speaker
You know, how do you work on that front? Yeah, I can just, again, for example, we recently had an incident where basically Gemini wasn't working for one of our tenants as it was supposed to. It wasn't delivering, right?
00:31:51
Speaker
But this is something that we immediately, again, we have a lot of metrics that we're collecting on basically what we're trying to achieve. And then we were working directly with Google: hey, what is the problem here? This is not delivering how it's supposed to.
00:32:10
Speaker
So, yeah, again, it depends on what goal you're trying to achieve.
00:32:22
Speaker
How did you catch that malfunction? Like, how did you identify that malfunction in the first place? Well, I think it's easy, because you're just not getting some of the answers that you're supposed to get.
00:32:36
Speaker
You see it's completely, well, hallucinations. But as I said previously, that's one thing where ING is very strict, because you cannot have hallucinations. You cannot get it wrong, because it's very important stuff.
00:32:54
Speaker
While this may not be a problem in some other industry, in financial services it's very important that you don't have hallucinations, or have as few as possible.
00:33:08
Speaker
I don't know, I have so many sentiments. It's hard when it comes to LLMs, because what I saw is that they're trying to be as nice to us as people, and it reminds me of very advanced gaming techniques, where, when you talk to an LLM, right, to any model, let's say it's Claude or especially OpenAI models, they kind of drag you into the conversation with them.
00:33:45
Speaker
And I shared that before, I know a person who got divorced after talking to ChatGPT, you know, and this is horrifying because, you know, you cannot use ChatGPT as a therapist.
00:34:04
Speaker
It just doesn't have enough context. It just doesn't have enough of the nuances, how people behave and what is okay and what is not okay. You don't make those decisions that way, you know? And it's horrifying to me, because sometimes, as an adult, I can identify that when it comes to personal matters.
00:34:22
Speaker
But when it comes to work matters, let's say I don't have enough experience in something and I'm asking OpenAI, like, what is the answer? Yeah. And then it gives me back something, and I'm relying on it 100%, but I'm still capable of identifying that this might not necessarily be the best choice.
00:34:44
Speaker
You know, so I would take a step back. I would check it. Things like that. Yeah, that's exactly what I was talking about before. I also had an interesting use case where I was trying to buy something, and I use a couple of LLMs because I don't fully, you know, like, I pick which one, or you're also kind of tailoring which answer you like the best.
00:35:12
Speaker
There's also that. But yeah, I also had a conversation once where I was trying to buy something and I was using ChatGPT. I was also using Gemini and Copilot together.
00:35:27
Speaker
But I remember with ChatGPT, because I had to leave my home and I came back to continue the conversation about what I was trying to buy. And now it was telling me to buy a completely different thing than what it was telling me before.
00:35:42
Speaker
So it's like, I would even say, what changed? Why? Because I was set on buying that thing. So that's the risky part. And that's why the data you feed these chatbots and AI agents is the most important thing.
00:36:00
Speaker
And that's where you need to make sure that the results you're getting are really the results. But it will all depend on the data you're feeding it and the temperature that you set, so that you're really getting the answers from the data it was trained on rather than something else.
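For readers unfamiliar with the temperature setting mentioned here: in a language model it rescales the scores of candidate next tokens before one is sampled. The sketch below is a generic illustration with made-up scores, not any provider's API. A low temperature concentrates probability on the top candidate, which makes output more repeatable but does not by itself make it correct.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate answers.
scores = [2.0, 1.0, 0.5]

low = softmax_with_temperature(scores, 0.2)   # near-deterministic choice
high = softmax_with_temperature(scores, 2.0)  # much more varied choice
```

With these scores, the low-temperature distribution puts almost all of its weight on the first candidate, while the high-temperature one spreads it across all three.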
00:36:19
Speaker
So that's what I'm saying about the general or public Gen AI use cases. They will even lie to you, because it doesn't know what to give you next, and it will just give you an answer. I literally typed, "You literally just lied to me. This is completely made up," right?
00:36:36
Speaker
And that's what's different in financial services, because you cannot get it wrong. You cannot just tell someone, like, I don't know, I'm making something up, I'm trying to figure out what my mortgage could be.
00:36:50
Speaker
And it gives you completely different numbers. Like, the whole time. Yeah. Yeah, it just comes up with an answer. Yeah. But again, it's also difficult to decide what kind of data you want to feed it with. And it will all depend on the data.
00:37:08
Speaker
But it's tricky because you cannot just feed it everything that you have lying around. And the use cases where you apply it, and on top of that the guardrails.
00:37:21
Speaker
We can set the... Yeah, exactly. Because, for example, the data you feed it with for, I don't know, centralized knowledge and KYC could be completely different because it has completely different instructions. It has a completely different goal, what you're trying to achieve, even if it's the same data.
00:37:45
Speaker
And that's why you could even filter some of the data and prevent duplicate work. For example, I don't know, you run certain data through KYC and then whatever you get can also run through customer due diligence before it comes to the human. But this is another thing, because I think in the future the big thing will be agentic, right?
00:38:14
Speaker
Because the chatbot is just the interface to all of the information you feed it with. And the data you feed it with and the temperature you set determine what kind of answers you're going to get.
00:38:27
Speaker
But this could go in various directions, like, I don't know, even to self-healing systems. You're interacting with a chatbot. You have a problem, for example, in software development.
00:38:41
Speaker
Like, how do I, instead of it giving instructions, have it even, I don't know, take some actions on your behalf? But it's very difficult because, again, you need to get multiple things right.
00:38:56
Speaker
So that's why, when we're building these things, I keep saying, let's first build a shack before we can build a skyscraper, because you learn things, you learn how to set up proper foundations, and then you can build other things. And that's why I'm happy with the work we've done at DAPGCP regarding this, because we got the foundations right, in terms of...
00:39:22
Speaker
Yeah, from knowing if anything's going wrong, to governance, to the data that's being fed. And I think that's very important. Yeah, so that's how I think it should be done.
00:39:39
Speaker
So I've got, again, a kind of controversial question for you.
Metadata Management and Performance Measurement
00:39:45
Speaker
So I'm currently reading a book by Ole called Fundamentals of Metadata Management.
00:39:53
Speaker
And he talks about this new concept that he introduced on the market, which is the meta grid. And this is basically metadata from the data stack and data infrastructure, metadata from IT, and metadata from users, whatever documentation they have in their hands, maybe about customers, whatever, and how an organization can benefit if that metadata is kind of intersected and communicated, sort of like a microservice architecture, but for metadata.
00:40:34
Speaker
Yeah, this is a pretty interesting concept. But what strikes me at the beginning of the book is that he pulls an example where he was working in some organization, a big organization, I don't remember which.
00:40:51
Speaker
Let's assume it was pharma, whatever. And the manager he used to work with pulled out a big map of all the solutions that are used in IT.
00:41:07
Speaker
And Ole, being a student or intern, as I remember, Ole, please don't hold it against me, goes like, but what is used and what is not? It's way too much.
00:41:21
Speaker
And this manager goes, this is the trick. We don't really know what is used. Turning them off may have bigger consequences than just retaining them and continuing to pay for them.
00:41:35
Speaker
So I want to know, and I'm not asking you about the IT landscape. Today we're kind of good at identifying whether the solutions that we're paying for are in use or not. You know, this is easy today.
00:41:50
Speaker
But I'm mostly focused on the data platform. Again, I'm coming back to you on this, and it's fascinating to me because we talk a lot to enterprise customers. We, as Masthead, are deployed at larger organizations, and I see more and more use cases where the data platform team, well, the manager says that, while they tend to catch this,
00:42:22
Speaker
like unintended, bad, or fraudulent behavior, they are good at this. But the second part goes to data and governance, as in these data products, because it could be, in your case, a machine learning model,
00:42:40
Speaker
an AI use case, some Vertex thing that is running and creating the output. How do you guys measure the performance of these data products and the data itself, as in what should be continued? What is not in use?
00:42:57
Speaker
Who is accessing it? Which data products have PII data? Which data products do not have PII data? Because it sounds to me that you have something in place, but I would love to understand how you guys handle it.
00:43:12
Speaker
That's why I'm saying I wish this was a talk, because then I could also show slides. I'll literally have a talk on this in 10 days at our Google DevFest in Sarajevo.
00:43:25
Speaker
Because I'm really describing that. So for example, I was mentioning how we have the DAP portal. And it's a very nice-looking portal. But also, when you're creating a project, for example, there's literally a checkbox for whether your data has any personally identifiable data, right?
00:43:46
Speaker
if it What is the metadata classification? Because ah also before I was only mentioning, for example, ah the data plane, where all of the data is, right?
00:43:58
Speaker
But the DAP portal will also have ingestion, which is what you were talking about, what percentage of the data goes from private cloud to Google, right?
00:44:09
Speaker
This is a whole team that's doing ingestion from on-prem to Google Cloud. So there's also DAP box. And then this data will come from ingestion.
00:44:21
Speaker
It will be going to the data plane where it will be stored. But you will also have a self-service platform, right, which can be used to create these projects and also to grant read-write access to the data plane.
00:44:37
Speaker
Then you will also have what to store, what metadata to store in the data catalog, for example. And also for the data plane, for example, how to access whatever is in the data catalog.
00:44:52
Speaker
And then you also have the dashboards and how we can use the data that's presented in the data plane. So that's why I'm saying it would be a lot easier if I showed you pictures, but there are a lot of different components in this. It's not that we're just dumping data.
00:45:11
Speaker
For every asset, there's a data owner. You can think of it as, I don't know, a data manager, whatever you want to call it.
00:45:22
Speaker
But we also put it on the tenants or users to know what data they're working with, right? Because if you're not aware of what you have in your hands, you can also do stupid things with it, which is a no-no. And we actually have, yeah, data leakage systems, and numerous other things in place, so this doesn't happen.
00:45:45
Speaker
But here's how I like to compare it, because we had one use case where they were saying that we should prevent users, this was literally a discussion, from putting in personal data, PII data, right? Right.
00:46:05
Speaker
So that we would recognize, hey, this is PII data. You cannot upload it, right? But then it goes very deep technically, because there will always be a use case where something is going to slip through the cracks.
00:46:18
Speaker
Yeah. And I literally described it like this: we build a car for you, right? We build, I don't know, an electric car for you.
00:46:29
Speaker
And we're determining if you're trying to put in gas, right? So this is where you as a tenant, when you're filling up the tank, should know, hey, this is not what should go in here, right?
00:46:42
Speaker
So that you also have some responsibility. Because otherwise, people will just dump stuff on you and then you need to figure it out. And that was kind of what we were starting with. But then we self-corrected on that.
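The shared-responsibility idea in this exchange, a tenant declaration plus an automated check behind it, can be sketched roughly as follows. Everything here is hypothetical: the `Upload` type, the two regular expressions, and the outcome strings are illustrative stand-ins, not the platform's actual mechanism, and a real detector would cover far more cases.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; not a complete PII detector.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
}

@dataclass
class Upload:
    tenant: str
    declared_pii: bool  # the checkbox the tenant ticks when creating the project
    rows: list

def detect_pii(rows):
    """Return the names of the patterns that match anywhere in the rows."""
    found = set()
    for row in rows:
        for name, pattern in PII_PATTERNS.items():
            if pattern.search(row):
                found.add(name)
    return sorted(found)

def review_upload(upload):
    """Compare the tenant's declaration with what the detector sees."""
    found = detect_pii(upload.rows)
    if found and not upload.declared_pii:
        return "flag: undeclared PII detected (" + ", ".join(found) + ")"
    if upload.declared_pii:
        return "accept: handle as PII"
    return "accept: no PII declared or detected"

print(review_upload(Upload("team-a", False, ["contact: jane@example.com"])))
```

The declaration stays the tenant's job, and the detector only catches mismatches, which fits the point that an automated check alone will always let something slip through.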
00:46:57
Speaker
So it's kind of a split responsibility. It shouldn't only be like, here, I'm just dumping it over, you figure it out. There's still going to be a mechanism that's going to detect, hey, this is actually PII data, but you should at least click the checkbox, do some due diligence, and be like, okay, this has no PII data. So I think that's the proper way to handle this instead of just dropping it on one side. You know what, I can tell that you are not too much on LinkedIn.
00:47:33
Speaker
You're not too much into the whole influencers and book writers thing. You're working, let's put it this way. You're at work, engineering and building stuff, I can tell.
00:47:44
Speaker
But this is so fun because you describe in simple words what people are, you know, writing books about, like data contracts, data customers, shared responsibility, data stewards. I find it so fascinating, the fact that you kind of
00:48:05
Speaker
build it yourself, well, not build it yourself, but you guys have this architecture in mind without this hype, let's put it this way, around this, you know, concept.
00:48:20
Speaker
You just put it in and have that in place. This is so beautiful. Yeah, yeah. Please, no, I didn't build it. No, no, you didn't. I understand. Yeah, yeah.
00:48:32
Speaker
But that's also, yeah, I also take pride in this because we're also, for example, strict during the interview process. So whenever we're assembling a team or we're on a mission to do something, we're really rigorous in this process and we really get the people who can get the work done. So yeah, that's the easiest way. But that's why I'm also sharing this story with you, and that's why I'm also trying to give talks, because I think we did something really, really good here, and we techies, yeah, we're not very good at promoting things or explaining some of the concepts.
00:49:18
Speaker
Because, as I said, even this podcast that we have here, it's pretty complicated because it branches into, we could talk for like three hours about this. It branches into numerous aspects.
00:49:32
Speaker
And probably nuances as well. Like, there are so many edge cases. Yeah, it's not just about transferring data from on-prem to Google Cloud. There are also different regions. How do we make sure that the data does not get lost? How do we make sure that the transfer happened without losing data or duplicating data?
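One common way to answer the "lost or duplicated" question is to reconcile record identifiers on both sides after a transfer. The sketch below assumes every record carries a stable ID; the function name and sample IDs are hypothetical, and it is not a description of the platform's actual checks.

```python
from collections import Counter

def reconcile(source_ids, destination_ids):
    """Compare record IDs on both sides of a transfer."""
    dest_counts = Counter(destination_ids)
    # IDs present at the source but absent at the destination were lost.
    missing = sorted(set(source_ids) - set(dest_counts))
    # IDs that appear more than once at the destination were duplicated.
    duplicated = sorted(rid for rid, n in dest_counts.items() if n > 1)
    return {"missing": missing, "duplicated": duplicated}

print(reconcile(["a", "b", "c"], ["a", "a", "c"]))
```

An empty result on both keys means every source record arrived exactly once; at scale the same comparison is usually done on counts or checksums per partition rather than on raw ID lists.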
Data Migration Challenges and Connectivity
00:49:54
Speaker
Yeah. And not only losing the data. For example, we're also working on the DR. So, for example, with all the weird things happening, so that you can automatically switch, I don't know, from the Netherlands to Germany.
00:50:07
Speaker
But it's not only losing the data, it's also that you have so many ingestions going on. How do you cut them off? Hey, like, we're switching the region. BigQuery or whatever, it's being switched to another region, like, stop. So there are so many mechanisms and so many teams. So it's very complex. And it's not, um, yeah.
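The cut-off problem described here can be sketched as a small state change: pause every ingestion, refuse to switch while batches are still in flight, then resume against the new region. The `IngestionJob` and `Platform` types are hypothetical, and the region names are only example values for the Netherlands and Germany; a real cutover would involve many more systems and teams.

```python
from dataclasses import dataclass, field

@dataclass
class IngestionJob:
    name: str
    paused: bool = False
    in_flight: int = 0  # batches started but not yet committed

@dataclass
class Platform:
    active_region: str
    jobs: list = field(default_factory=list)

def switch_region(platform, target_region):
    """Pause ingestion, refuse to switch while batches are in flight, then resume."""
    for job in platform.jobs:
        job.paused = True  # stop new batches from starting
    busy = [job.name for job in platform.jobs if job.in_flight > 0]
    if busy:
        # Jobs stay paused so the caller can retry once they have drained.
        raise RuntimeError("still draining: " + ", ".join(busy))
    platform.active_region = target_region
    for job in platform.jobs:
        job.paused = False  # resume against the new region
    return platform

platform = Platform("europe-west4", [IngestionJob("payments"), IngestionJob("loans")])
switch_region(platform, "europe-west3")
print(platform.active_region)
```

Leaving jobs paused on failure is a deliberate choice in this sketch: it keeps new writes out of the old region while the remaining batches finish.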
00:50:31
Speaker
It's a big topic. Absolutely. That's what makes it all so interesting. Because there are also a lot of topics where, literally, you don't have the answer. You really need to dig deep.
00:50:45
Speaker
And as you said, then people end up writing books about it because there are numerous topics. It's like, yeah, if you go deep, you could go places with it.
00:50:57
Speaker
Now, you know, I really appreciate all the time you spent with me this morning. I have the last question. What is the challenge you guys are having right now related to data and AI that you can share with us, with the audience? Like, what is on top of your mind?
00:51:15
Speaker
The biggest challenge you have today? The biggest challenge is how to get all of the necessary data.
00:51:27
Speaker
That's kind of the biggest challenge for me because... Yeah, how do you mean, like, into the hands of your tenants, users, customers, or how to get the data into the cloud?
00:51:39
Speaker
How to get the data to the cloud, but also keeping control of it, for example, so users cannot download something that they shouldn't download from the cloud to, I don't know, their notebook or something like that.
00:51:52
Speaker
But getting a hold of all of this data, even some of the Google services, for example, they don't have native DR capabilities.
00:52:06
Speaker
So you have to work around it. But just getting all of this data in and keeping control, because it's very complicated. For example, it even reaches so far that, say you had, I don't know, an incident, right?
00:52:24
Speaker
And today, AWS is having a meltdown. Let's say tomorrow it's Google, and Google support needs to help you with some of it, right?
00:52:36
Speaker
How do you make sure that they can help you debug your application or whatever without revealing too much data? Because then they can see whatever you can see, right? And yeah, it's not very clear-cut. Or, as I was also mentioning, there are data owners, so say it's 2 a.m. and you have an issue, you need to get a hold of the data owner so they can give their yes: Google can look at this data.
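The "data owner gives their yes" step can be sketched as an approval that is tied to the dataset's owner and expires on its own. All names here (`AccessRequest`, `approve`, `can_read`, the two-hour default) are hypothetical choices for illustration and do not describe the platform's or Google's actual support-access controls.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class AccessRequest:
    requester: str
    dataset: str
    reason: str
    approved_by: Optional[str] = None
    expires_at: Optional[datetime] = None

def approve(request, owner, dataset_owners, hours=2, now=None):
    """Record the data owner's yes and time-box the resulting access."""
    if dataset_owners.get(request.dataset) != owner:
        raise PermissionError(f"{owner} does not own {request.dataset}")
    now = now or datetime.now(timezone.utc)
    request.approved_by = owner
    request.expires_at = now + timedelta(hours=hours)
    return request

def can_read(request, now=None):
    """Access holds only after approval and before expiry."""
    now = now or datetime.now(timezone.utc)
    return request.approved_by is not None and now < request.expires_at

owners = {"payments": "alice"}
req = AccessRequest("vendor-support", "payments", "incident debug")
print(can_read(req))
approve(req, "alice", owners)
print(can_read(req))
```

Time-boxing means nobody has to remember to revoke access after a 2 a.m. incident, and the recorded approver leaves a trail of who said yes.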
00:53:08
Speaker
So that's what I think, because it's not just how do you feed the data. It's also, yeah, how do you keep that data? How do you make that data accessible when needed?
00:53:20
Speaker
Which parts can be accessible? Who is able to see what? So that's an interesting one for me in particular. And it's also, just in general, with the AI journeys and things, because we're talking about this, you know how to do it.
00:53:42
Speaker
But in general, a lot of people don't know most of the things we're discussing. So that's another thing. How do you present these things in a very nice way so people can work with them?
00:53:54
Speaker
Because now it's just, yeah, it's not magic. That's what I like to say. AI is not magic, because without data, it's nothing. But how do you put those two together as kind of a... How do you put it into business?
00:54:10
Speaker
This is the case. We can collect all the data, but if we cannot generate value for the business, it's meaningless. Exactly. Okay, Adnan, that was a very good morning. I'm glad we finally made it. Thank you so much for fitting me into your tight schedule.
00:54:27
Speaker
Last but not least, where can people find you? What would be the best channel? Oh, they can find me in a lot of places. I am on LinkedIn. I am on YouTube. I am on Twitter. So I'm not sure how to share the links.
00:54:48
Speaker
I'm going to share the links. OK, then I'll give you the links. And that's the best way to find me. And if you want to get in touch regarding some of the things I'm working on, yeah, feel free to reach out and ask me anything.
00:55:03
Speaker
Beautiful. Adnan, I'm very excited. Thank you so much for making the time. Super insightful. And that's great.
00:55:14
Speaker
You're doing a great job. Thank you. Thank you. Thank you for having me.