
Episode 7: APIs, AI and Application Reliability - A Deep Dive with Vinayak Hegde

Observability Talk

In this episode of Observability Talk, we uncover the critical challenges that startups face in building AI-enabled applications. Join us as Vinayak Hegde, a tech leader and startup advisor, shares expert insights on navigating the unique hurdles in the AI space—from wrangling data quality and compliance to balancing scalability and infrastructure costs. 

Vinayak dives into the often-overlooked role of observability in AI, explaining how real-time monitoring and design responsiveness are crucial to keeping customer trust and delivering reliable experiences. Whether you’re in the early stages of your AI journey or scaling an existing solution, tune in for practical advice on building resilient, scalable, and compliant AI applications in a fast-moving tech landscape.


Transcript

Guest Introduction: Vinayak Hegde

00:00:12
Speaker
Hi, welcome to a new episode of Observability Talk. Through this podcast, we aim to look at observability from different perspectives. Today, we are glad to introduce you to Vinayak Hegde. Vinayak is a seasoned technology leader with over 20 years of experience in product development and software

Vinayak's Career Journey

00:00:31
Speaker
architectures. Vinayak has led teams across continents, driving innovation in multiple verticals like ad tech, developer tools, and CDNs. Currently, he serves as the CTO in Residence at Microsoft for Startups, and his previous roles include technology leadership positions at organizations like Zoomcar, InMobi, and Akamai Technologies. Hi Vinayak, a warm welcome to the Observability Talk podcast. Thank you so much for joining us today. Thanks, Bharat, for having me on this podcast. Looking forward to this conversation and to sharing some good pointers for startups to build on. Vinayak, you have been a CTO in your prior roles, and more recently you have been working closely with a lot of startups, CTOs, and enterprises. And most of these startups are building AI-enabled applications.

AI Startup Challenges

00:01:23
Speaker
What are the top three critical challenges
00:01:25
Speaker
you believe these startup CTOs face today?
00:01:35
Speaker
I think the biggest problem for building any kind of AI model is the quality and availability of data. So finding that data, cleaning that data, making it ready for training and also for validation, that is a big challenge, right? Because data can be incomplete, it can be noisy, it can be biased. How do you take care of all of that? Further, you can also have challenges with compliance or data privacy laws. So all of these pose challenges. I think the second big factor, I would say, is scalability and infrastructure cost. Building and deploying AI models requires robust GPU infrastructure. A startup needs to balance performance, obviously, but also speed and cost, so both of these matter. For example, things like checkpointing and planning the training runs become important. Managing the costs also becomes a challenge, because the compute infrastructure required is quite large for AI startups. And I think the other challenge is, once a model is built, evaluating the responses. This is where your dataset also becomes important. Do you have an evaluation dataset with which you can actually figure out whether it is working for your particular use case, for your particular domain? There might be some human in the loop for that, right? Because we're getting to a point where AI itself can evaluate other AI models. Maybe we'll talk about agentic software and AutoGen later, but those approaches are out there. I think evaluation is an important thing. And one extra thing I would probably talk about is that there's a shortage of, I would say, talent and expertise, finding the right kind of people, because AI and LLMs are a relatively new field. Traditional AI has been around for a while and different models have been built, but LLMs are very new. People are still figuring out a lot of things, and there's also a lot of tribal knowledge. So how do you get the right kind of talent? And once you have that talent, that's not enough. How do you upskill people? How do they learn? How does your organization learn?
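To make the evaluation point above concrete, here is a minimal sketch of a domain-specific evaluation harness. The eval cases, the `generate_answer` stub, and the keyword-coverage scoring are illustrative assumptions rather than anything prescribed in the conversation; a real setup would use task-specific metrics or a human in the loop.

```python
# A minimal sketch of a domain evaluation harness.
# `generate_answer` is a hypothetical stand-in for the model or endpoint a
# startup would actually call; the keyword-coverage scoring is deliberately simple.
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected_keywords: list[str]  # terms a good answer should mention

EVAL_SET = [
    EvalCase("What does our refund policy say about damaged items?",
             ["refund", "14 days", "damaged"]),
    EvalCase("Summarise the onboarding steps for a new vendor.",
             ["KYC", "contract", "bank details"]),
]

def generate_answer(prompt: str) -> str:
    # Placeholder: call your model / inference endpoint here.
    return "Refunds for damaged items are processed within 14 days."

def score(answer: str, case: EvalCase) -> float:
    hits = sum(1 for kw in case.expected_keywords if kw.lower() in answer.lower())
    return hits / len(case.expected_keywords)

def run_eval() -> float:
    scores = [score(generate_answer(c.prompt), c) for c in EVAL_SET]
    avg = sum(scores) / len(scores)
    print(f"average keyword coverage: {avg:.2f}")
    return avg

if __name__ == "__main__":
    run_eval()
```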

Tech Stack Recommendations for Startups

00:04:04
Speaker
Especially, many startups I see don't just build one model. They build multiple models that look at parts of the workflow, multiple parts of the problem. So how are those best practices disseminated within the group, with people learning from both inside and outside the organization? Those are, I would say, the top three or four challenges. Very interesting. We see a lot of enterprises and startups building these applications, and the architecture of these applications has been ever evolving, right? We started with something like two-tier and three-tier architectures, to SOAP, to microservices, to now serverless. And for these citizen-scale applications you talked about, scalability is one of the top critical challenges which CTOs look at. For citizen-scale applications, which everybody is trying to reach, what are the key considerations from a technology stack perspective? Because you have been advising a lot of startups and have reviewed the technology stacks they are building today, what are your recommendations or considerations from a technology stack perspective? Yeah, I would say there are multiple different challenges. One is obviously scalability. Because you said citizen scale, you're probably looking at a very large number of requests. So how do you respond to those requests? You have to look at uptime and latency, but also scalability, because often in such applications you need to respond to requests with very low latency, and the periodicity of requests is very spiky. You'll have, during some times, maybe during office hours or post office hours, depending on what the app does and how it fits into the life of the person using it, different kinds of interaction over a period of time. It also depends on who the people you are servicing are and where they are based. If you have an app that works around the world, maybe some of that gets smoothed out, but if your users are largely, say, in India or in the US, you'll see spikes during their waking hours. So that curve of responsiveness matters. Scalability is important. Also, I would say data management, especially for AI applications, becomes important. I think we were just talking before the podcast started about how you build applications, because AI applications are built very differently. Data management and real-time processing become very important. You might have distributed databases, you might have streaming data, you might have different kinds of data pipelines. And wherever you have data, the next point, I would say, is security and compliance, which is never far behind. You can't think of security as a bolt-on; you need to think about it from day one. The later you push it, the harder it becomes to integrate, and the later in the cycle you try to fix it, the more time you'll spend reworking bad architectural decisions, which is probably the worst thing.
Re-architecting takes a lot of time and effort. But there are also simpler things like data encryption at rest, encryption in transit, and regulatory compliance such as GDPR, especially if you're servicing, say, the European market, or HIPAA if you're in the healthcare sector. In high-risk sectors it becomes very important, and it's super important in the B2B space. Last but not least, I would say performance and user experience are also important. Whether you're using a CDN for caching, or, for an infrequently used endpoint, maybe something serverless, how does that bootstrap and respond? That becomes very important. Efficient API design also matters, because the integration point matters. For example, if your app is providing a service, let's say you are an insurance provider and you want to supply a quote, and you're doing that for lots of users, then APIs and API gateways become super important. How do you manage traffic? How do you manage authentication? How do you do rate limiting? How do you streamline API interactions? How do you version them when you deploy? All of these become very important when using microservices. So yes, having microservices helps, containerization also helps, but all of these different factors, whether it is security and compliance, performance and user experience, scalability and microservices architecture, or data management, are important factors to look at. Very true. Thank you so much for that insight into what people have to consider when they are trying to build AI-based applications. On another note, ChatGPT and other AI applications like it have made it quite easy for people to develop applications, so everybody is trying to deliver their application and product faster and maybe cheaper, which is possible. But in this hurry of doing things, of trying to go to market a bit faster, and since you talk to a lot of companies, what important aspects have you seen some of these enterprises or startups miss when they are trying to deliver their products and applications? Is there anything specific which has come out in your discussions with most of these companies?
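Since rate limiting and API gateways come up just above, here is a small sketch of a token-bucket limiter, the kind of per-client control an API gateway typically applies. The capacity, refill rate, and per-API-key keying are illustrative assumptions, not a recommendation from the conversation.

```python
# Minimal token-bucket rate limiter sketch; capacity and refill rate are
# illustrative values only.
import time

class TokenBucket:
    def __init__(self, capacity: int = 10, refill_per_sec: float = 5.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per API key; requests over the budget would get an HTTP 429 upstream.
buckets: dict[str, TokenBucket] = {}

def handle_request(api_key: str) -> int:
    bucket = buckets.setdefault(api_key, TokenBucket())
    return 200 if bucket.allow() else 429

if __name__ == "__main__":
    print([handle_request("client-a") for _ in range(15)])
```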

Automation's Role in Startups

00:10:09
Speaker
Yeah, top of mind, I would say automation. Automation is super important because once you do it, it's a one-time cost, but it avoids repetition. And if you build that skill both individually in your developers as well as organizationally, it adds to your speed. And speed is, as you also know, Bharat, very, very important for startups. You have to be quick to market. But how do you balance that, especially when you're building an AI application, with doing it responsibly, with security, with observability? These three factors are equally important. Because if you push out a product, initially it's probably okay if you fail a couple of times, but you should fail on the right axis. It's probably okay if you fail in terms of the feature not being complete, or delivering only some part of the value rather than 100% of it. But if it is only intermittently available, if it's not reliable, maybe the server goes down, or it need not even go down, it can be a partial failure: for example, it's not very responsive, whenever there's a spike the response time goes up and there are timeouts. That could be one example. Or it's not giving the right responses in some cases, which can happen with AI, or it's giving very aggressive or wrong kinds of responses. So correctness and accuracy are super important. And finally, observability is very important because it acts as a safety net. You can do everything right, but if you don't know your application is failing and the customer has to tell you, and this is a very common challenge with startups, that's a problem. I keep telling them that you have to figure out if something is failing before your customers find out, because the customer is not going to come and tell you. Your business side will come back to product and engineering saying, hey, why are customers not sticking, or customers will raise tickets. So not only is the response delayed and you've lost the customer's trust, you're also spending more time figuring out what went wrong. And especially in the case of a B2B application, if there is a deficiency in service, you might have to compensate the customer as well. So there's loss of reputation, there's compensation for the customer, and you'll have a delivery cadence that now gets disrupted. It's very disruptive. So the safety net is there, in terms of observability, for you to be able to deliver faster. So I talked about automation and speed, and observability is part of that. And similarly, privacy and responsible AI: making sure your application is fair, accountable, and transparent, especially in high-risk industries. Let's say you're building AI applications in healthcare, or in fintech, which could be finance-related stuff, investing, capital markets or any other kind of market, any kind of financial application, or even insurance.
So I might be okay if you're building, say, a customer service bot and it gives me a wrong answer; that's not going to cause huge harm. But if it gives a wrong, say, cancer diagnosis or misreads a brain scan, or if you wanted to invest my money and the algorithm puts it into some random small cap because it showed some characteristics and you haven't tested well, all of those will lead to problems. And I will not trust you with my money, or with my business, if that is the case. So that is super important: having those basics right. I know there is a tendency to move fast and break things, but these are things you need to think through before you actually build your product. And having those safeguards actually gives you that speed. It might feel that in the short term you're doing a lot more work, but it becomes an organizational muscle. And if you build it as an organizational muscle, then it will let you go at 100 miles per hour consistently, not vacillate between 100 and 20 like Bangalore

Metrics in AI Applications

00:14:52
Speaker
traffic, which we keep talking about, right? You want to go on the highway, but then suddenly you hit a speed bump or a traffic jam, and then you're slowing down, speeding up, slowing down, speeding up. You want a smooth, consistent highway experience. You want the German autobahn, not some of the inner roads we have in Bangalore where things can come out of the blue. This is very, very interesting. In our discussions with a lot of our customers, these three or four things you talked about, one being automation, another being observability, and the third being security, privacy, and responsible AI, have come up in multiple discussions in different ways and forms. The second thing you touched upon, which has become very critical for a lot of our customers, is the customer experience bit. And you very rightly said that the customer is not going to complain. A lot of them don't complain; they just start using some other service, and that's a direct hit to your business, revenue, brand, and so on. So it becomes very critical for these enterprises to make sure that all these four or five things you talked about are taken care of to some extent. Extending this, you already talked a little bit about observability; how do you see people doing observability for AI-based applications? Do you want to go a little deeper into what you have seen or recommended to some of the startups you have worked closely with, and what sort of outcomes you have seen? Yeah, so I would say that building AI applications is fundamentally different from building deterministic applications. The output is probabilistic, and hence it's called generative: when you generate, you get something different each time. And that is both a feature and, I won't say a bug, but one of the characteristics of the system. Why is it fundamentally different? Because when you typically deploy a normal web app or any kind of app, you have the code, you have versioned the code, and then you generate artifacts, binaries or JAR files, and deploy them. Or in the case of interpreted languages like Python or Ruby, you just deploy and maybe restart, or the system picks it up, depending on how your end system works. But with AI systems, it's not like that.
00:17:30
Speaker
You have code and you have data, and the code has to be versioned and the data has to be versioned. And there are various reasons why a new model can be deployed. You probably learn more about your domain, maybe you learn more about the business rules. Maybe some of your data science or AI/ML scientists are looking at newer architectures and want to experiment with them. It could be that you have got better quality data; we talked about data in the beginning, and maybe you figured out a way to get higher quality data. And we know very clearly that higher quality data leads to better models and better outcomes. So all of these things matter in getting the right mix when it comes to AI. It is a very iterative process. That is the development perspective. From the production perspective, you are obviously looking at scalability, reliability, and the observability we talked about: metrics like uptime, latency, how many instances, requests per second, all the traditional metrics. But you also need to look at other metrics: for example, what is the accuracy of responses? Are there any privacy concerns? And how do you evaluate? We talked about evaluation being a problem. When you're building a product, you have a conception of how the customer will interact with it, but that may not actually be how the customer interacts with it. How do you incorporate that feedback? And as I said, if the customer is interested in the product and does not get a good response back, churn happens. But can you build it into the design so that the customer can give you feedback? And how do you incorporate those traces and evaluations back into the workflow? That is fundamentally different: from a one-way street it has become a constant conversation. That iteration is very different, and observability is fundamentally different. So just to recap, not only do you have to look at the operational metrics, you also have to look at the AI-related metrics, like maybe area under the curve, accuracy of responses, any privacy concerns, and the generated traces. That, I would say, is something different. And last but not least, this is something we were talking about a little earlier: when it comes to tools, how do you build this automation? Do you need to build all of it yourself? The answer is no; you can probably buy tools. One of the challenges I've seen, and I have probably made this mistake in my career as well, having been a CTO, is not spending enough on tools. If you spend on good tools, you will get your money's worth many times over.
It's not always necessary to hire people to do a certain thing, because a tool is more repeatable, more predictable. So if you find that using a tool, any kind of observability tool, gives you an edge, then it is worth spending on it. Because, as I said, we have this philosophy, which I think you already know, called shift left: you want to move testing as early as possible, security as early as possible. Observability should also shift left, and building observability into a system should be a first-class concern, not an afterthought. Because if iteration is going to be a differentiator, is going to be your ally in terms of speed, how do you create a safety net? Observability actually helps you create that safety net. Especially so when it comes to AI, because the responses are probabilistic, the field itself is new, the customers are learning how to interact with the product, and you are trying to figure out how to build the product. So of the signals you need, you'll obviously get some explicitly by doing focus groups and other things, but you need to get some of these signals implicitly. And observability, whether operational or AI-related, helps with both.
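To make the "operational plus AI-related metrics" point concrete, here is a small sketch of recording both kinds of signals per request and rolling them up per model version. The field names, the p95 budget, and the feedback values are illustrative assumptions, not a prescribed schema.

```python
# Sketch: track operational metrics (latency) and AI-quality signals (user
# feedback) side by side, per model version. Thresholds and field names are
# illustrative only.
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class RequestRecord:
    model_version: str
    latency_ms: float
    feedback: str | None  # "up", "down", or None if the user gave no signal

records: list[RequestRecord] = [
    RequestRecord("v1", 180.0, "up"),
    RequestRecord("v1", 950.0, "down"),
    RequestRecord("v2", 210.0, "up"),
    RequestRecord("v2", 260.0, None),
]

def summarise(version: str, p95_budget_ms: float = 500.0) -> None:
    rows = [r for r in records if r.model_version == version]
    latencies = [r.latency_ms for r in rows]
    # quantiles(n=20) returns 19 cut points; index 18 approximates the 95th percentile.
    p95 = quantiles(latencies, n=20)[18] if len(latencies) > 1 else latencies[0]
    rated = [r for r in rows if r.feedback is not None]
    thumbs_up_rate = (sum(r.feedback == "up" for r in rated) / len(rated)) if rated else 0.0
    flag = "ALERT" if p95 > p95_budget_ms else "ok"
    print(f"{version}: p95={p95:.0f}ms ({flag}), thumbs-up rate={thumbs_up_rate:.0%}")

for v in ("v1", "v2"):
    summarise(v)
```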
00:22:05
Speaker
This is very interesting. A couple of things you touched upon really hit home: that teams should start thinking about observability as soon as they start developing their application, that it should be a first-class concern, and that reliability and scalability should be thought through when you are designing your system or starting to develop your product. I'm just very curious, because you have been talking to a lot of CTOs: how many do you think have really started thinking this way, where they have started designing

Tech Ecosystem Maturation

00:22:52
Speaker
both the infrastructure they will deploy their application on as well as the application itself, right? How many CTOs and how many of these enterprises do you think have started thinking this way, that they need to do this from day zero? Yeah, at the risk of showing my age here, I've been doing this for more than 20 years. So before we get to the numbers, I would say there are two trends that have accelerated this. One is, obviously, that our maturity as an ecosystem has gotten better. India was always probably good at delivery and execution, but as we have built products and come to understand the user experience, and when I say user experience, responsiveness is an important part of that, there is now an understanding, compared to say five or ten years back, that this is something that needs to be invested in. So what used to be a differentiator earlier has become a baseline, and people are definitely thinking about using better tools. Also, I feel there is an understanding of speed being important, and they see that this gives an edge. Earlier, for example, it was very difficult to convince people to spend on developer tools, because they would say, hey, there is something open source, or not necessarily open source but free, and we'll use that. But paying a little more incremental money to get that 2x or 3x kind of boost, people are much more ready to do that now. And you'll see this reflected in the ecosystem: there are many startups now building tools for the local market and also for the world. This is also a result of osmosis, because many of the founders have worked at large companies, and as I said, automation is a key differentiator in large companies; they have to automate to survive, to get that efficiency and productivity. And they see that the same lessons can be applied in startups as well. So, again, it depends on the maturity of the ecosystem. If you ask me, Delhi versus Bangalore versus Bombay, there are differences. In Bangalore, probably because it has been through many cycles, I feel the propensity to use these tools is probably close to 70 to 80%, whereas it's slightly lower in other places. And there's the other trend driving this. We talked about telemetry and doing it from the get-go; there are projects like OpenTelemetry. What used to take a lot of effort, and you are old enough to remember, for example, that we used Nagios and all of that, and it was a pain to configure. But two things: nowadays, a lot of the configuration that needs to be pushed can be auto-generated from code, thanks to AI. That is one part. And infrastructure as code has also become very easy. So pushing a data point, either into a log or into an API endpoint, has become really easy. The tooling has become better, the understanding of how to use that tooling has become better, and the business understanding and the maturity of the ecosystem, in terms of seeing why and when it can give an edge, has improved.
So these two trends have converged. One is the infrastructure being easy to set up and monitoring being easy to do; that is one huge trend. The other is the maturity of the ecosystem and understanding why this is important, because the infrastructure being easy does not by itself mean people will use it; they need to understand why it has to be used and how to use it to get an edge. And I feel that, more and more, as I said, it's becoming a baseline. So depending on the maturity of the company and the team, it is becoming more and more prevalent, which is a good thing to see. Right, right. I'm so happy with the number you mentioned; 70% is very good, because a couple of years back almost anybody and everybody was just creating something and throwing it out for customers to try. But if people have really started thinking, okay, if I give this to my customers, or if my customers start using this, how do I know they are really getting the best user experience, and how is my application performing? If they have started thinking like that from day zero, that gives me a lot of happiness. In that response, you also mentioned logs, which, at least at VuNet, we are very interested in, because most of the time people use logs as the ultimate source of truth but at the same time do not use them to their full potential. So I wanted to ask you: you built that reliability for your applications in your earlier avatar, and you are now working with a lot of startups and enterprises. First of all, how much do you value and look at logs, and how much do you recommend to the people you have worked with that they should have good logs and then use them to their full potential?
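Picking up the OpenTelemetry mention above, and before the discussion of logs below, here is a minimal sketch of manual tracing setup in Python (requires the opentelemetry-api and opentelemetry-sdk packages). The service name, span names, and console exporter are illustrative; real deployments would usually export to a collector over OTLP.

```python
# Minimal manual OpenTelemetry tracing setup, as a sketch of the
# "instrument from day zero" idea. Names are illustrative.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "demo-api"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

def handle_request(user_id: str) -> str:
    # One span per request; attributes carry the context you want to query later.
    with tracer.start_as_current_span("handle_request") as span:
        span.set_attribute("app.user_id", user_id)
        with tracer.start_as_current_span("call_model"):
            answer = "stubbed model response"  # placeholder for the real model call
        return answer

if __name__ == "__main__":
    handle_request("user-42")
```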

Structured Logging Significance

00:28:43
Speaker
Yeah, so I think logging is important. It's a well-known pattern that there are several different levels of logging, and depending on how much fidelity you want, you can turn them on and off. And this is where the tooling becomes important: can you turn it on and off? Because if you have very verbose logs, finding things becomes easier, but computationally it becomes more painful to manage from a data management point of view. So what is the right balance to strike in terms of logging? That becomes important. Logging matters both from a metrics point of view, knowing how well your application is performing, and, in the case of AI tools, for understanding usage: when I'm initially building an AI product, I want to see how the customer is interacting. You can notify the customer, saying, hey, we will not use your data, but we want to know how people are interacting with the product. So that logging helps not only with metrics but also with quality evaluation. For example, say you have built a hospital health bot, just a random example, and you find certain kinds of queries dominating the dataset of how people interact with the bot, and in some cases there are a lot of back-and-forth interactions. Both of these are signals. If there's a lot of interaction, the customer is not getting the information they want easily enough; can you shorten that? And if certain modes of interaction are common, can you tune the model and the responses to give that information faster? So this has a direct impact on the product. Logs are obviously also used for how fast the response was given, what different systems it touched, and how it flowed through the system. The tracing part is very important from an operational perspective, from a latency, uptime, and reliability point of view. You want to know the worst case, for example, and it will happen, so you want to keep it below a certain threshold. So you will probably look at the percentiles, the 90th, 95th, 99th kind of percentiles, which is now fairly traditional. If you had this conversation 10 years back, it would have been very different, but this is now well understood. But there's also the quality aspect: what the response was, what the interactions were from the customer's side and from your side. That becomes a good dataset. This is the iterative nature I was talking about: your involvement doesn't just stop once the product is released; you're looking at how the product is behaving with the customer and how the customer is interacting. Like I said, if there are a lot of interactions for a simple query, you probably want to collapse that. If the user is getting answers quickly, that's a good sign, and if they're getting good answers, that's a good sign. And maybe one more thing we can touch on, since we are talking about observability, is an aspect I often get asked about: now that everybody has access to these models, and access to compute will hopefully also become slightly easier, and it has become easier over the last year after the huge clamor for GPUs, how will companies differentiate?
I think the way AI startups will differentiate, other than what we already talked about, is something we have not talked about yet: product design. You have to differentiate in design, and traceability is an important part of that. Because let's say I am evaluating two different models and the newer one has passed my evaluation; I feel the newer model is good. Now I want to test that in the real world, so I want to do an A/B test. I will show you two responses, for a certain fraction of responses, not every response, and you will tell me which one is better: thumbs up or thumbs down. That is a great dataset for you, because you are seeing how many thumbs up the newer model gets versus the older model, and then you can evaluate. You can also explicitly ask the customer, did we answer your query, and have them give a rating. So you have to think about how you use product design to give you those signals. And once you have built a product that gives those signals, because of the iterative nature, how do you use logging and traceability to feed that back into the product? Logging is one way of doing this effectively, beyond the normal operational uses we talk about. Yeah, the example you gave basically hits the nail on the head. The kind of information you can get from logs, which could be business context, more information about the transaction, or more information about how the request or transaction flowed through your infrastructure, is something you cannot get anywhere else. And if you are able to process that information and use it, whether for observability, for security, or, as you talked about, for feedback and iteration on what you are trying to do for your customer, that is something we have found very interesting. One of the other things we have realized is that 90% of the time, developers write logs in a very unstructured format. If they can write these logs in a much more structured way, maybe JSON, key-value, CSV, or XML, some format which can be processed quickly, then the amount of information somebody gets out of these logs is very useful for the enterprise as well as its customers, and can be used for the various purposes we discussed. Now, because we are talking about logs, there is another thing which has become quite synonymous with any enterprise today: APIs. You obviously talk to a lot of startups, CEOs, CTOs, and enterprises who have started creating APIs or exposing them to their vendors or partners for whatever product or services they provide, right?
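Picking up the structured-logging and thumbs-up feedback points above, here is a small sketch of emitting interaction events as JSON log lines so the same stream serves both operational and quality analysis. The event fields, the model version labels, and the custom formatter are illustrative assumptions, not a standard schema.

```python
# Sketch: structured (JSON) logging of model interactions, including the A/B
# variant and user feedback. Field names are illustrative.
import json
import logging
import time

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": round(time.time(), 3),
            "level": record.levelname,
            "msg": record.getMessage(),
        }
        # Merge any structured fields passed via `extra={"event": {...}}`.
        payload.update(getattr(record, "event", {}))
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("interactions")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def log_interaction(model_version: str, latency_ms: float, feedback: str | None) -> None:
    logger.info("model_interaction", extra={"event": {
        "model_version": model_version,
        "latency_ms": latency_ms,
        "feedback": feedback,  # "up" / "down" / None
    }})

if __name__ == "__main__":
    log_interaction("v2", 212.5, "up")
    log_interaction("v1", 980.0, "down")
```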
00:35:45
Speaker
If somebody comes to you today and says, boss, we have built a couple of APIs which do A, B, C, D, what are the questions that come to your mind as soon as they talk about the APIs they have exposed? Yeah, it's a very important question, and it's also a big pain point. Earlier, being in the CTO seat at a product startup, we did a lot of integrations with many different types of partners: people who provided insurance, people who provided payment services, and a whole bunch of other things, and these are just a couple of examples. What I found is that many times, especially if the team on the other side was new, they wouldn't implement the API well. What do I mean by that? They would not follow basic principles of HTTP: how HTTP methods should be used, when POST should be used, when GET should be used. Then there are all the observability concerns. But the most important point is how you send a response. For example, if there is a timeout, let's say your database is under heavy stress, lots of people are calling you, and the database connection does not respond in time, so you're sending a response back. Do you send a blank response? Do you send a 200 OK? Do you send a "try again"? There are multiple HTTP codes, and I found that people often do not understand the REST framework well enough, so they would send us a 200 OK. And for me, a 200 OK, HTTP 200, and I'm getting a little more technical here, means everything is working fine and I can process the response. But of course the response is blank, or, in the worst case, and I've seen this far too often and still get nightmares about it, people will send a 200 OK and then send me the error in the response body. That is not acceptable, because whether it is an error on my part or on yours, you need to signal it; the contract has to be maintained between the caller and the callee. If that is not followed, then as an application I don't know how to respond. So there is a lot of error-handling code that had to be developed, and many times I actually ran trainings for vendors who were building these, saying, hey, this is not how you should respond, because you're telling me everything is okay while sending me an error code that says it is not okay. Your API is acting schizophrenic, to use that term. You have to tell me how to integrate with it. So that becomes important. The other aspect is API security: it should not leak any data. I have seen many times, especially with startups, that they do not sanitize the data or do not check whether my authorization level should allow access to that data. So if, for example, I want to get metadata about, say, a policy or a user, and the identifiers follow a monotonically increasing pattern, I can write an iterator and pull all that information if I get hold of an auth token from somewhere. Then I don't even need access to the database; I have all of that information and can use it to scam whoever. So you may not have basic security for authentication, and tokens might be stolen.
That is one. So things like CSRF protection should be implemented. Another thing is that if something is making a lot of requests, it should probably be flagged. Again, this is where observability comes in: if you see an anomaly, it should be flagged, and there should be a proper paging system. I also found, and I won't name the company because it's sensitive, that a very well-known company in the fintech space we had integrated with did not have an incident response process. They made a change on their side and the product stopped working and started throwing errors. We actually had to tell them, hey, this is a problem with your system, and this is what has likely happened. As a customer, it is not my job to debug your system, but in this case we not only had to tell the vendor their system was broken, we had to tell them we had debugged it, and then find out who on their side could go and fix the code so we could use the service. And I was told, okay, you can just WhatsApp this number. I said, hey, that is not an acceptable response, because the nature of the product was such that we were losing money every minute. So these kinds of challenges come up with APIs. It's not just that you're providing an API; you're providing a service, and it has to be backed by some kind of legal contract. So what we started doing from a legal point of view is saying: if you fall below this level of responsiveness for this number of requests in this kind of environment, we put clauses in, both in terms of volume and in terms of responsiveness, and if that service is not available to us, because we suffer a business loss, we are going to cut your payment as well. And if the service is deficient and we don't get responses within the agreed bounds, there are consequences then too. We had to build that into the contract to legally enforce it. People need to understand it's not just an API that you're providing; it's a contract written in software, and both the caller and the callee have to honor it. The API is the way of doing that. As more and more of the world becomes software, this understanding becomes more important. So think about security, think about how you build it, think about responsiveness, but also think of it not just as something you respond to as an engineer or a service; you're actually enforcing a contract between two different organizations, and that contract is enforced through an API. So this is very important. See, one of the things we have started seeing more recently is that almost every customer of ours has some or the other third-party API being called.

API Design & Monitoring

00:42:15
Speaker
And a lot of times their transactions are taking more time, or failing, purely because everything is working on their side of things, but when they call the third-party APIs, those third-party APIs are not able to take care of their part of the contract. Then things start failing. So we have started monitoring all these APIs as well; as part of application observability, we also bring in this API monitoring. That's super important. Because, again, this also depends on how you build your systems. Will your system respond properly if you don't get a response from a third party? It depends on how your application and business logic are written. Let's say you're writing a recommendation system, like an e-commerce company, and based on what I'm browsing, you're giving me a recommendation feed. Ideally, you want to give a real-time feed based on what I've browsed in the last 10-15 minutes or even less. But let's say the service stops responding; can you give me a default that may not be as great? So you don't fail spectacularly, but you have a fallback at every level. If it's in a critical path, it becomes very important that the contract itself is enforced via legal means in addition to having the right observability, because your organization's critical metrics, such as revenue, margin, and churn, depend on it. If the third party is in the critical path, then either I have to bring that capability in house, which I did in a couple of cases as a CTO because I said, hey, we cannot have this particular aspect fail, so at a certain scale it makes sense to build it myself, or I need a legal contract where I can rely on another organization to give me high-quality responses within a certain time bound. A good example is any kind of marketplace: if you are Uber or Lyft, or if you are Google or Facebook, you have real-time bidding. If I go to a webpage, in the background it asks for an ad and sends a bunch of information. If I don't respond as an ad network or a participant in that exchange, I lose out; I don't make any money. And if, to respond, I need certain attributes about the user, but either my own database or the third-party vendor I use for targeting is not responding, then worst case I don't respond, in which case I have nothing. If I give a suboptimal output, the user may not interact with it, and I also lose out. So the quality and the quantity both have to hold up. I'm just giving an example to make it real. I hear you. From our perspective also, what you talked about regarding HTTP status codes, people not understanding the difference between them, a 200 OK with an error code in the body, is something we have seen multiple times when implementing observability for some of these applications. Then we go back and say that these two are not matching because of the way the responses are coming in. They go back, re-look at their APIs, and fix it. That has also helped.
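A small sketch of the two ideas in this exchange: a fallback when a third-party call times out, and honest status codes instead of a 200 with an error in the body. FastAPI and requests are used purely as an example stack; the endpoint, vendor URL, timeout, and fallback list are illustrative assumptions.

```python
# Sketch: call a third-party recommendation service with a timeout; fall back
# to a default list if it fails, and return an honest 503 (not a 200 with an
# error body) when there is nothing sensible to serve.
import requests
from fastapi import FastAPI, HTTPException

app = FastAPI()

THIRD_PARTY_URL = "https://recs.example.com/v1/recommendations"  # hypothetical vendor
DEFAULT_RECS = ["bestseller-1", "bestseller-2", "bestseller-3"]   # non-personalised fallback

def fetch_recommendations(user_id: str) -> list[str]:
    resp = requests.get(THIRD_PARTY_URL, params={"user": user_id}, timeout=0.5)
    resp.raise_for_status()  # treat non-2xx from the vendor as a failure
    return resp.json()["items"]

@app.get("/recommendations/{user_id}")
def recommendations(user_id: str):
    try:
        items = fetch_recommendations(user_id)
        return {"items": items, "personalised": True}
    except requests.RequestException:
        if DEFAULT_RECS:
            # Degrade gracefully: still a valid 200, clearly marked as a fallback.
            return {"items": DEFAULT_RECS, "personalised": False}
        # No fallback available: say so with the right status code.
        raise HTTPException(status_code=503, detail="recommendations unavailable",
                            headers={"Retry-After": "2"})
```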
Yeah, and I want to emphasize that point for our listeners: debugging also becomes that much harder. Because if you have a tool that filters, you will typically not look at the raw logs as they are; you'll say, okay, show me the errors. But if you're getting a 200 and filtering it out, you don't even know something is going wrong. Meanwhile, the other side is complaining, saying, I'm not getting a response, and I'm saying, no, everything looks fine on my side, because you didn't respond with the right code, because you didn't follow the contract. So as an observability person it makes my life hard too, because I cannot even debug; my system tells me nothing is wrong, because that is the signal you sent me. That adds more friction, because now you have people trying to debug what went wrong and not able to find it, in spite of having good infrastructure. You have all the observability in place, but you cannot find it, so it's like finding a needle in a haystack. So design becomes of prime importance. Very true. A question on the most recent developments: for the last couple of years you have been working in AI and Gen AI, and working closely with a lot of folks on this side of technology. Can you talk about some use cases that have really excited you? Because with Gen AI, a lot of people say it is basically useful only for customer-support kinds of use cases, not everywhere. Can you name a couple of use cases you feel very excited about? Yeah, I think where Gen AI absolutely shines is what I call the co-pilot role. I know my field really well, say I'm a lawyer, and AI adds to that: hey, do you want to reference this case law, or this has come up somewhere? Or if I'm a designer, it generates images based on a prompt, or I'm searching for something and it gives me suggestions, and that acts as a starting point. For any person, starting from a blank slate can be difficult, so it sometimes provides a good baseline. Those are the places on the creative side where I find it really useful. The other area is hard science, for example drug discovery. Anything that involves a lot of combinations, requires iterative filtering, and requires some kind of partnership between human and machine, that is also where it is exciting. Protein folding is one well-documented example. Drug discovery is another: finding the possible candidates. Materials science is another, where you have a lot of candidates and want to filter down with a first layer of filtering. That process is very laborious for humans. Can you do a filter based on past experience and figure it out? These are, I would say, some of the most interesting use cases, both in the creative field and in the hard sciences. Right.
And as soon as people start talking about Gen AI, its applications, use cases, and so on, in the same breath people also start worrying about their jobs.

AI as an Augmentation Tool

00:49:32
Speaker
Right? I mean, will I still have a job if I am doing something that is replaceable by AI? While I don't believe AI, or Gen AI in particular, will take away jobs, I also feel, as you said, it will assist you in becoming better at whatever you do. What are your thoughts on this? Yeah, so I'll give you an example. Both of us, Bharat, you and me, are wearing spectacles. I'm calling this out because some of the people listening to this podcast can't see us. Do we think of them as technology? No. And the reason we don't is that they were invented before we were born. They are an augmentation. Now, if you showed this to people 500 years back, they would say, okay, these people are very technologically advanced. But we don't think of it as technology. AI is also a tool in the same way. It augments our inherent abilities. We cannot remember everything, so we already use search engines: Google, Bing, whatever; Perplexity is the latest one. Because we cannot know everything. Now, has that changed our behavior? It has. Earlier, and you and me are old enough to know this, you had to go to Encyclopaedia Britannica or Childcraft, or there was World Book and all of these reference books, and actually go to the index and find things. We don't do that anymore; we search the web or go to Wikipedia and read an article. That's a good thing. Has it resulted in some displacement? Probably. The people who were building those reference books have moved on to other jobs. So I feel it is definitely an augmentation. I also feel there has to be, especially for high-risk products, a human in the loop. So I see it as a productivity gain. The challenge for you is not that a robot or something else is going to replace your job. The challenge is that somebody who understands that technology will probably be more productive than you, and hence more attractive to an employer, or more effective in competing with you, whether you're a business or competing in the employment market. Today you cannot survive without it; maybe "today" is an exaggeration, but five years from now everybody will know AI. Like phones, right? Phones have changed our behavior, social media has changed our behavior, and this is the same. So I definitely see it as an augmentation. You may not have to compete with a robot, but you'll definitely have to compete with another human who uses this technology better than you, just like someone who knows how to find opportunities using the apps on their phone will probably be better off than someone who doesn't. This is in line with what I have been thinking too: it is going to augment, assist, co-pilot, guide, whatever you want to call it, but it is going to make us better in some form or fashion. Now, for the last part, I know you read a lot; I want you to give a book recommendation, one that has had a really profound impact on you and that you would recommend everyone read. Wow, that's a tough one. So what I will do is talk about two or three books. One on software systems.
I think the one that has stood the test of time is Frederick Brooks' The Mythical Man-Month. It's amazing that it's more than 40 years old now; it's based on the IBM OS/360 project. Correct. And still so relevant, and it's funny. It has stood the test of time, so I would want everybody to read that book; it was definitely really useful. In terms of business, I would point to Geoffrey Moore's Crossing the Chasm. When I did my first startup, I obviously had not read the book, and I fell into what is known as the chasm. You have the early adopters, and then when you go to the mass market, the needs of the mass market are slightly different from those of the early adopters. If you get too enamored with the early adopters, the market size will not grow. So how do you handle that? As I said, my first startup actually failed, and when I read that book I had a profound epiphany: oh wow, this is what happened to me. And I see it happening to other entrepreneurs as well. So Crossing the Chasm is also one of those fundamentally good books I've read. I'd add a couple more on AI, since you asked about AI, which is in some sense the flavor of the season, or the latest wave as we call it. There is a book by Cade Metz called Genius Makers. It is a history of deep neural networks, featuring Yann LeCun, Geoffrey Hinton, and a bunch of other people, and how deep learning and generative AI came to be. I found that book really good because I'm interested not just in technology but also in the history of technology; that is one of my personal passions. And there is another book by another AI pioneer, Fei-Fei Li; I think it's called The Worlds I See. It is a memoir about her experience building ImageNet, and ImageNet is also very storied. This is another good contrast: Cade Metz's Genius Makers approaches it from the neural-network architecture and code point of view, while Fei-Fei Li actually talks about data, and how ImageNet provided the substrate to prove that the technology works. So these two books cover the pioneers of AI's latest wave, deep learning and generative AI, and reading about them is very educative because it answers the question, how did we get here? And that can help answer the question of where we are going. You need to know your past and your present before you can, in some sense, build the future, which is what most startups and enterprises are trying to do. Very good. Thank you so much, Vinayak. Really appreciate your time and insights. It has always been a great pleasure talking to you. Thank you so much for doing this. Thanks. Thanks for having me on this podcast. I hope this information was useful to the listeners, and I'm open to feedback. Definitely. I'm sure everybody will find it very informative. Thank you so much. Thank you. Thanks. If you enjoyed today's episode, please consider sharing it with your colleagues who have similar interests. It will help us spread the word.
00:57:11
Speaker
For more information about VuNet Systems, please visit us at www.