Episode 10: OpenTelemetry to eBPF: Understanding the Changing Landscape of Observability

Observability Talk
49 Plays, 16 days ago

As organizations increasingly rely on digital ecosystems to deliver seamless user experiences, the complexity of managing and monitoring these systems grows exponentially. Observability has emerged as a cornerstone for managing modern infrastructure, applications, and business processes. This topic explores the shift from traditional monitoring to business-centric observability, emphasizing the integration of AI, automation, and domain-specific intelligence. Join Bharat Joshi and Amit Srivastav as they discuss how observability enables businesses to gain actionable insights, reduce operational blind spots, and align IT performance with strategic objectives.

Transcript

The Collaborative Nature of OpenTelemetry

00:00:00
Speaker
But I think that's the way forward. I think most organizations will move towards this, with only one tweak, if I may add here, by the way, Bharat. Because a lot of people contributed to OpenTelemetry, they tried to make it a lot more, you know, horizontal. The amount of data it throws needs to be contained at some place, right? And that is where some of the work has to happen. But what I think OpenTelemetry is allowing users, the customers, to really do is move from one vendor-specific technology to another seamlessly.
00:00:49
Speaker
In fact, as you probably already know, you can point to multiple different collectors at the same time with OpenTelemetry. That means if, before making a decision, you just want to see which collector makes sense and which backend technologies make sense, you can use multiple at the same time and compare the results. I think this is making a lot more sense for customers. This is maturing also, with only one caveat, or rather two pieces of feedback. One, technology leaders should start implementing it in the design phase and shift left.
00:01:30
Speaker
And the other is, look at what is meaningful to you in the entire OTel landscape. So, just for example, if you don't want all the traces, don't take all the traces, right? You don't have to get bombarded with so much data. Just take what is meaningful to you. But this is a lovely technology. I think this is going to be used, and I think that's the future.

Introduction to Amit and Observability Evolution

00:02:10
Speaker
Hi, welcome to a new episode of Observability Talk. I am very excited to talk to Amit in today's podcast. Amit is a founder of Visaris.com, which was started with a vision to accelerate observability transformation for enterprise customers. Amit has more than two decades of experience in sales and customer-facing roles at global observability companies like Rakuten, AppDynamics, and CA.
00:02:38
Speaker
Amit has seen the technology evolve from reactive monitoring to highly proactive observability. He has developed a strong technical understanding of this domain and has worked with most of the open source observability tools. He has a unique perspective on customer pain points and what customers need. Hi Amit. Thank you so much for coming to the podcast. We are really very excited to talk to you about observability.
00:03:06
Speaker
Hey Bharat, thank you. I'm equally excited too. This is the domain which is very close to my heart, and I would love to see if I can contribute to your podcast. And by the way, I've watched quite a few of them. I love the format. So yeah, let's play along.
00:03:25
Speaker
Thank you so much, Amit, for the feedback. As we discussed, we basically started doing this to create some sort of awareness about observability, as well as to tell people from different angles how observability really matters.
00:03:41
Speaker
Right. Coming to my first question: you have been in this space for more than a decade, and you have seen it evolve from highly reactive, metric-based monitoring, which was maybe 10 years back, to full-stack observability now. Can you talk about how this space has evolved over time? You have seen it from the front row.
00:04:08
Speaker
Yeah, thank you. This is probably the most important question that one has to really look at and address. If you look back a few years, the complexity that we were seeing in the IT infrastructure, in managing the IT infrastructure, applications, etc.,
00:04:29
Speaker
was not that much. We did not have, in a way, as many digital users. We did not have the speed of development and innovation that we see now. And we did not have digital infrastructure readiness at that point of time. But what it brought along was complexity. I've witnessed, by the way, when we were moving from bare-metal server architecture to virtualization, you know, the mass virtualization or x86 virtualization. And then suddenly microservices and containerization came in,
00:05:12
Speaker
and Kubernetes changed the world.

Complexity in Modern Systems and Proactive Management

00:05:15
Speaker
So on one side you are breaking those silos, but at the same time you are also creating a lot of connections which can fail at any given point of time. Earlier, with one monolithic system, everything was written there, and communication was not really that big of a problem. Now you have so many problems.
00:05:35
Speaker
So, that was one part. The other is the digital dependence that has caused this push to become more proactive. Earlier, people could wait for a few days just to get the service running. Then it got reduced to hours and then minutes, and now people don't wait even for seconds. So if something is loading, I just switch to something else. That is just how the new generation, or the digital users, are.
00:06:08
Speaker
You have got to be proactive if you're a business owner, and if you're running a business which serves through digital, which eventually everyone is, by the way, you will have to be proactive. You can't really stay reactive. But there is a fundamental evolution which has also happened here, which is that
00:06:29
Speaker
a lot of possibilities came along, right? Earlier, it wasn't really possible to get all the information that you were seeking to become proactive in those days, one and a half, two decades back.
00:06:45
Speaker
Now you have tools, you have solutions, you have pre-built signals or probes that can help you understand not only the health of your systems, which used to be a reactive method, but also, in a jiffy, link it from one to the other, trace it to the main root cause, and quickly act upon that.
00:07:08
Speaker
Right. So this has become a really, really interesting time, when things are moving, or actually have already moved, from reactive to proactive and, in a way, prescriptive also, to an extent.
00:07:27
Speaker
And I see the future evolving a lot more towards automation. There used to be a concept called NoOps, and eventually it is possibly leading towards that. I don't know how, but yeah, there are a lot of discussions and thoughts around that as well.
00:07:51
Speaker
Very interesting times, Amit, and really, thank you for coming to the podcast just to explain this part itself, how you have seen it evolving from one end to the other. Early on, when I was talking to some of the SREs and solution architects or technical architects and so on, they were looking at it more from the application architecture evolution. Like you said, we had a monolith, then it became an SOA sort of architecture, then it became microservices. Now people are mixing it with serverless architectures and so on. One of the challenges they were facing, which I think you can shed a little more light on, is the monitoring bit.
00:08:39
Speaker
While it was so easy to monitor a monolith because everything is within a single application, what changes have you seen in going from monitoring a monolith, which is metric-based or agent-based and so on, to monitoring serverless or a microservices architecture, which is cloud native?
00:09:04
Speaker
And then there are a lot of customers who are still part hybrid, some fully on-premise, and some who are already fully cloud native. So can you give a mix of how people are looking at monitoring some of these applications which have evolved like this?
00:09:25
Speaker
Yeah, I think this is where I was coming to from a complexity standpoint. So, you know, what changed? And by the way, the cloud definition definitely has changed quite a bit. If you go back about a couple of decades, you would see there were cloud providers which were providing you bare metal as a service sort of stuff, right? They used to take care of the monitoring bits and pieces, and you deployed your workloads, etc. The interesting point which has happened in between, by the way, is the evolution of multi-cloud and hybrid environments, and businesses' inability to really transform fully to just one type of infrastructure.
00:10:17
Speaker
What we miss nowadays, and by the way, this is probably one perspective that I've seen most people struggle with, is that when it comes to SRE, they believe that SRE is only about application availability. It is not. It is about holistic digital service availability. And the foundation for that is also the digital infrastructure. Whether it is on-premise or cloud or serverless, all of that has to be learned or monitored to see how your applications are performing, and then the linkage.
00:10:56
Speaker
The other thing which has happened, by the way, Bharat, is that when it evolved from monolith to microservices architecture and serverless, your ability as an SRE to do things beyond that is very little. Most of the time what you do is you restart your pods and you change your YAML,
00:11:23
Speaker
you know, you take some API calls from your monolith, your serverless functions. All of that
00:11:36
Speaker
is kind of it. Now, if you have to fix it and you're an SRE, that means you have to rely on a lot more signals, which are cross signals. So you'll have to look at your application signals, but you'll also have to look at how your infrastructure is behaving or providing the context, and then you should be able to take action based on that. So the importance of monitoring has become very high now. Earlier, if you were, let's say, a monitoring person, people were very domain-centric or specific: I'm a storage monitoring person, I'm a database monitoring person, I'm a server monitoring person, you know, those kinds of people. Or I'm application monitoring. That boundary is broken now. An SRE has to know everything end to end in order to make sure that the applications are running and functioning. That means the monitoring and the tools, whatever those are supporting,
00:12:31
Speaker
have to be intuitive enough, simple enough for the SRE to just consume. And by the way, in SRE practice, tools are just one part. They have to make sure everything is running. So that means a tool should not create a burden on them. It should create a method or a way for them to act appropriately. That means this is more like a source of truth that they can just look at and take their actions, so that their digital infrastructure, digital applications, and digital services are up and running all the time.
00:13:10
Speaker
Very, very true, Amit. One other thing which I will definitely follow up on: to give you a view, like you said, monitoring is becoming very critical. In a couple of customer environments, most of these customers classify the criticality of their applications. They will say that there are 10 platinum applications, 10 to 15 gold applications, and so on. Earlier, our monitoring platform used to come in maybe at bronze or something which is not so critical, right? But for the last two years, we are seeing that our application is also coming into gold and platinum, because they are now realizing what you rightly said, that if I do not have this monitoring,
00:13:55
Speaker
then I am completely blind. I do not even have a view into how my applications are working, how my Kubernetes cluster is performing, or how my serverless function is performing. So monitoring is actually becoming quite critical for all these digital-first sorts of businesses. I have another point to add, by the way, Bharat, just on this topic. When I explain this to people, this is how I explain it. I say,
00:14:24
Speaker
You know, building an application is very similar to building a house. You will take that painful one or two years to build your house the way you want, right? But eventually you'll have to manage and maintain that property for decades, 20, 30 years, once you start living in it.
00:14:49
Speaker
So, if you do not plan where your water tap is, or where the pipeline or your electrical fittings are going, and if you do not maintain that, you can't live in that house. And you cannot be in building mode forever;
00:15:06
Speaker
you have to settle down someday. In my opinion, in the last couple of years, or maybe five to seven years, a lot of digital transformations have happened that led to a lot of development of applications, and critical applications, on the customer side. But that building phase is now getting over. You can't really keep building every single day, right? You'll have to utilize that. Now, when you want to utilize this, you'll have to maintain it,
00:15:34
Speaker
and maintenance has to be done in such a way that it does not break your systems, your applications, your user experiences. I guess this is where the importance of monitoring comes in. You know, you build a house, you put in some money, and you will realize that maintaining it costs a lot of money.
00:15:55
Speaker
Right. You know, that's the good old saying that 80% of the money goes into keeping the lights on. So that's where we will probably be heading.
00:16:08
Speaker
Right. Very true. We talked about these various tools. One of the things which has been making news for the last few years is OpenTelemetry. It's sort of a standard from CNCF, and they basically have a standard tracing framework which also has metrics and log collection, and which is currently being touted as something like an enterprise-wide tool which can be completely vendor independent.
00:16:37
Speaker
Can you tell us a little bit more about what your experience has been with OpenTelemetry and how you have been using it for your customers? Before I answer this question, let me just make my stand on this clear. I think the way forward is OpenTelemetry for everyone, but a tweaked version.
00:17:02
Speaker
In my belief, for a really, really long time, CTOs and technology officers have only focused on building the application

Building Reliability with OpenTelemetry

00:17:13
Speaker
and never built a reliability component as part of their design, right? So that shift left did not happen. And when I speak to a lot of IT leaders, especially CIOs, whose job is to maintain that digital infrastructure for a long time once it is built, this is their primary concern:
00:17:40
Speaker
that what they receive as their application does not have the proper reliability hook points or information, like OpenTelemetry, and hence you will probably have to go to a vendor and get locked in with their technology just to collect the same information.
00:18:05
Speaker
I'm really thankful to CNCF, by the way, for coming up with OpenTelemetry as a standard to collect all this data.
00:18:20
Speaker
But I think that's the way forward. I think most organizations will move towards this, with only one tweak, if I may add here, by the way, Bharat. Because a lot of people contributed to OpenTelemetry, they tried to make it a lot more, you know, horizontal. The amount of data it throws needs to be contained at some place. And that is where some of the work has to happen. But what I think OpenTelemetry is allowing users, the customers,
00:19:05
Speaker
to really do is move from one vendor-specific technology to another seamlessly. In fact, as you probably already know, you can point to multiple different collectors at the same time with OpenTelemetry. That means if, before making a decision, you just want to see which collector makes sense and which backend technologies make sense, you can choose multiple at the same time and compare the results.
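The idea of sending the same telemetry to several backends at once and comparing the results can be sketched roughly as follows. This is a minimal, standard-library illustration of the fan-out pattern, not the OpenTelemetry SDK or Collector: `Span`, `FanOutExporter`, and the backend names `vendor_a` and `vendor_b` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Span:
    """Minimal stand-in for a trace span (not the OpenTelemetry SDK type)."""
    name: str
    duration_ms: float


@dataclass
class FanOutExporter:
    """Sends every span to all registered backends (hypothetical helper)."""
    backends: Dict[str, Callable[[Span], None]] = field(default_factory=dict)

    def register(self, name: str, send: Callable[[Span], None]) -> None:
        self.backends[name] = send

    def export(self, span: Span) -> None:
        for name, send in self.backends.items():
            try:
                send(span)
            except Exception as exc:
                # One failing backend should not block delivery to the others.
                print(f"backend {name!r} failed: {exc}")


# Two in-memory lists stand in for two vendor backends under evaluation.
received_a: List[Span] = []
received_b: List[Span] = []

exporter = FanOutExporter()
exporter.register("vendor_a", received_a.append)
exporter.register("vendor_b", received_b.append)

for span in (Span("checkout", 120.0), Span("login", 35.5)):
    exporter.export(span)

print(len(received_a), len(received_b))
```

Because the instrumentation only talks to the exporter, adding or dropping a backend is a one-line change, which is the vendor-switching property being described.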
00:19:33
Speaker
I think this is making a lot more sense for customers. This is maturing also, with only one caveat, or rather two pieces of feedback. One, technology leaders should start implementing it in the design phase and shift left.
00:19:53
Speaker
And the other is, look at what is meaningful to you in the entire OTel landscape. So, just for example, if you don't want all the traces, don't take all the traces, right? You don't have to get bombarded with so much data. Just take what is meaningful to you. But this is a lovely technology. I think this is going to be used, and I think that's the future.
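One common way to avoid taking all the traces is ratio-based head sampling: decide from the trace ID alone whether to keep a trace, so every service makes the same decision for the same trace. The sketch below is an assumption-laden illustration in plain Python, in the spirit of trace-ID ratio sampling but not the SDK's implementation; `should_sample` and the 10% ratio are made up for the example.

```python
import secrets


def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep a trace if the low 64 bits of its ID fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    low_64_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low_64_bits < int(ratio * (1 << 64))


# Random 128-bit trace IDs stand in for real traffic.
trace_ids = [secrets.token_hex(16) for _ in range(10_000)]
kept = [t for t in trace_ids if should_sample(t, 0.10)]
print(f"kept {len(kept)} of {len(trace_ids)} traces")
```

The decision is deterministic per trace ID, so a trace is either kept everywhere or dropped everywhere, which avoids half-recorded traces.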
00:20:22
Speaker
Right. I completely agree. In fact, we also adopted OTel in VuNet to bring in visibility into distributed tracing, and tracing for applications and services. One thing which I really liked about what you said on the shift left part: I was talking to one of the SREs, and he was very, very clear that, like you said, reliability is usually an afterthought. Once the application development is already done, people start thinking, how do I make it reliable? So he started with a very simple statement: reliability is a feature,
00:21:03
Speaker
which is a must-have feature for your application. You should not treat it as something which will come in the future, and then you can live with it. So that was something very interesting, and then, along with the whole shift left, we started calling it embedded observability: the thought process of how you will monitor or observe your application,
00:21:30
Speaker
whether it is logs or traces or distribution of services or infrastructure and so on, has to start from day zero. And OTel is actually very, very useful because you really don't need to put in any money or buy any licenses, and you can start from day zero.
00:21:50
Speaker
And it also makes sure that you have an application which is already instrumented, and it's quite vendor independent, like you talked about. So there is no issue even if you are moving from, let's say, UAT to production: you can move it with the OTel-instrumented agent as well and start monitoring it from day one. On this topic, a lot of the time, most of these tech leads or tech architects are really worried about their application performance

eBPF vs OpenTelemetry: Different Approaches to Monitoring

00:22:28
Speaker
if they use an intrusive APM agent. This is something which we have heard a lot from some of our customers who have been using certain APM tools and were asking us whether there is any way to monitor application performance non-intrusively.
00:22:49
Speaker
And I'm talking about some 8 to 10 years back, and that's when we came up with an option: we proposed to them that maybe we should use log-based application performance monitoring. Though it is not fully software performance, it is at least the application which you are using for transactions. So we will provide you sort of a transaction performance
00:23:14
Speaker
monitoring using logs. So that's one of the options. But very recently a new technology called eBPF has come up, and a lot of people are saying very good things about it: how it is non-intrusive in nature and you can still observe your applications, transactions, API calls, and so on.
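The log-based option mentioned above, deriving transaction performance from logs the application already writes, can be sketched roughly like this. The log format, the `txn=` and `event=` fields, and the sample lines are all hypothetical; a real deployment would parse whatever the application actually emits.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Iterable

# Hypothetical application log lines: timestamp, transaction ID, event.
LOG_LINES = [
    "2024-01-01T10:00:00.000 txn=A1 event=start",
    "2024-01-01T10:00:00.050 txn=B7 event=start",
    "2024-01-01T10:00:00.420 txn=A1 event=end",
    "2024-01-01T10:00:01.250 txn=B7 event=end",
]

PATTERN = re.compile(r"^(?P<ts>\S+) txn=(?P<txn>\S+) event=(?P<event>start|end)$")


def transaction_durations_ms(lines: Iterable[str]) -> Dict[str, float]:
    """Pair start/end lines by transaction ID and return durations in ms."""
    started: Dict[str, datetime] = {}
    durations: Dict[str, float] = {}
    for line in lines:
        match = PATTERN.match(line)
        if match is None:
            continue  # ignore lines that are not transaction markers
        ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S.%f")
        txn = match["txn"]
        if match["event"] == "start":
            started[txn] = ts
        elif txn in started:
            elapsed = ts - started.pop(txn)
            durations[txn] = elapsed / timedelta(milliseconds=1)
    return durations


durations = transaction_durations_ms(LOG_LINES)
print(durations)
```

Nothing is injected into the application process, which is why this counts as non-intrusive, but it only sees what the logs record, so it measures transactions rather than code-level performance.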
00:23:37
Speaker
Can you talk a bit about eBPF? Because I saw some of your posts on LinkedIn about eBPF, so I had it in my mind that I would definitely ask you this. What is it, and why is it gaining popularity?
00:23:55
Speaker
And you have already answered, by the way, why it is. Look, OpenTelemetry tracing is intrusive, and it will have a lot of instrumentation headache also. You'll have to go down to your application, understand your application, hook it there, and it will throw a lot of data. And a lot of that, you know, is not bad. It's just that it is too much. It is overwhelming with OpenTelemetry.
00:24:26
Speaker
Whereas in most places, especially high-transaction places, people don't have time to really look at everything. eBPF is making it possible.
00:24:38
Speaker
Now, eBPF as a technology, and by the way, this has existed for some time, but primarily in the space of Kubernetes, the network and security components; most of the virtual networks or SDNs function on this. Telecom uses it. But the challenge earlier was the payload capacity that it supported.
00:25:05
Speaker
Now it supports a lot more, and that means you can take a lot of data which is system-centric from an observability standpoint and make system-specific logs and put it back. System-specific traces too, and one interesting thing is the profiler, so it can capture profiles.
00:25:25
Speaker
Right. So interestingly, all four golden signals, what we call MELT, metrics, events, logs, and traces, across network, infrastructure, to some extent the application or kernel-level applications, and security. Those four areas it covers pretty well. And let me go a little bit deeper into this and why it is gaining popularity.
00:25:50
Speaker
If you look at eBPF, what it has made possible is that you are now abstracted from your application. That means eBPF programs can run straight on your operating system, on the kernel. That means you don't have to know your applications or application types, and you don't have to instrument anything in them.
00:26:10
Speaker
Also, because it is near the kernel and it is a kernel-based program, it will understand what is being executed, what calls are coming in, and what is going out. If you're able to instrument what we call a hook point from your application transactions or pod transactions to your kernel or to your executions, then you should be able to create a first level of trace automatically in eBPF.
00:26:38
Speaker
So that way you already get a lot of data which most SREs can automatically rely upon. For an async environment, this is an amazing tool, I must tell you, because then you're anyway not looking at distributed tracing; there is no context information being passed along. But for most of the others, the other reason why I think it is gaining popularity is the simplicity it offers.
00:27:10
Speaker
The data is very well defined, which means your collector can be simpler, and it also has a much smaller space requirement. So when it runs, you can simply say that it consumes one-fourth or one-fifth of what most application agents consume. It provides you system-level information, and it provides you a first level of application-related information also:
00:27:35
Speaker
some logs, profiling, and some traces, so that you can just build upon this. So I think eBPF is gaining a lot of popularity because it is lightweight and it is, you know, agnostic.
00:27:48
Speaker
Right. But there will be some challenges also, if somebody is willing to take that on: eBPF is a kernel-space program, and that means it has to be instrumented carefully. So choose your technology carefully. Also, it is not shift-left technology per se. It will be done during deployment. So in a way: lightweight, very easy to instrument, very easy to get the data, but it is not necessarily part of the pre-production building thought process; it is mostly about systems management.
00:28:42
Speaker
Right. Yeah. So I was reading somewhere that eBPF is currently available only on the Linux platform, right? I mean, from the OS perspective. So that limitation... Yes, it is currently available on Linux. On Windows, I think from Windows Server 2019 onwards, there have been some efforts on building up
00:29:08
Speaker
a translator. The way it works is that eBPF programs are kernel programs, so you have to have the access. Windows does not offer that. So yes, you're right, it is currently mostly on Linux and Unix. But I'm assuming that with the newer versions of Windows coming out,
00:29:30
Speaker
as they are also committing themselves to Cilium and to eBPF projects. Cilium, by the way, much like CNCF, manages these programs for the development of eBPF. I think they have some understanding with Microsoft on exposing some metrics, not the entire access, but some access which can provide the hook points.
00:29:56
Speaker
So the way it works in eBPF is that there are hook points which can be exposed, and you can then hook your eBPF program there instead of directly accessing the kernel, because the kernel does not allow that, even in Linux. I think Microsoft is making that effort on Windows.
00:30:15
Speaker
Okay. Interesting. You talked about async calls. Can you elaborate a bit on that? Because, see, when you come from the APM world, right,
00:30:28
Speaker
APM, or distributed tracing from that perspective, will actually be able to monitor the sync calls very well, because then they know that a request has come and a response has gone back. But async calls are usually the challenge for those tools. So how is eBPF able to solve that challenge?
00:30:50
Speaker
No, no, it is not about that. By the way, in OpenTelemetry, you can solve async by putting in context propagation, etc. But look at the foundation of eBPF. eBPF runs on the node, on every node in a cluster. This is natively a Kubernetes, cloud-native technology for observing behavior and performance.
00:31:20
Speaker
So what it does is that it looks at every pod-to-pod communication and takes the information. Those connections, pod-to-pod connections or deployment-to-deployment connections, are all TCP connections. So it will read that and it will provide you one level. Now what you can do is instrument OpenTelemetry on top of this to give further distributed tracing. For async,
00:31:49
Speaker
in both cases, pure-play OpenTelemetry or a combination of eBPF and OpenTelemetry, you'll have to do context propagation in order to see the entire trace.
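Context propagation for an async hop means carrying the trace identity inside the message itself, so the consumer can continue the same trace. The sketch below assumes the W3C Trace Context `traceparent` header format (`00-<32 hex trace ID>-<16 hex span ID>-<2 hex flags>`) and uses an in-process `queue.Queue` as a stand-in for a real message broker; `publish`, `consume`, and the message shape are hypothetical helpers, not an OpenTelemetry API.

```python
import queue
import re
import secrets
from typing import Tuple

# Shape of a version-00 traceparent header.
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Start a new trace: random trace ID and span ID, sampled flag set."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(parent: str) -> str:
    """Keep the trace ID from the parent, mint a new span ID for this hop."""
    match = TRACEPARENT.match(parent)
    if match is None:
        raise ValueError(f"malformed traceparent: {parent!r}")
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def publish(q: queue.Queue, body: str, traceparent: str) -> None:
    """Producer side: inject the context into the message headers."""
    q.put({"headers": {"traceparent": traceparent}, "body": body})


def consume(q: queue.Queue) -> Tuple[str, str]:
    """Consumer side: extract the context and continue the same trace."""
    message = q.get()
    context = child_traceparent(message["headers"]["traceparent"])
    return message["body"], context


broker = queue.Queue()
producer_ctx = new_traceparent()
publish(broker, "order-42", producer_ctx)
body, consumer_ctx = consume(broker)
print(producer_ctx)
print(consumer_ctx)
```

Both printed values share the same trace ID segment, which is what lets a backend stitch the producer and consumer spans into one trace even though no request and response pair ever existed.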
00:32:03
Speaker
Okay. So this would be something like a trace ID or something that is passed between the two, to correlate. Okay. The second thing is that I was reading somewhere that these eBPF applications are very similar to Linux kernel modules. Yes, but the only difference is that maybe eBPF provides much more flexibility and you can write a full program, while a kernel module is actually very fixed or static.
00:32:35
Speaker
No, no. So I'll explain. The way it works is that you have eBPF programs that run at the kernel level, in kernel space. Now, eBPF programs would use the kernel directly to execute, which most kernels don't allow. So what you do is take some Cilium libraries: Cilium has already written some libraries on the kernel, and they expose the hook points.
00:33:02
Speaker
So your eBPF programs use those hook points to capture that data and send you that information. That makes it easier. Cilium programs, and they are also eBPF programs only,
00:33:17
Speaker
have become part of the kernel foundation itself. So if you and I write a program directly capturing the information, we'll have to do a lot: we'll have to recompile the kernel and put our programs there. So you can write it and it will run like a module, but in most cases you wouldn't want to do that. I mean, you can do that; it is just that it is too much work.
00:33:47
Speaker
So just to summarize: eBPF is non-intrusive, much more lightweight, and can be done quickly. It's just that you need to get things right in terms of the kernel module or kernel application or whatever eBPF application you're writing. Other than that... That is correct. Absolutely. And you just have to be careful about what you're capturing, because it is double-edged, by the way. eBPF is two-way: it can also execute something which can control your kernel.
00:34:27
Speaker
Right. Okay. And that is where it is being used in the security domain quite a lot also, at runtime. So yeah, just be careful. I read somewhere that they were basically using this to configure IPv6 headers; initially it started like that.
00:34:45
Speaker
Yeah, the BPF. Right, right. Interesting. See, we basically talked about multiple things. One more thing I wanted to add to the mix: APM, full-stack observability, and AIOps. How do you see these, because there are a lot of overlaps.
00:35:06
Speaker
When we talk about full-stack observability, it does include APM. When you talk about AIOps, if you look at the definition Gartner created, it also puts in NPM, APM, infra monitoring, and automation, with AI in the middle, and so on. So how do you see some of these things, and how do they mature in customer environments? We talked about reactive and proactive. You also talked about prescriptive and maybe automating. Then is AIOps, do you think, a North Star sort of thing for people now,
00:35:46
Speaker
One should aspire for, yes. The North Star, I would still say, is SI metrics and reliability. But the interesting point here, Bharat, is that APM is foundational. You'll need it because it is ultimately the critical applications that earn your revenue, for most companies. And by the way, I don't know of a company for which that is not true. And full stack is required because the IT world has now moved from monolith to hybrid to cloud native to Kubernetes, et cetera. So if you don't understand your underlying infrastructure, you wouldn't really know.
00:36:42
Speaker
And by the way, there are a lot of other complications in terms of different types of browsers, different types of access, mobile devices, et cetera. That means you'll have to know end-user performance also, and you'll need the ability to stitch it into what we call user-journey-based views. The consumers of the data you capture are different within the organization, in my opinion, but full stack allows you to see it all in context.
00:37:10
Speaker
Right. So that is where most organizations are today. And that, as you said, includes the critical components of the network, which allow you to look at pod-to-pod communications, server-to-server communications, et cetera,
00:37:31
Speaker
the performance of your infrastructure, the performance of your application, the faults of your application, traceability, logs, et cetera. So this makes the right mix for full stack. The AIOps side is something else. I mean, imagine what we are talking about here, right? Imagine full stack in the real world, where terabytes of data are generated every day. How is it possible for that data to be analyzed by SREs if you really want to make your decision by tomorrow, and if your business is really about understanding all those weather factors and forecasting for tomorrow?
00:38:21
Speaker
It is humanly impossible. That means you'll have to have AIOps. But here is my take, and this is only my take, okay? This is not for every organization, and it also depends on the maturity level. One shouldn't rush to become AIOps-enabled; use some AI modules instead. You don't have to do the entire AIOps, which has a lot of components to it, right? Let's say you're looking at pattern matching, and let's say you're able to create a baseline and compare against it.
00:38:55
Speaker
Right. Okay. Those kinds of AI functionalities within your data, maybe some root-cause causation, maybe some correlation, make sense for most organizations. Then there are some which depend on high volume, retail, e-commerce, et cetera, where they also need a prediction for tomorrow from those ops.
00:39:18
Speaker
There you build a full-fledged AI system with an early warning system, a forecasting system, et cetera. Otherwise, it doesn't make sense. Why would you really be worried about what your performance will be in the next six hours, or how dynamic that environment will have to be?
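As a rough illustration of the "create a baseline, compare against it" idea described above, here is a minimal Python sketch. The function names, the sample response times, and the three-standard-deviation threshold are all assumptions made for illustration; none of them comes from the conversation, and a production system would likely use something more robust.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize past samples as (mean, standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value that lies more than `threshold` standard deviations from the mean."""
    avg, sd = baseline
    if sd == 0:
        # A perfectly flat history: any different value counts as a deviation.
        return value != avg
    return abs(value - avg) / sd > threshold


# Hypothetical response times (milliseconds) for one service.
history = [120, 118, 125, 122, 119, 121, 123, 120]
baseline = build_baseline(history)
```

Calling `is_anomalous(400, baseline)` on a new sample would flag it, while a sample close to the historical mean would pass. This is the small "AI module" end of the spectrum, not a forecasting or early warning system.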
00:39:38
Speaker
Yeah, yeah. I mean, I don't see banks going through that either. Probably Flipkart and those kinds of e-commerce portals, when a sale happens, et cetera, want those forecasting capabilities and early warning systems. But for most people, AI, of course,
00:39:56
Speaker
but as a module, not AIOps as the whole plethora. I mean, you shouldn't really just rush for it out of FOMO, thinking that the world is ending tomorrow. Put your house in order first. Understand your business metrics. You know, I love what you guys do.
00:40:15
Speaker
If you don't have the business metrics, what is the point of forecasting tomorrow? So you'll have to have that first, then look at the volume, see what your smaller modules can do, and then look at the rest. On this specific point, by the way, I don't know if you have a question on this or not, Bharat, but there are a lot of possibilities for using the newer-age AI applications, LLMs, and GPT-style AI. I'm guessing that you probably have a question on this. Definitely. So if that is there, then I'll just park it, but there are applications around this.
00:40:56
Speaker
No, no, definitely. I mean, there are a lot of different things happening with AI itself, and GenAI has brought in new types of possibilities, right? We will definitely discuss that, because that is one of the topics I also want to understand from you, you being in the market and talking to a lot of customers: where do you see us going? You said that the move from APM to full-stack observability is currently happening.
00:41:23
Speaker
Now, from here, where do we go and how do we go? We will talk about that. But before that: you have founded a company which is basically looking at accelerating observability transformation for enterprises.

Challenges in Observability Adoption

00:41:38
Speaker
I wanted to understand from you: what are the top three challenges you see with enterprises today in adopting an observability platform?
00:41:46
Speaker
You would be talking to a lot of customers; what exactly comes up? The first one, first and foremost, I would say, is the mindset.
00:42:01
Speaker
They are still living in that reactive world you were talking about, right? Most organizations treat this as an afterthought: why should I do this? I'm already using some legacy application from, I don't want to name those companies, but those are the products. You know, I have EMS, I have NMS, those kinds of things, right? Not that they are wrong; they are probably serving some purpose,
00:42:29
Speaker
but that won't lead you to the next phase of being proactive for your organization, right? So I think the first thing that most customers and most CIOs say is that reliability is not in their design. And honestly, I think the CTOs are to be blamed for this. They've never thought about it, and they still don't.
00:42:56
Speaker
So first and foremost is how you change the mindset, and it is not about tooling. It is about the whole question of what will happen when my code goes into production. They'll have to think about that. They'll have to write down which signals an SRE would use
00:43:17
Speaker
to do what. Those kinds of things are not happening. So that is the first and foremost. The next one is the bigger problem, and I think a lot of vendors have been culprits in this: locking customers beautifully in a cage, so that the technology, you know,
00:43:40
Speaker
that has created a lot of fatigue in customers' environments, right? So if you really ask a customer who is using, you know, a billion-dollar company's observability product, their first challenge will be: how do I migrate? I have so many dependencies.
00:43:56
Speaker
Agent movement, what will I do in that blackout time, how will I get data, all that stuff. So portability and vendor lock-in is a massive problem. I think this has to go through one transformation to move the entire customer base, and that can only happen when CTOs start allowing and adopting newer stuff on OpenTelemetry, or whatever version of their own they want to build. Build telemetry. This is what I'm saying: build telemetry from day one,
00:44:26
Speaker
compatible with OpenTelemetry standards, so that anybody can use it, and then your CIO does not have to be held to ransom by some product, simply put. So that is the second point, the vendor lock-in challenge that we face. And the third one is the skill-set gap. I think, Bharat, there is a massive need for what we call operational people. If everyone is building, who's maintaining, right? And who's equipped to? So essentially, if you see project creep happening, it's happening because operations are sucking up the innovators and development people.
00:45:09
Speaker
Because things break, right? So you'll have to go back to the same engineer and developer to understand what really happened. This has a lot to do with skill-set issues, the skill-set gap.
00:45:25
Speaker
And it's not only about instrumenting your APM or tools and all the different stuff; it is about SRE as a practice and a mindset. Right. So, three top challenges in my opinion, just summarizing: the first is shift left, which still has not happened, and I think it should. The second is moving to vendor-agnostic technology. And the third is building capacity for SREs. Yeah. So these are the top ones. No, I think the first one, again, is very important. The way you explained it, I was just recalling one of my conversations with a CIO, and maybe we will be doing a podcast with him very soon, about the culture of an organization for observability.
00:46:14
Speaker
Is there a culture already in place or not? Are the CIO, the IT heads, and everybody else working towards creating such a culture? Because if you do not have it, however many tools you put in and whatever you do, it may still not work, right? You will continue to just monitor: I am monitoring it.
00:46:38
Speaker
You will never become a site reliability engineering, or SRE, focused organization. You can't, because an SRE-focused organization is looking at how to make sure the digital services being delivered are reliable and exceed expectations every day. That can only happen when you build that thought process. And that starts from shifting left, going vendor-neutral, making it agnostic.
00:47:09
Speaker
Right. So, just extending this from the top three challenges you see: if you have to advise a customer on their observability journey, essentially somebody who is trying to figure out how to do it, maybe they are doing reactive monitoring today but want to go on this observability journey,
00:47:29
Speaker
what could be the roadmap, and the typical things you would suggest or advise for an observability solution in the current context? How should they go about this journey? I'm ready to call on this. I must tell you this: I think, first, one shouldn't get into FOMO.
00:47:57
Speaker
I think there is a lot of hype around observability. One should look at what exactly they need. I'll tell you about two different customers I've recently encountered, and their needs. One customer really wanted the world's top solution. And when I really looked at their need, it was only metrics that they needed. Only metrics.
00:48:26
Speaker
Do you not? I mean, some traces, yes. But they were mostly on Kubernetes, mostly pods. I mean, most of these are SPAs going to payment gateways, et cetera. What will you do there, right, with so much tracing?
00:48:52
Speaker
That is where I think maybe eBPF works very well, with mostly metrics, events, and logs, of course, and not too much tracing. But then, just contrary to that customer, there was a customer in a banking environment which has the entire journey,
00:49:14
Speaker
a multi-page journey. And if anything breaks at one point, it breaks the entire revenue, right? That is where you need traces also. And also they are hybrid; the majority is on-premise. You can't beat it. You can't really do it the modern way, so you'll have to have that mix. So my point is: don't just do it because somebody said so, or you found a better tool, or you are just paying for it. Do it thoughtfully. It's not that you just have money you can spend anywhere, right? The other thing is about, you know, once you have the tool,
00:49:52
Speaker
you should also look at this, Bharat, and I'm genuinely saying this, as I've seen most of these observability journeys failing here in India. It's because most of the time you buy a tool, but you don't work with the vendor, and they don't work like a partner. If you don't work like partners, well, these tools are not for one day.
00:50:14
Speaker
Your application is changing. This is pretty intrusive, right? One has to change accordingly also. I mean, imagine you are one customer in the 10,000-customer list of one of the world's top products.
00:50:31
Speaker
How will they treat your smaller requirement for, you know, a new Rust-based application with certain features that you now want as per your telemetry requirements? You will just go nuts, right? And this is exactly what happens. So here is what I also say: the value of observability keeps diminishing
00:50:58
Speaker
month by month. And one reason is that you still just buy the tool. You don't take a partner's help,
00:51:09
Speaker
or you don't work with your audience as a partner, or don't make them partners in your journey. So you'll have to go there, you'll have to cross that bridge, and make sure that your communication with them is there. You enable them, they enable you. But for that you need to look for a partner, not a vendor who is, you know, shipping you a product and doesn't care about your success.
00:51:37
Speaker
Right, right. So just to build on what you just said: this is something we hear from a lot of our customers, that they are using some of these very large OEM products, but they have not been implemented correctly.
00:51:56
Speaker
And that has also happened, I believe, because the partner did not have enough skilled or expert people, right? And obviously, the OEM was not into the implementation part at all. And that is a... I'll tell you one example. I'll tell you two examples. Very interesting. On one side, I've seen that in pretty much all these places, the large companies, the observability projects
00:52:24
Speaker
start with real enthusiasm, super enthusiasm, and then it starts declining, declining, declining, declining, and then goes flying. But I know a couple of customers whom I've seen using observability with, you know, really, really sustained usage. And one thing I found common to both of these customers was that the teams there were really focused on reliability.
00:52:53
Speaker
And they understood; they only took what they needed and what they could chew, in a way, right? And they only worked with that. Forget about the large OEM versus smaller OEM discussion right now. They looked at that, and they looked at why they needed it,
00:53:09
Speaker
and whether it was fulfilling their needs. So I've seen those implementations. Unfortunately, I can't name them because I did not take permission, but I love those teams. I've been associated with them for about eight or nine years, in a friendly relationship, not a business relationship anymore. But I've seen them using those tools sustainably until now. In fact, they have been just renewing instead of moving to another tool, which is also a good thing, because they know exactly what they want.
00:53:49
Speaker
If somebody is just starting on this path right now, I'd suggest: don't get into FOMO, the world is not ending, and make your real use cases clear to yourself. People can sell it in different ways, but I think what is important is your use case. That is really good advice, because a lot of the time, if you look at RFPs, right?
00:54:16
Speaker
I mean, you can see almost anything and everything in the RFP, even what they do not want. So what I really liked in what you're saying is that you need to create a maturity journey for yourself and then take it step by step: decide what is important or critical for you, get that implemented, and then move on, to really utilize it correctly. Very interesting.
00:54:46
Speaker
Just switching gears a bit: you have been talking to a lot of customers, partners, vendors, OEMs, and so on, right? I just wanted some market intelligence on what exactly people are thinking, what their actual pain points are, and what they expect from observability as a technology.
00:55:09
Speaker
And by the way, it is a myth that one knows all of this; I can only share what I hear from the market.

Market Demands and Trends in Observability Solutions

00:55:18
Speaker
Okay. Yes, yes. One thing that's very interesting, which is coming up, Bharat, is that pretty much all the IT leaders I talk to now have this tools fatigue.
00:55:33
Speaker
Right. And I think we have reached that stage because of the digital transformation push that we saw in the last few years. Most of that was in the building phase, right? So we were building, building, building, building, building.
00:55:49
Speaker
And for every single thing, you know, you needed something, you just created one piece of it, and you started. So there's tools fatigue, and they don't have enough people to deal with these things. So one piece of feedback that I keep hearing from people now is that they say: don't sell me a tool, not anymore.
00:56:20
Speaker
Yeah. Tell me the outcome. Sell me the outcome. Sell me, you know, in a way, a solution to my problem. Don't dump your stuff on me. And I think there was a mad rush a few years back when people were just buying left, right, and center. They just bought everything, whether it fit their need or not. Now people have become conscious,
00:56:50
Speaker
for both reasons: cost and also usability. As I said, the average utilization of an observability tool, in terms of its capacity or capability, is not more than 30% in any organization, except maybe a few. So I think what they're saying is: come work with me, take ownership, put your skin in the game, be my partner in this journey, and help me get there.
00:57:18
Speaker
So that is the first and foremost thing. The second is that, you know, there used to be a lot of discussion around how you are different from the others, and people used to just pull up that feature list. They don't care anymore. I've seen pretty much all of them say: we don't care, my journey is different. And I was talking to someone at one of the streaming companies,
00:57:47
Speaker
an IT leader, by the way. And what I loved is that he said: yes, I implemented a lot of these tools just because of, you know, the hygiene factor.
00:57:59
Speaker
But my problem never got solved: if something goes wrong at some point in time, how do I get the result, right? So that, again, remains a mystery, and I end up restarting a few things just to get it back. So I think people are looking for, again: just look at my problem, solve my problem, and I'll pay you, however small you are, however big you are.
00:58:24
Speaker
And I think eventually people have come to the conclusion that brands don't matter as much. And that is the reason, I think, some of the leading analysts, you know, the so-called analysts, are facing the heat now: people are not taking them as seriously either. But yeah, this is the first feedback that I'm getting. The second is newer technology adoption. People always wanted to go non-intrusive, but it was not there, and that is the reason why you see eBPF a lot more. The third is, of course, vendor neutrality. So I think the feedback is coming from pretty much every quarter:
00:59:06
Speaker
we want to move. Unfortunately, they are not shifting left. They only want the vendors to support it, right? So essentially it becomes: a vendor comes, installs OpenTelemetry, and makes their own OpenTelemetry version.
00:59:22
Speaker
You'll kind of hook it a little bit. But yeah, I think that is also happening to some extent. What they are asking for, as I said, is a more solution-oriented approach, skin in the game, you know, as a partner, or shouldering it as a partner. And again: don't sell me a tool. That's, I think, a big no-no right now.
00:59:47
Speaker
No, I think I am sold on all three points, and I think some of our interactions are also leading to some of these things, where they have started taking a much closer view of how they observe these applications. And obviously, they want their problems to be solved now. I mean, like you said, they are not looking at whether you have this feature or that feature.
01:00:14
Speaker
I have this problem; can you solve it? Do it whichever way is correct, right? So while customer expectations are changing a bit, how do you see these observability platforms or tools evolving? One thing, obviously, is that people are going towards CNCF, open-source OpenTelemetry, vendor-agnostic approaches, and some of those things are happening. But is there any other evolution you see for observability tools?
01:00:45
Speaker
Yeah, I think a multi-directional evolution is happening at this point in the observability space, and especially the tooling space. So I'll just bring it back to tools, you know, the vendor side of it. There are two or three different dimensions now coming up. One dimension is about bringing more and more signals in, so that your SRE becomes much better equipped. And, as an obvious natural extension to that, building more and more AI capabilities, et cetera. You get a lot more within the same platform. You can call it unification. And I think it is a good move in one way, because enabling your team and bringing your team up to speed becomes much quicker.
01:01:40
Speaker
The other evolution that I'm seeing is going light. Some of them are shedding their load and experimenting with smaller components of their own. A lot of acquisitions are also happening within this domain. So on one side, if you look at it, there is also a thought: why have one massive solution for all types of situations? Let's have more modular approaches which are, you know, to the point, which we will create, and which you can also create should you want. But those are not happening with the older technology; they are happening with eBPF and, you know, the other kinds of technologies. So whether you have security observability at layer seven or at kernel level, those kinds of things, all of these are part of observability only. So
01:02:38
Speaker
you'll see one domain coming up as API security observability in production, right? In test and development, it used to be there anyway. And then runtime security. All of these are observability extensions only. So vendors are experimenting in this direction
01:02:58
Speaker
to create a lot more meaningful information for their buyers. Because they understand now that you will have to talk the language, or you will have to solve a problem. You can't just push your platform, right? So they are now looking at it as: okay, you have a security problem, I only provide you security. You don't want to buy APM or full stack from me? That is okay.
01:03:22
Speaker
So that is a lot more, you know, focused. But I think these two approaches are also targeted at different types of customers. Some customers are large and need more unification, right? But then there are quite a few customers who are, you know, digital native, for example, or fully cloud.
01:03:45
Speaker
What will you do there with your, you know, NMS and those kinds of tools? You just need, you know, specific security, runtime execution; maybe eBPF alone can help with understanding pod-to-pod communications, et cetera, and that will do. So I think preferences are changing because of these things, and so the vendors are changing, and the evolution is also happening. But I see a lot of scope in both of these, and I think these will continue to evolve in both pockets. It's just that the focus of the vendors is different. Also, there are buyer personas as well, right? Some of these vendors target different personas. But one thing I'm very, very clear on is that the time for
01:04:37
Speaker
larger... I don't think the future will be based on how big your observability product is, or how many customers you have in full-stack observability. It doesn't really matter, right? Now this technology is getting democratized so much that everyone has a level playing field.
01:04:58
Speaker
Very true. Very, very true. So while this whole thing is evolving, I also wanted your view: if you put AI or GenAI into the mix, are there any specific use cases you believe GenAI or AI will actually be useful for? Can you talk about some of the things that are happening? Yes, yes. And I'm guessing that you guys are also working towards that. See, because you have massive data, one thing that you will always have to do is understand the pattern in the data, right?
01:05:46
Speaker
So from the AI standpoint, because of the volume, and because you want to make sense of that data, you will have to rely on, you know, AI algorithms, engines, and models to understand what patterns are coming in. And that will lead you to much quicker RCA, as we call it. Either you do it through correlation or you do it through causation, right? And one has to keep evolving these models.
01:06:23
Speaker
But this cannot be an independent thought process where you run, you know, another GPU just to do this stuff. It has to be ongoing, even if it is infraction. I think the important thing here is the cost and the speed; it is not so much about quality.
01:06:44
Speaker
Because it's okay to have one causation drama or one correlation drama. But it does not make sense that for one causation or correlation you are paying, you know, hundreds or thousands of dollars just to make one inference.
01:07:00
Speaker
So, on the AI side, I think these are the areas in observability which will continue to work. There is one area that I'm particularly very excited about and am kind of working on, Bharat, and it is a GenAI use case, by the way.
01:07:18
Speaker
And you can have multiple GenAI use cases. Just having a chatbot and asking, what is my top 10, I mean, those are just, I don't know, a vanity way of doing this. Here is one thing that I find really useful. In every single customer environment I go to, by the way, even if I do, you know, all that, every environment is very, very different: the instrumentation is different, the applications are different. When the data starts coming into the collector, especially in open architectures, something or the other goes wrong. So I think the next evolution that has to happen, and something I'm working on right now,
01:08:07
Speaker
is an AI-based listener which can understand what data and what type of data is coming in, and can spin up a profile or an adapter or a transformer or a collector itself which can ingest the data on the fly. Okay. Okay. Yeah. That will lead to a massive shift-left movement in observability. Because in my opinion, most CTOs and engineering leaders don't build telemetry into their applications because they fear that even if they build it, it can't be consumed by the vendors', you know, collectors straight away. But I think something that you can do
01:08:57
Speaker
is on the consumption side of this telemetry, and it can help, you know, generate a new specification for the collector itself. So the other side is... So just a question on that. What you're trying to say is that it will try to understand what type of data is coming in and then process it? That's what you are... Right, we build a new collector, build a transformer. Right. This is very, very useful, actually. Okay. Yeah, please. The other one is on the consumption side.
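To make the "listener that understands what type of data is coming in" idea a little more concrete, here is a deliberately simple Python sketch. It is rule-based rather than AI-based, so it is only a stand-in for the approach described above, and the field names it checks (`trace_id`, `span_id`, `name`, `value`, `message`, `body`) are hypothetical; they are not taken from the conversation or from any particular collector's schema.

```python
import json


def classify_signal(payload: str) -> str:
    """Guess whether a raw payload looks like a trace, a metric, or a log record."""
    try:
        record = json.loads(payload)
    except json.JSONDecodeError:
        # Unstructured text is treated as a plain log line.
        return "log"
    if not isinstance(record, dict):
        return "unknown"
    keys = set(record)
    if {"trace_id", "span_id"} <= keys:
        return "trace"
    if {"name", "value"} <= keys:
        return "metric"
    if "message" in keys or "body" in keys:
        return "log"
    return "unknown"


def route(payload: str, handlers: dict) -> str:
    """Send the payload to the handler registered for its guessed signal type."""
    kind = classify_signal(payload)
    handler = handlers.get(kind)
    if handler is not None:
        handler(payload)
    return kind
```

A learned model could replace `classify_signal` while the routing stays the same; the point of the sketch is only the shape of the flow: inspect the incoming data, decide what it is, then hand it to a matching transformer.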
01:09:32
Speaker
Do you seriously believe, and I've been thinking about this for a few years now, that someone really sits there just to keep monitoring what is going wrong on the dashboard? Right. It doesn't make sense. And if you configure your incidents, there is always a chance of missing one or the other. The other thing is that there are so many dashboards already. How can I consume them? Then there are automation tools as well.
01:10:01
Speaker
So you have observability data, and it is being consumed in multiple different ways. There are business users; at least in your case, I know there are business users who consume that data, right? Then there is IT ops, right? Then there is leadership, IT leadership. And at some customers, I know this data goes into board meetings also. Yes. And by the way, in government, I was working with one customer who said the PMO dashboard also has some observability data. I don't know if it was systems observability or... Yeah, but my point is that consumption of that data is also a pain.
01:10:39
Speaker
An API is the only set way of taking that data. Why can't ChatGPT, or let's say your own GPT models, allow you to just define a specification that creates an API on the fly, deploys it, hosts it, secures it, and gives it to you, all that stuff? So large language models can work on both sides. And in between, in my opinion, only the standard, you know, AI models will be required. I don't think you will require too much re-engineering, or forecasting what will go wrong in 2026 if I continue doing that stuff.
01:11:24
Speaker
So again, this is a different thought process. But if you really want to make it proactive, make it proactive by getting users to adopt it. Make it proactive by enabling your engineers to build telemetry, you know, into your code base itself, rather than just, you know, the CIO wondering what will go wrong next week when the sale starts.
01:11:51
Speaker
Very true, very true. So I really like the first one you talked about; that will solve a massive problem, like you said, right? Because I can't just send data; I have to do a lot of configuration, a lot of things, to make something work, right? So that will be very, very interesting to see. Whenever you are done, please do show it to us as well.
01:12:17
Speaker
That's a really interesting thing.

Future Directions: AI and Automation in Observability

01:12:21
Speaker
I just want to talk a little bit about a use case we were thinking about from the GenAI perspective, and see if this will be something useful. What we were thinking is that there is a lot of observability data available, right? And we already have AI models to do some sort of correlation,
01:12:44
Speaker
contextualization, add the business part to it, and then maybe give a sort of RCA if there is any indicator lagging or if there is a problem happening, in terms of either more failures happening or the transactions taking more time, and so on, right? So you get a root cause analysis.
01:13:02
Speaker
Now, this root cause analysis can never be "OK, this is the root cause", right? It is always probable, because you may not even have the data for the thing that is causing the problem, right? So you will come up with certain probable root causes, and the extension to this is: that root cause is known, now how do I resolve this? So we are looking at, I mean, in fact the work has happened to a large extent, we are looking at sort of bringing in recommendability into the system,
01:13:36
Speaker
in terms of, okay, this root cause has happened. And this root cause is maybe, let's say, a service is taking more CPU, or a database is having a deadlock or a user lock, or, let's say, WebLogic or any middleware is having some sort of hung threads, and so on, right? And then the recommendability will basically recommend actions: for this type of issue, these are the recommended actions, right? And then maybe, like you talked about the AIOps part, if we can get some sort of automation script for each of those recommendations, we can actually let people run it: OK, I want to do this recommendation right then and there. So that sort of thing is what we have been working on. And I think it will get into beta very soon now.
01:14:29
Speaker
Yeah, so now, of course, this is the need of the hour. I'll tell you, as per some statistics, I think this is skills engineering statistics somewhere, that 98 or 99% of the severity one issues, the severity zero issues, do not see the RCA, ever.
01:14:51
Speaker
You're so busy with this, right? Some data is available. It is somewhere in the 90s. I may be wrong on 98 or 99%; maybe it is 92 or 95%, whatever. But that's a big number. That's a massive number, right?
01:15:05
Speaker
So the only way you can solve it is by using AI. And if you do not know what has caused it, or if you don't have the RCA, you can't really fix it, right? Because the issues are so intermittent.
01:15:21
Speaker
And that is where I think AI can definitely help. So, kudos to you guys. I think this part is really bang on, right? My only worry here is this: use the right trainable models, whether they are large language models or small language models, and use RAG based on those methods, but generating the next action is trickier.
01:15:55
Speaker
It has a lot of context to it. So how is it configured, and can you really take it straight to automation? I think it needs some time. Because even today, and this is a no-brainer, everyone knows this, you use a copilot with your code and still have to look at the copilot code, and maybe 50-60% of the time whatever it is suggesting is correct, not more than that. And that is just a smaller context. Here we are talking about the bigger context, where you don't have as much data. So the practicality of recommendation is something that I will be wary of right now, but I think it will mature.
01:16:47
Speaker
From a thought process standpoint, I think you're bang on with this. And that's what I was also explaining, right? You should be able to generate that stuff. But the only problem is that if you want to generate it, there are a lot of variables to it. And the other area that I think, and I was explaining this to somebody else also,
01:17:06
Speaker
and you know, with the AI hype that we have right now, the LLM hype that we have right now, and the investment that every company is making, they're still looking for the right use case for those investments. So I think it will get sucked up into that.
01:17:26
Speaker
This recommendation engine will probably get sucked up into that, so they can probably take this. So as a thought process I understand this, but as an independent product or tool, how much viability will it have, especially given that all organizations now have a chief AI officer and those kinds of profiles who are responsible for AI initiatives? They will probably want to consume this side of it, and that is what I'm slightly wary of. This is my genuine feedback. But then I think only we, as observability product people, understand what the data is and how it makes sense, right?
01:18:09
Speaker
So at least from that perspective, on correlation, causation, providing you the RCA, that kind of stuff, you're bang on with this. Right. Thank you so much for that genuine feedback. I mean, we all know the accuracy you get through AI, right? And the iteration required, the training required, and then the sort of supervised learning required to really reach that place where things will be. So what I'm trying to say is: don't bring something that will create another fatigue.
01:18:41
Speaker
You know, it's like a few years of learning and then you say, "Oh my God, this is giving me only so much." Yeah. So, because the domain is very narrow, right? So the outcome for you to do this, for example: if something goes wrong in the Kubernetes world, you just restart the pod, right? That is already automated. So the world is already moving towards that. And then there is AI on that as well.
01:19:09
Speaker
So a lot of that is already being taken care of. But what is not done, and I think it must be done, and it has merit, is the RCA part of it. Building that up, creating meaningful information so that customers can consume it. I think 70-80% of what you said is already there, bang on.
01:19:27
Speaker
Maybe I'm just living in an old world. Maybe customers will like it. I don't know the outcome of what you're talking about on the recommendation side of it. Recommendation is OK. I'm talking about automation. Up to recommendation it is OK. It is just the automation I am wary of, because that can vary. Right now, I would want to pause it here to give space for AI to mature, for data to come,
01:19:56
Speaker
and then probably take that journey. Up to recommendation, I think it is here. The use cases are pretty vanished. Now, I want to talk a bit about the future. So if I were to ask you what the top three observability trends are that you see in 2025 and beyond, what would those be?
01:20:19
Speaker
Seriously, one is, you know, adoption of open source; I think that is coming up. Open standards is what I mean, right? "Source" has got so many meanings, so I'll say open standards. Right.
01:20:35
Speaker
And I think it just has to happen now. There is no way you can avoid it. The other is finding those non-intrusive, lightweight methods of capturing the data, with eBPF. Some of these are open standard, still not fully open standard, but moving towards more and more lightweight programs that capture less information. And then
01:21:07
Speaker
it is absolutely about a more and more solution-oriented way of addressing the problem. I think it's going to go surgical now. It has been wide enough for a long time. Focused. So customers would start looking at solving problems, and I've seen, by the way, a lot of large customers who were paying a lot of money to the biggies, billions of dollars by the way, who have started looking for a smaller vendor for a smaller problem. And then, you know,
01:21:40
Speaker
they reduce the scope of the larger vendor to only a few things, and then with the smaller pieces, of course, they reduce their burden in terms of spending. And they have kind of created their own solution-oriented approach; it's not just burning all the money.
01:21:58
Speaker
Right. Okay. Cool. This is pretty interesting. We are towards the end of the podcast, and this is the question which we generally ask almost every guest, about their favourite book. Something which they always go back to, or something which they gift to people, something which they always recommend.
01:22:23
Speaker
Oh, by the way, I just gifted the... I don't know what this question is. I have several favorite books, by the way. My ex-boss used to gift me a lot, and there are quite a few books.
01:22:41
Speaker
And there are quite a few; the recent one that I have gifted to people was Mindset. It's a Carol Dweck book. I love the book because it challenges your mindset, it allows you to be present and, in your psychology, become more mindful, and to practice that in a business sense. Essentially, build the mindset that you want.
01:23:17
Speaker
And there is another one which, you know, I love, and a lot of the time you would have seen this book. The name of the writer, or author, by the way, is... It's called Extreme Ownership. It's written by a Navy SEAL. Got it. Yeah, I give that book to quite a few people. And I love that book.
01:23:45
Speaker
You know, I'm a salesperson, right? So it naturally comes to a salesperson to take ownership; one has to own things, from building them up to delivering and standing by your commitment. And I think I love that book.
01:24:06
Speaker
In all the context it is, you know, owning what you do, or in a way practicing what you believe, standing for what you believe. Yeah, right. I have heard about both the books, but both are on my reading list. I have not reached there. Mindset is a really quick read, by the way, a very, very quick read. I keep gifting that to a lot of people. It's a small book. I think it must be about four...
01:24:40
Speaker
a couple of hundred pages, and maybe a couple of days' read. It is very easygoing, but I think it challenges you to think about psychology and building the right mindset, because in our type of jobs and in corporate life there are a lot of points where one can just slow down, start doubting oneself, and start thinking too much. This allows you to stay focused and build a mindset which allows you to get going, right, or even for what you want to know.
01:25:14
Speaker
Cool. Thank you. Thank you so much, Amit, for your time, insights, and amazing, thought-provoking view into observability. I am sure that we would definitely want to host you again sometime, because of the way the whole observability domain is changing and because there is so much to learn from you, right?
01:25:36
Speaker
So thank you so much for your time this time. Not a problem, definitely. It was all my pleasure. As I said, you also need to talk that language, right? This is a TCP connection, so you'll have to acknowledge what I'm sending to you. If you don't, then... But this was an interesting world.
01:25:59
Speaker
Unfortunately, there are very few whom you can talk to about this, right? So this side of the world, where you have to manage, maintain, and make it reliable, is something that is evolving, as I said. And I would love to, you know,
01:26:15
Speaker
again come back on your show, or in person, etc. We'll connect on this topic whenever you feel appropriate. Definitely. Thank you so much. Thank you. Thank you for hosting.
01:26:32
Speaker
Hope you found my discussion with Amit insightful. If you did, please consider sharing it with your colleagues. For more information about BUNET systems, please visit us at www.bunetsystems.com. Thank you.